In the five-year span between 2016 and 2021, the average amount of data that organizations managed grew by 10 times, from 1.45PB in 2016 to 14.6PB in 2021.
We are extremely adept at generating data, less adept at extracting value from it, and reluctant to destroy any of it. Data hoarding, data sprawl, and data decay are all significant problems for contemporary companies, and these issues can create legal liability and operational inefficiencies. Yet data minimization efforts tend to stall from the start, mainly because of the fear of deleting something that may prove valuable at some point in the future.
The Data Problem
An academic doctoral study released in 2020 included several statistics highlighting the fact that data generation has increased exponentially in recent years, with no signs of this trend stopping. Among those metrics was the estimate that globally we are generating around 2.5 exabytes of new data per day, as well as the prediction by the U.S. Government Accountability Office that by 2025 there will be between 25 and 50 billion devices connected to the internet and actively generating data. That same study reported that organizations effectively use less than 5% of their available data. There are three potential problems that could cause this situation: Companies don’t know how to analyze the data they have, they don’t know what insights they could gain by analyzing the data, or they simply don’t know that they have the data in the first place.
One study found that nearly 85% of Fortune 500 organizations are unable to use their data effectively. Yet companies continue to store data in the hope that one day they might be able to analyze it appropriately and somehow extract insights from the “gold mine” of hoarded data they have accumulated. In thinking this way, they disregard the fact that most data has a shelf life, and its usefulness often expires before the company is able to extract valuable information from it.
When data is not properly curated or updated, it becomes outdated, inconsistent, and potentially unreliable. Data hoarding exacerbates each of these issues, lowering data quality and accuracy, because data becomes considerably more difficult to maintain, clean, and manage as it grows within the enterprise over time. Given the recent regulatory focus on data privacy, much of this data is not only pure tech debt but also a source of increased liability for the company.
Analyzing rogue (incomplete, inaccurate, irrelevant, corrupt, incorrectly formatted, or duplicative) data is so problematic that the data science community says that a “rule of 10” applies to it. The rule states that it will cost 10 times more for a data scientist to complete a unit of work when the data is unclean compared with when the data is perfect.
Why Do We Need to Be Concerned About Data Minimization?
An IBM survey found that poor data quality costs the U.S. economy approximately $3.1 trillion annually and that companies are losing up to 12% of their potential revenue due to rogue data within their business processes.
Storing unnecessary data can expose an organization to security risks and lead to compliance violations, especially under regulations like GDPR or CCPA. This is why these privacy laws include requirements for data minimization. According to the Colorado Privacy Act, the processing of personal data “shall be solely to the extent that the processing is necessary, reasonable, and proportionate to the specific purpose or purposes.” Similar language appears in many other privacy laws. Regardless of where you do business, it is highly likely that at least one of these laws, if not many, applies to your business.
What Is Data Minimization and How Is It Accomplished?
At its foundation, data minimization means adherence to two basic principles: Only collect the data that you actually need to provide your services, and don’t keep data any longer than you need.
To minimize your data, you should do the following:
- Evaluate your data storage processes and align data retention policies and practices with the principles outlined in the various privacy laws that your company is subject to.
- Implement data destruction policies and follow them.
- Use data classification and data discovery tools to scour your current data sets.
- Remove data that hasn’t been accessed for years and that you are storing for no valid business reason.
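As a rough illustration of the last step, the sketch below lists files that have not been read or modified within a chosen number of years so that they can be reviewed against your retention policy. It is a minimal example under stated assumptions, not a substitute for a data discovery tool: the function name find_stale_files, the three-year default, and the use of file-system timestamps are all illustrative choices. Access times in particular can be unreliable on volumes mounted with options such as noatime, so the sketch takes the later of the access and modification times. It only reports candidates and deletes nothing.

```python
import os
import time
from pathlib import Path

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def find_stale_files(root, max_age_years=3.0, now=None):
    """Return (path, years_idle) pairs for files under `root` that were
    neither accessed nor modified within the last `max_age_years` years.

    Hypothetical helper for illustration; the threshold is an assumption
    and should come from your own retention policy.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_years * SECONDS_PER_YEAR
    stale = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        # Access time may not be maintained on every file system, so use
        # the later of access and modification time as "last touched".
        last_touched = max(info.st_atime, info.st_mtime)
        if last_touched < cutoff:
            stale.append((path, (now - last_touched) / SECONDS_PER_YEAR))
    # Longest-idle files first, so reviewers see the oldest candidates on top.
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

Calling find_stale_files on a shared directory with max_age_years=5, for example, would return the files idle for more than five years, oldest first. The output is best treated as a review list: whether a file has a valid business or legal reason to be kept is a policy decision that timestamps alone cannot answer.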
Adhering to data minimization principles will not only help you remain in compliance with privacy laws, but it will also reduce your attack surface; improve your operational ability to analyze the information you do have; and improve your ability to make data-driven decisions based on current, clean, minimal data.