05. March 2021

5 Risks of Retaining Too Much Dark Data

Data continues to grow at an ever-accelerating pace. IDC estimates that the amount of data in the world will more than quintuple between 2018 and 2025! However, this explosive growth is not without its risks. One type of data in particular has the potential to cause serious harm: dark data.

Although “dark data” is a relatively new term in the data world, the risks associated with it are already well understood. We’ll look at what makes data dark, the risks it poses, and how you can mitigate those risks.

What Is Dark Data?

Much like how scientists have determined that most of the matter in the universe is dark matter, data analysts have dubbed the data we do not see or understand as “dark data.” This is any data that your company is not actively using or monitoring. It may have value, but because it goes unnoticed, it presents nothing but risk until you get control over it.

Dark data takes many forms, from working drafts and assets that don’t make the final cut to files from old employees or previous clients that are no longer relevant to your business. Programs generate tons of dark data when they make background saves and use system caches to store temporary files. While it is good to have backups of useful data, many backups are never assessed, cleaned or organized, becoming data in the dark.
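One rough way to get a first look at dark data on a file share is to list the files nobody has modified in a long time. The Python sketch below is only an illustration, not a description of any particular product. The function name `find_stale_files` and the 365-day threshold in the usage note are assumptions made for the example, and modification time is just a proxy for whether a file is still in use.

```python
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Return files under `root` last modified more than `max_age_days` ago."""
    # `now` can be passed in to make the result reproducible.
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        # Only regular files are considered; directories are skipped.
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)
```

Calling `find_stale_files("/shared", 365)` on a hypothetical share would return every file untouched for more than a year, which gives you a starting list to review rather than a verdict on what to delete.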

So, why is it so dangerous to leave unknown data lying around in obscurity?

5 Risks of Retaining Dark Data

1. Failing to Be in Compliance

This year, data privacy issues have become a major concern for organizations and C-suite executives. Since the GDPR (the EU’s General Data Protection Regulation) took effect in 2018, other countries have followed suit with similar legislation. In the absence of a comprehensive federal data privacy law, individual U.S. states have passed their own laws modeled on the GDPR, including California’s CCPA. These laws establish stringent rules for data privacy management and consumer rights.

For instance, under the GDPR and similar laws, consumers have the right to have their data deleted upon request. They can also request reports to know exactly what data you have on them. If data is hidden in the dark, how will you be able to accurately tell consumers what data you have? And how would you delete what you don’t know about?

These situations can lead to costly fines and a serious lack of consumer trust. Let’s just look at the two biggest laws on the books.


The EU’s GDPR

The GDPR establishes the rights of data portability, access, and erasure. This means consumers can submit a DSAR (data subject access request) to find out what data your company has collected on them, request that you deliver it to them, modify the information, or ask that it be removed. A Deloitte survey showed that 3 out of 4 respondents were aware of these rights, and 9 to 12% have exercised them already.
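To see why unknown data makes these requests hard to answer, consider what even a basic search for one person’s data involves. The Python sketch below scans plain-text files for an identifier such as an email address and reports where it appears. It is a simplified illustration: the function name `find_mentions` and the list of file suffixes are assumptions, and a real request would also have to cover databases, documents, archives, and backups.

```python
from pathlib import Path


def find_mentions(root, identifier, suffixes=(".txt", ".csv", ".log", ".json")):
    """Return {path: [line numbers]} for text files under `root` mentioning `identifier`."""
    needle = identifier.lower()
    hits = {}
    for path in Path(root).rglob("*"):
        # Skip directories and any file type not in the (assumed) suffix list.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # errors="ignore" drops undecodable bytes instead of raising.
        text = path.read_text(encoding="utf-8", errors="ignore")
        lines = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if needle in line.lower()  # case-insensitive substring match
        ]
        if lines:
            hits[path] = lines
    return hits
```

Any location this search never reaches is a place where personal data can sit unreported, which is exactly the gap dark data creates.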

Failure to comply with these requests can lead to serious fines. If you have data in the dark and consumers find out later, then your company can be in serious trouble with GDPR’s Data Protection Authorities. It’s no wonder nearly three-fourths of companies have hired a data protection officer (DPO) since the GDPR’s passage.

California’s CCPA

Like the GDPR, the California Consumer Privacy Act grants comparable rights, in this case to California residents. A resident’s rights remain covered even while they are temporarily outside the state. Not only can your company be fined for noncompliance by the California Attorney General, but individuals also have a private right of action in the case of certain data breaches.

To learn more about the GDPR and CCPA, check out our data privacy e-book.

2. Potential for Data Breaches

Another massive risk of dark data stems from data breaches. The frequency of data breaches has only continued to increase, and due to the growth of data, the potential for damage is higher than ever. Regarding dark data, 95% of companies found over 100,000 folders of unused data that were not being monitored.

Data breaches can cripple a company and cost hundreds of millions of dollars. Just look at Yahoo’s massive settlements or the damage done to Sony during its PlayStation Network hack. Both of these were caused by loose files that the companies were unaware of. This dark data led to slow detection and communication, which blemished their reputations.

3. Chaos and Confusion

Just because data is considered dark doesn’t mean it can’t be seen at all. Even though your company might not be utilizing it, your employees may still have to sift through it regularly to find what they need. Dark data can contain useful assets, but they are buried among ROT (redundant, obsolete, or trivial) data and can’t be leveraged to their full potential. That clutter slows down productivity and application performance, and it costs your company money as employees struggle to get to the files they actually need to do their work.
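The “redundant” part of ROT is the easiest to measure. As a minimal sketch, the Python snippet below groups files by the SHA-256 hash of their contents, so byte-for-byte copies show up together regardless of file name. The function name `find_duplicates` is an assumption for this example, and reading each file fully into memory is a simplification that would need chunked reads for very large files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Return groups of files under `root` that have identical contents."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Whole-file read: simple, but not suited to very large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each returned group is a set of exact copies. Near-duplicates, such as successive drafts of the same document, would not be caught by this approach.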

4. Unexpected Liability

If your company has a data retention policy and follows it, you can limit your exposure in a lawsuit. For instance, if your policy states that data is retained for five years, the discovery phase of a lawsuit will generally stop there, as you’ve reasonably justified that older data is not present.

However, suppose lawyers stumble upon dark data that dates back further than your policy should allow. Now they can justify to the court that your policy is meaningless and subpoena additional files, increasing the chance of finding something that might be damaging to your company’s reputation. You may wind up losing a case that might otherwise have been thrown out.

5. Missed Opportunities

Finally, there’s the risk of falling behind the competition. Companies that exploit the full potential of their data will beat out those that don’t. Hidden data isn’t necessarily useless. Much like untapped oil reserves, you need to probe and inspect it to determine how much can actually be extracted and turned into profitable action.

Shine Light on Your Dark Data

With Aparavi’s intelligent data management platform, you can automate the classification of all of your files, shining a light on your dark data. Our platform parses data and tags it to make dark data discovery easy. You can now illuminate those dark corners of your company and put your data to use. Contact Aparavi today to find out more about how we can take your data out of the dark.