Bad Data: The $3 Trillion-Per-Year Problem That's Actually Solvable

How the right tech can help entrepreneurs make data more accessible and accurate, avoiding massive losses in the process.

By Joy Youell Edited by Matt Scanlon

Opinions expressed by Entrepreneur contributors are their own.

In 2016, IBM released a report estimating that bad data costs U.S. businesses and organizations $3.1 trillion per year. These funds were wasted on, among other things, the time knowledge staff (such as IT) spent digitizing or updating older sources, finding and fixing errors while organizing, and simply hunting for information and for confirmed sources for data they were hesitant to trust. An additional point of critical concern is that the age of Big Data has not been leveraged equally by all companies; even very successful and well-established businesses hold large amounts of data in different places and formats, but might be powerless to use it because it is unstructured or semi-structured. If the full potential of artificial intelligence (AI) is to be realized, data has to be available for use in meaningful ways.

A few obvious frontrunners, such as Google or Amazon, set precedents for data management, but most businesses are not like these, and don't deal with nearly the same volume or speed when it comes to data. For every other company in the world, including those in startup or scale-up mode, a solution is needed.

Straddling digital and paper-based worlds

Many businesses of varying types and sizes straddle the digital and paper-based worlds. Some of their most potentially helpful data is in documents — file types like PDFs, images and scanned documents — practically unavailable for informing high-level decisions or arriving at decisive conclusions. If positive outcomes are going to be possible, these types of information sources need to be organized, and AI and machine learning (ML) can provide the tools to do it.

AI for managing data

For many entrepreneurs in launch mode, there are three potential barriers in place. First is the idea that AI requires immense amounts of data to produce precise results; second, that data often comes in numerous formats (structured, semi-structured and unstructured); and third, that data management might not have been an inherent part of existing business operations, so course correcting would seem to require too much effort before better results can be achieved.

First, the quantity challenge. If businesses have less data, how can they hope to gain the same level of insight or train algorithmic models as fast as larger competitors? The solution is the same as in any sphere of software development: Do it incrementally. With one-shot learning, a model can learn from as little as a single example. This tech already exists and is on display anytime someone uses facial recognition to open their smartphone, for example. The system needs very little data and can quickly adapt even if small feature changes occur. Many open-source models for data don't operate this way, but could.
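To make the idea concrete, here is a minimal sketch of one-shot classification, assuming each document has already been turned into a small numeric feature vector. The class name, the labels and the vectors are all hypothetical, and nearest-neighbor matching by cosine similarity is just one simple way to learn from a single example per category, not the method any particular product uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class OneShotClassifier:
    """Hypothetical classifier that stores one reference vector per label."""

    def __init__(self):
        self.examples = {}  # label -> single reference vector

    def enroll(self, label, vector):
        # Learning is incremental: one example is enough to add a category.
        self.examples[label] = list(vector)

    def predict(self, vector):
        if not self.examples:
            raise ValueError("no examples enrolled")
        # Pick the label whose stored example is most similar to the input.
        return max(
            self.examples,
            key=lambda label: cosine_similarity(self.examples[label], vector),
        )


# Made-up feature vectors, for illustration only.
clf = OneShotClassifier()
clf.enroll("invoice", [0.9, 0.1, 0.0])
clf.enroll("receipt", [0.1, 0.8, 0.3])
prediction = clf.predict([0.8, 0.2, 0.1])
print(prediction)
```

A new category can be added later with a single call to `enroll`, which is the incremental approach described above.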

Second is the challenge of data in many formats. Especially in well-established industries, the digital transformation remains incomplete. This means that any and every type of historical data exists in file cabinets, on hard drives and in hard-to-access places or hard-to-match formats. This is where the power of machine learning comes in.

Data hygiene is the practice of processing data to ensure it is relatively error free. There is a cycle to this: import, normalization, verification and export. Depending on the nature of the data (for instance, whether it is encrypted or anonymized), the method of cleaning it may vary. Machine learning can support a system in which objective data components are measured against one another, issues are quickly identified, irrelevant parts are removed and the resulting data is made reliable. This can be automated and, once set up, significant amounts of information can be processed right away. After that, every new data point can be processed as it arrives.
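The four-step cycle can be sketched in a few lines of code. This is an illustrative example only: the sample records, the field names and the specific rules (trimming whitespace, lowercasing emails, rejecting non-numeric amounts, dropping exact duplicates) are assumptions chosen for the demonstration, and it uses plain rules rather than machine learning.

```python
import csv
import io

# Hypothetical raw export with stray whitespace, mixed case,
# a malformed amount and a duplicate record.
RAW_CSV = """name,email,amount
 Alice Smith ,ALICE@EXAMPLE.COM,100.50
Bob Jones,bob@example.com,not-a-number
Alice Smith,alice@example.com,100.50
"""


def import_rows(text):
    """Import: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize(row):
    """Normalization: collapse whitespace and lowercase emails."""
    return {
        "name": " ".join(row["name"].split()),
        "email": row["email"].strip().lower(),
        "amount": row["amount"].strip(),
    }


def verify(rows):
    """Verification: reject non-numeric amounts, drop exact duplicates."""
    clean, rejected, seen = [], [], set()
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        key = (row["name"], row["email"], amount)
        if key in seen:
            continue  # already kept an identical record
        seen.add(key)
        clean.append({**row, "amount": amount})
    return clean, rejected


def export_rows(rows):
    """Export: write the verified rows back out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["name", "email", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = [normalize(r) for r in import_rows(RAW_CSV)]
clean, rejected = verify(rows)
exported = export_rows(clean)
print(exported)
```

Keeping the rejected rows rather than silently discarding them gives the people who know the data a chance to review what failed verification.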

Third is making data management part of regular business operations. The key challenge here isn't necessarily deploying the manpower to manage it, but the setup. Most data-management systems use proprietary algorithms and require skilled coders or technologists to implement and maintain them. The same dilemma appears in other contexts, and it is one in which the no-code movement is making a difference.

Innovators in this space realize that the people who know the data, and understand it intuitively, are not data scientists. Rather, they are the business owners and operators who have worked with it firsthand, which makes it critical that they can operate the data-management platforms themselves. They will do the labeling, searching and actual using, so they need self-service options. Otherwise, the chaos of unorganized or inaccessible data will only be replaced by the stress of a long-term vendor contract to keep a model up and running, one that could be unsustainable.

What about data privacy?

Inevitably, any consideration of how data management is changing due to emerging technology must address privacy. Many of the companies seeking solutions in this space handle encrypted, medical or financial data, but a broad range of ordinary company documents also contains sensitive proprietary and customer information. At the most basic level, this data should be protected both at rest and in transit, ideally under stringent compliance frameworks such as SOC 2 Type II and HIPAA.

Machine learning can also help this effort, as it is capable of providing a layered approach. If algorithms are pre-trained using anonymized data, there should be no need to use real customer data to refine them. The aforementioned one-shot learning model takes data from an individual user and learns the document structure of that user's data, without sharing it with other users to train their models.
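As a rough illustration of keeping real customer details out of training material, the sketch below masks two kinds of identifiers before text is used anywhere else. The patterns, placeholder tokens and sample sentence are assumptions made for this example; simple pattern masking like this is only a first step and is not full anonymization.

```python
import re

# Hypothetical patterns: email addresses and long digit runs
# (such as account or card numbers), optionally split by spaces or hyphens.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\b\d(?:[ -]?\d){8,}\b")


def redact(text):
    """Replace matched identifiers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    text = LONG_DIGITS.sub("[NUMBER]", text)
    return text


sample = "Contact jane.doe@example.com about account 1234-5678-9012."
redacted = redact(sample)
print(redacted)
```

The document structure survives the masking, so a model can still learn where an email or account number tends to appear without seeing the values themselves.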

As entrepreneurial businesses work toward digital transformation and data accuracy, they can improve efficiency and reduce errors through intelligent document processing powered by machine learning. There are many things machine learning can't yet do, but data management is well within the scope of what's possible now.

Joy Youell

Lead Content Strategist

Joy Youell is an experienced copywriter, content strategist and on-page SEO specialist. She's addicted to novelty and innovation, which has led her to considerably expand her field of study to include marketing, branding, voice development and numerous entrepreneurial endeavors.
