Bad Data: The $3 Trillion-Per-Year Problem That's Actually Solvable

How the right tech can help entrepreneurs make data more accessible and accurate, avoiding massive losses in the process.

By Joy Youell Edited by Matt Scanlon

Opinions expressed by Entrepreneur contributors are their own.

In 2016, IBM released a report estimating that bad data costs U.S. businesses and organizations $3.1 trillion per year. Among other things, these funds were wasted on the time knowledge staff (such as IT) spent digitizing or updating older sources, finding and fixing errors while organizing, and simply hunting for information and for confirmed sources for data they are hesitant to trust. An additional point of critical concern is that the age of Big Data has not been leveraged equally by all companies; even very successful and well-established businesses hold assorted data in different places and formats, but might be powerless to use it because it is unstructured or semi-structured. If the full potential of artificial intelligence (AI) is to be realized, data has to be available for use in meaningful ways.

A few obvious frontrunners, such as Google or Amazon, set precedents for data management, but most businesses are not like these, and don't deal with nearly the same volume or speed when it comes to data. For every other company in the world, including those in startup or scale-up mode, a solution is needed.

Straddling digital and paper-based worlds

Many businesses of varying types and sizes straddle the digital and paper-based worlds. Some of their most potentially helpful data is in documents — file types like PDFs, images and scanned documents — practically unavailable for informing high-level decisions or arriving at decisive conclusions. If positive outcomes are going to be possible, these types of information sources need to be organized, and AI and machine learning (ML) can provide the tools to do it.

AI for managing data

For many entrepreneurs in launch mode, there are three potential barriers in place. First is the idea that AI requires immense amounts of data to produce precise results; second, that data is often in numerous formats: structured, semi-structured and unstructured; and third, that data management might not have been an inherent part of existing business operations, and therefore course correcting would require too much effort before better results can be achieved.

First, the quantity challenge. If businesses have less data, how can they hope to gain the same level of insight or train algorithmic models as fast as larger competitors? The solution is the same as in any sphere of software development: Do it incrementally. With one-shot learning, a model can learn from as little as a single example. This tech already exists and is on display whenever someone uses facial recognition to open their smartphone, for example. The system needs very little data and can quickly learn to adapt even if small feature changes occur. Many open-source models for data don't operate this way, but could.
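To make the one-shot idea above concrete, here is a minimal Python sketch, not a production method. It assumes each document has already been reduced to a small numeric feature vector (the vectors, labels and the `OneShotClassifier` name are all hypothetical), stores exactly one example per category, and labels new items by their closest stored example. Real one-shot systems typically rely on learned embeddings rather than hand-picked numbers.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class OneShotClassifier:
    """Hypothetical illustration: one stored example per label."""

    def __init__(self):
        self.examples = {}

    def enroll(self, label, features):
        # A single example is enough to register a new category.
        self.examples[label] = list(features)

    def predict(self, features):
        if not self.examples:
            raise ValueError("enroll at least one example first")
        # Pick the label whose stored example is most similar.
        return max(
            self.examples,
            key=lambda label: cosine_similarity(self.examples[label], features),
        )


# Made-up feature vectors standing in for two document types.
classifier = OneShotClassifier()
classifier.enroll("invoice", [0.9, 0.1, 0.0])
classifier.enroll("receipt", [0.1, 0.8, 0.3])

prediction = classifier.predict([0.8, 0.2, 0.1])
print(prediction)
```

Adding a new category only requires one more `enroll` call, which is the incremental approach the paragraph above describes.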

Second is the challenge of data in many formats. Especially in well-established industries, the digital transformation remains incomplete. This means that any and every type of historical data exists in file cabinets, on hard drives and in hard-to-access places or hard-to-match formats. This is where the power of machine learning comes in.

Data hygiene is a method of processing data to ensure it is relatively error free. There is a cycle to this: import, normalization, verification and export. Depending on the nature of the data (for instance, whether it is encrypted or anonymized), the method of cleaning data may vary. Machine learning can support a system that keeps errors to a minimum, in which objective data components are measured against one another, issues are quickly identified, irrelevant parts are removed and the resulting data is made reliable. This can be automated and, once set up, significant amounts of information can be processed right away. Then every new data point can be processed for maximum efficiency and effectiveness.
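The import, normalization, verification and export cycle described above can be sketched in a few lines of Python. This is a rule-based illustration under stated assumptions, not a machine-learning system: the sample CSV, the column names and the validation rules are all invented for the example, and a real pipeline would use checks suited to its own data.

```python
import csv
import io
import re

# Hypothetical raw export containing stray spaces, mixed case,
# a thousands separator, a malformed email and a duplicate row.
RAW_CSV = """name,email,amount
 Alice Smith ,ALICE@EXAMPLE.COM,"1,200.50"
Bob Jones,bob[at]example.com,300
 Alice Smith ,alice@example.com,1200.50
"""

FIELDS = ["name", "email", "amount"]
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def import_rows(text):
    """Import: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize(row):
    """Normalization: trim whitespace, lowercase emails, strip separators."""
    return {
        "name": " ".join(row["name"].split()),
        "email": row["email"].strip().lower(),
        "amount": row["amount"].replace(",", "").strip(),
    }


def verify(row):
    """Verification: return a list of problems (empty if the row is fine)."""
    problems = []
    if not EMAIL_PATTERN.match(row["email"]):
        problems.append("invalid email")
    try:
        float(row["amount"])
    except ValueError:
        problems.append("invalid amount")
    return problems


def export_rows(rows):
    """Export: write cleaned rows back out as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def clean(text):
    """Run the full cycle, separating clean rows from rejected ones."""
    clean_rows, rejected, seen = [], [], set()
    for raw in import_rows(text):
        row = normalize(raw)
        problems = verify(row)
        if problems:
            rejected.append((row, problems))
            continue
        key = tuple(row[field] for field in FIELDS)
        if key in seen:  # drop exact duplicates after normalization
            continue
        seen.add(key)
        clean_rows.append(row)
    return clean_rows, rejected


clean_rows, rejected = clean(RAW_CSV)
cleaned_csv = export_rows(clean_rows)
print(cleaned_csv)
```

Keeping rejected rows alongside the reason they failed, rather than silently discarding them, lets the people who know the data review and correct them later.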

Third is making data management part of regular business operations. The key challenge here isn't necessarily in deploying the manpower to manage it, but in the setup. Most data-management systems use proprietary algorithms and require skilled coders or technologists to implement and maintain them. This dilemma appears in other contexts as well, and it is one in which the no-code movement is making a difference.

Innovators in this space realize that the people who know the data, and understand it intuitively, are not data scientists. Rather, they are the business owners and operators who have worked with it directly, so it is critical that they be the ones operating the data-management platforms. They will do the labeling, searching and actual using, so they need self-service options. Otherwise, the chaos of unorganized or inaccessible data will only be replaced by the stress of a long-term vendor contract to keep a model up and running, one that could be unsustainable.

What about data privacy?

Inevitably, any consideration of how data management is changing due to emerging technology must focus on privacy. The companies seeking solutions in this space include those with encrypted data, medical data and financial data, but a broad range of company documents contain all types of sensitive proprietary and customer information. At the most basic level, this data should be protected both at rest and in transit, ideally under more stringent frameworks such as SOC 2 Type II and HIPAA.

Machine learning can also help this effort, as it is capable of providing a layered approach. If algorithms are pre-trained using anonymized data, there should be no need to use real customer data to refine them. The aforementioned one-shot learning model takes data from an individual user and learns the document structure of that user's data, without sharing it with other users to train their models.
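As a rough illustration of preparing data before it reaches a model, the Python sketch below replaces sensitive fields with keyed hashes. Strictly speaking this is pseudonymization rather than full anonymization, and it is only one possible approach: the field names, the `pseudonymize` helper and the secret key are assumptions made for the example, and a real deployment would need proper key management and a review of which fields are sensitive.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a secrets manager.
SECRET_KEY = b"replace-with-a-real-secret"

# Assumed set of fields considered sensitive in this example.
SENSITIVE_FIELDS = {"name", "email"}


def pseudonymize(record, key=SECRET_KEY):
    """Return a copy of the record with sensitive values replaced by tokens."""
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            digest = hmac.new(
                key, str(value).encode("utf-8"), hashlib.sha256
            ).hexdigest()
            # A shortened token keeps records linkable without exposing the value.
            masked[field] = digest[:16]
        else:
            masked[field] = value
    return masked


record = {"name": "Alice Smith", "email": "alice@example.com", "amount": "1200.50"}
masked_record = pseudonymize(record)
print(masked_record)
```

Because the same input always maps to the same token under a given key, records belonging to one customer can still be grouped without the underlying name or email being visible.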

As entrepreneurial businesses work toward digital transformation and data accuracy, they can improve efficiency and reduce errors through intelligent document processing powered by machine learning. While there are many things machine learning can't yet do, data management is well within the scope of what's possible now.

Joy Youell

Lead Content Strategist

Joy Youell is an experienced copywriter, content strategist and on-page SEO specialist. She's addicted to novelty and innovation, which has led her to considerably expand her field of study to include marketing, branding, voice development and numerous entrepreneurial endeavors.
