Garbage In, Gold Out! – Innovation Evangelism

We’ve all heard the expression “garbage in, garbage out” when it comes to data systems. But Generative AI brings a big caveat, and a big new opportunity.


Generative AI can help turn data garbage into business gold.

Data remains the biggest and most important factor in the usefulness of AI systems. Algorithms are becoming a commodity, so the biggest differentiator is the quantity, quality, and relevance of the underlying data set. And the better the data, the easier it is to create quality outputs.

But there’s an important distinction between the underlying data and the way it’s actually recorded and stored. Real-world systems see the world through a cracked and smudged lens. But even if each point of light is dubious, we can still get an overall impression of what’s going on.


For example, if your IoT sensors are recording random numbers, you obviously can’t get anything useful out of them. But if they’re “just” inaccurate, with the real signal hidden behind a veil of noise, the data is still potentially usable with the right statistical techniques. Machine learning algorithms can capture the underlying patterns that (probably) generated the observed, messy data.
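To make the sensor point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the simulated temperature curve, the noise level, and the choice of a simple centered moving average as the "statistical technique". Real systems would likely use something more sophisticated, but the idea is the same.

```python
import math
import random
import statistics


def moving_average(values, window=11):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(statistics.fmean(values[lo:hi]))
    return smoothed


def mean_abs_error(estimates, truth):
    """Average distance between the estimates and the true values."""
    return statistics.fmean([abs(e - t) for e, t in zip(estimates, truth)])


# Simulated data (purely hypothetical): a slow temperature cycle
# observed through a sensor that adds Gaussian noise to every reading.
rng = random.Random(42)
truth = [20.0 + 5.0 * math.sin(2 * math.pi * i / 200) for i in range(600)]
readings = [t + rng.gauss(0.0, 2.0) for t in truth]

smoothed = moving_average(readings)
raw_error = mean_abs_error(readings, truth)
smoothed_error = mean_abs_error(smoothed, truth)

print(f"average error of raw readings:      {raw_error:.2f}")
print(f"average error of smoothed readings: {smoothed_error:.2f}")
```

Each individual reading is unreliable, but averaging neighbors cancels much of the noise, so the smoothed curve sits closer to the real signal than the raw readings do.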

Now new Generative AI technologies are providing another huge step forward in dealing with imperfect data.

Large language models are very good at dealing with some types of messy data. For example, researchers have shown that large language models like GPT-4 can decipher even very scrambled sentences:

[Image: GPT-4 unscrambling text to recreate a sentence]

A personal example: my daughter recorded a short section of her economics class (with permission). The quality was awful—the teacher’s voice was almost completely drowned out by the sound of my daughter typing and other background sounds. I personally couldn’t really hear what he was saying.

I ran the recording through OpenAI’s open-source transcription algorithm Whisper, using the slowest and most sophisticated model available. It did a good job of deciphering many of the spoken words, but there were gaps, a few words that were clearly incorrect, and the result was hard to follow (the teacher had a tendency to digress and circle back).

I took the transcript and put it into ChatGPT 4, asking it to “take the text and put it into sentences”. As if by magic, out popped a restructured, clear, three-paragraph summary of the economic points the teacher had discussed. It wasn’t what he said, but it was a lot closer to what he meant.

Large language models are good at figuring out what we meant, and the principle applies to many real-world data problems.

For example, machine learning is already used to extract information from documents such as invoices: the date, amount, supplier ID, etc. But these models require lots of training data and don’t generalize very well. If you try to use them on an invoice layout the model hasn’t seen before, it may get stumped. By adding generative AI, the system gets much more effective at dealing with edge cases and novel layouts.

There are dangers, because these models are designed to synthesize what “should” or “could” be there, not just analyze what is actually there. In the previous examples, the result might include thoughts the economics teacher never voiced, or a supplier ID even though none appears in the document.
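One cheap guard against the invented supplier ID is to check that every extracted value literally appears in the source document. The sketch below is only an illustration under assumptions: the function name `ungrounded_fields`, the sample invoice text, and the extracted values are all hypothetical, and a plain substring match would miss values the model legitimately reformatted (a date written differently, for instance).

```python
import re


def normalize(text):
    """Lowercase and collapse whitespace so layout differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_fields(extracted, document_text):
    """Return the names of extracted fields whose value is not in the document."""
    haystack = normalize(document_text)
    return [
        name
        for name, value in extracted.items()
        if value is not None and normalize(str(value)) not in haystack
    ]


# Hypothetical invoice text and hypothetical model output.
invoice_text = """Invoice date: 2024-03-15
Amount due: 1,250.00 EUR
Acme Supplies Ltd"""

extracted = {
    "date": "2024-03-15",
    "amount": "1,250.00",
    "supplier_id": "SUP-00417",  # not present anywhere in the invoice
}

suspicious = ungrounded_fields(extracted, invoice_text)
print(suspicious)
```

Here the date and amount are found in the text, while the supplier ID is flagged for a human (or another check) to review.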

Figuring out how to avoid such “hallucinations” is currently the leading edge of AI research—with approaches that include asking the model to double-check itself, averaging out the results of several instances of the model, or an extra check from a dedicated verification model acting independently.
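As a rough illustration of the "several instances" idea, the sketch below asks the same question several times and keeps the answer only if enough runs agree. It interprets "averaging out" as a majority vote, which is my assumption rather than a description of any particular research method. The `ask_model` argument is a hypothetical stand-in for whatever function calls your model; here it is faked with canned answers so the example runs on its own.

```python
from collections import Counter


def majority_answer(ask_model, prompt, runs=5, min_agreement=0.6):
    """Ask the same question several times and keep the most common answer.

    Returns (answer, agreement). The answer is None when too few runs agree,
    which is a hint that the model may be guessing.
    """
    answers = [ask_model(prompt) for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    agreement = count / runs
    if agreement < min_agreement:
        return None, agreement
    return answer, agreement


# Stand-in for a real model call: returns canned answers, one per run.
canned = iter(["SUP-00417", "SUP-00417", "SUP-00471", "SUP-00417", "SUP-00417"])


def fake_model(prompt):
    return next(canned)


answer, agreement = majority_answer(fake_model, "What is the supplier ID?")
print(answer, agreement)
```

Four of the five canned runs agree, so the majority answer is kept. Agreement is not proof of correctness, since a model can be consistently wrong, but disagreement is a useful warning sign.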

But overall, generative AI is a great new opportunity to open up more data in new ways, to rethink what data sources are available, how they can be used to improve processes—and to turn what looks like data garbage into business gold.
