3 types of data in AI and why data drives intelligent automation

Here’s a familiar refrain: “data is the new gold/oil/[insert valuable thing here]”. It’s true, but only if you properly collect, process, store and apply that data; that is the only way to get any value from it. What’s more, the data you use to add intelligence to your operational processes can’t be just “any old data”: it has to be relevant.

So, your company is sitting on an ocean of operations-related data. That could be a great first step towards meaningful insights! But when you collected it, did you know up front how you intended to use it?

Data without relevance is fool’s gold

Relevant data is meaningful, objective, aligned with your goals and usable to solve specific problems. When you collect operational data to apply intelligent models to a very specific process, pay close attention to the labels you use to tag phrases, qualities, characteristics and images.

“Subjective labels like ‘good’, ‘better’ and ‘best’ might be obvious to your human quality inspectors, but they make no sense to an AI algorithm, because it has no idea which characteristics make a product ‘good’ in the first place,” explains Wouter Labeeuw, data scientist at delaware.ai.

“We’ve kicked off projects and only learned later, during the model testing phase, that the metrics were subjective. That meant relabeling the data from square one with more objective tags.”
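To make the difference concrete, here is a minimal Python sketch of an objective labeling rule. It assumes a hypothetical inspection record with two measured fields, scratch length in millimeters and color deviation (ΔE), and hypothetical tolerance thresholds. In a real project, those thresholds would come from the product specification. Instead of a grade like “good”, each product gets tags that a model can tie to measurable characteristics.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only; real values would come
# from the product specification agreed with the quality inspectors.
MAX_SCRATCH_MM = 2.0
MAX_COLOR_DELTA_E = 3.0


@dataclass
class Inspection:
    """Measurements recorded for one product (hypothetical fields)."""
    scratch_length_mm: float
    color_delta_e: float


def objective_tags(item: Inspection) -> list[str]:
    """Return measurable defect tags instead of a subjective grade."""
    tags = []
    if item.scratch_length_mm > MAX_SCRATCH_MM:
        tags.append("scratch_over_limit")
    if item.color_delta_e > MAX_COLOR_DELTA_E:
        tags.append("color_out_of_tolerance")
    # No defect tag means every measured characteristic is within tolerance.
    return tags or ["within_spec"]


if __name__ == "__main__":
    samples = [Inspection(0.5, 1.2), Inspection(3.1, 1.0), Inspection(2.5, 4.0)]
    for sample in samples:
        print(sample, objective_tags(sample))
```

Two inspectors looking at the same measurements will always produce the same tags, which is exactly what a subjective “good/better/best” scale cannot guarantee.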

The 3 types of data

In operations, and in machine learning in general, three types of data are used to train machine-learning models. Each type corresponds to a specific AI technology.

Visual data

Captured by cameras, visual data is made up of images that are tagged according to what they contain (people, vehicles, characters, defects, colors, quality, etc.). Computer vision is the corresponding AI technology for visual data.

Textual data

Gathered via cameras, scanners or digital documents, textual data is organized into linguistically relevant units: characters, words, sentences and concepts. Natural language processing (NLP) is its corresponding AI technology.
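As a rough illustration of what “organized into words and sentences” means, the Python sketch below splits text into sentences and then into words. It assumes the text has already been digitized (for scanned documents, by an OCR step not shown here). Its splitting rules are deliberately naive; production NLP pipelines rely on trained tokenizers rather than regular expressions.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_words(sentence: str) -> list[str]:
    """Keep runs of letters and digits, allowing inner apostrophes or hyphens."""
    return re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", sentence)


def structure(text: str) -> list[list[str]]:
    """Organize raw text into sentences, each represented as a list of words."""
    return [split_words(sentence) for sentence in split_sentences(text)]


if __name__ == "__main__":
    # Hypothetical snippet from a scanned delivery note.
    note = "Invoice 1042 received. Pallet damaged on arrival!"
    print(structure(note))
```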

Numerical data

This type of data is neither visual nor organized into linguistic elements and is made up of figures and measurements gathered by machines, sensors or people. Driver analysis is the technology that delaware.ai applies to determine how these figures influence each other in specific contexts.
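Driver analysis can be done in several ways, and the method delaware.ai uses isn't described here. As a simple stand-in, the Python sketch below ranks hypothetical sensor inputs by the strength of their Pearson correlation with an outcome such as scrap rate. Correlation only shows that figures move together, not that one causes the other. Treat this as a first screening step rather than a full driver analysis.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # A constant series carries no information about the outcome.
        return 0.0
    return cov / (spread_x * spread_y)


def rank_drivers(inputs: dict[str, list[float]], outcome: list[float]) -> list[tuple[str, float]]:
    """Rank input series by the absolute strength of their correlation with the outcome."""
    scores = {name: pearson(values, outcome) for name, values in inputs.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical readings from one production line over five shifts.
    readings = {
        "temperature_c": [20, 22, 24, 26, 28],
        "line_speed": [100, 98, 101, 99, 100],
    }
    scrap_rate = [1.0, 1.5, 2.0, 2.5, 3.0]
    for name, r in rank_drivers(readings, scrap_rate):
        print(f"{name}: r = {r:.2f}")
```

Ranking by absolute value keeps strong negative relationships (for example, scrap falling as speed rises) near the top of the list.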

A database is good, but a data platform is better

We encourage our customers who are working with large amounts of data to invest in a data platform – a cloud database managed by a single, central data governance framework. This single framework simplifies the process of transforming data into the form needed by a machine-learning model that is trained to solve a specific problem.
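One way to picture a single, central governance framework is as one shared set of field rules that every pipeline applies before data reaches a model. The Python sketch below assumes hypothetical raw field names (`temp`, `weight_g`), unit conversions and valid ranges. It converts each raw record into model-ready features and rejects records that are missing fields or fall outside the agreed ranges.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FieldRule:
    """One centrally governed rule: where a field comes from, how to convert it, what is valid."""
    source_key: str
    convert: Callable[[str], float]
    min_value: float
    max_value: float


# Hypothetical central rule set, shared by every pipeline that feeds a model.
RULES = {
    "temperature_c": FieldRule("temp", float, -40.0, 150.0),
    "weight_kg": FieldRule("weight_g", lambda raw: float(raw) / 1000.0, 0.0, 500.0),
}


def to_features(raw: dict[str, str]) -> Optional[dict[str, float]]:
    """Apply every rule to one raw record; return None if any field is missing or invalid."""
    features = {}
    for name, rule in RULES.items():
        if rule.source_key not in raw:
            return None
        try:
            value = rule.convert(raw[rule.source_key])
        except ValueError:
            return None  # Unparseable value, e.g. "n/a".
        if not rule.min_value <= value <= rule.max_value:
            return None  # Outside the centrally agreed range.
        features[name] = value
    return features


if __name__ == "__main__":
    print(to_features({"temp": "21.5", "weight_g": "1250"}))
    print(to_features({"temp": "n/a", "weight_g": "1250"}))
```

Because the rules live in one place, changing a range or unit conversion updates the input of every model at once, instead of each team cleaning the data by hand.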

Nowadays, cloud-based data platforms have the power, cost-effectiveness, reliability and security to handle almost any corporate or industrial machine-learning project. Without one, your data team will have to rely on good old elbow grease (manual labor) to clean, update and transform the data you collect.

Curious about what the right data is capable of? Discover 5 challenges in operations that machine learning can solve, and the technologies that drive them, in our exclusive e-book: ‘AI & operations: adding intelligence to your operational processes’.

Need some help identifying the best applications of Industry 4.0 technologies in your company? Get in touch with one of our experts.

Our expert

Wouter Labeeuw

Wouter Labeeuw works as a data science and machine learning consultant, adding intelligence to applications. He is a computer scientist with a PhD in engineering; his research focused on applying machine learning to electrical demand response. He joined delaware in January 2016, where he continued to focus on machine learning, but in a broader context. In his current role, he is responsible for the data science team at delaware.
