Industrializing and scaling up AI: the importance of aligning data platform and data science
Playtime is over. Many companies that have been experimenting with AI for years are now rightfully demanding that the technology start delivering large-scale results. But how can they industrialize and scale up their successful-yet-isolated data science experiments? Our AI experts Maarten Herthoge and Wouter Labeeuw explain the strategic and technological pitfalls of operationalizing AI – and how to avoid them.
According to a recent report from Boston Consulting Group and MIT Sloan Management Review, only 11% of companies are reaping significant returns on investment from their AI implementation. Important to note, however, is that the companies with the highest chances of success are those who continuously experiment with AI – even when initial projects don’t result in a big payoff. Why? Because their failures allowed them to adapt their business practices for better alignment with the elusive requirements of large-scale, industrialized AI.
A production-ready model
First, it’s important to understand that there is no data science – or no science at all, really – without high-quality data. This data is ingested into the data lake as a single point of entry, where it’s validated and transformed for the purpose of training machine learning algorithms with tools like Azure Databricks or Azure Machine Learning. Once a machine learning model has been created and trained on a data set, it will usually become progressively less accurate over time, for example because customer or process behavior changes. Data is constantly evolving, so you have to regularly re-evaluate – or even re-train – the model, or accept imperfections in edge cases.
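As a rough illustration of what such a regular re-evaluation could look like, the sketch below scores an existing model on a window of recent, labelled production data and flags when accuracy drops below an agreed threshold. It assumes a scikit-learn-style model saved with joblib and a pandas DataFrame of fresh observations; the file paths, the ‘label’ column and the threshold value are illustrative assumptions, not a prescription.

```python
# Minimal sketch of a recurring model health check (assumed setup:
# a scikit-learn-style classifier saved with joblib, and a pandas
# DataFrame of recent, labelled production data).
import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85  # agreed minimum, defined up front with the business

model = joblib.load("model.joblib")                     # illustrative path
recent = pd.read_parquet("recent_production.parquet")   # illustrative path

X_recent = recent.drop(columns=["label"])
y_recent = recent["label"]

accuracy = accuracy_score(y_recent, model.predict(X_recent))
print(f"Accuracy on recent data: {accuracy:.3f}")

if accuracy < ACCURACY_THRESHOLD:
    # Trigger whatever the team agreed on: an alert, a re-training
    # pipeline, or a manual review of the affected edge cases.
    print("Model performance has degraded - schedule re-training or review.")
```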
This is where many expert teams stumble in their quest to operationalize AI: they simply don’t know when their model is ‘good enough’, because they haven’t defined what that means in advance. A production-ready model solves the right problem and offers an acceptable error rate. That error rate should be calculated on unseen data from a period not included in the training or validation data. Only then can you verify how well the machine learning model would perform in production.
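One way to make that ‘good enough’ check concrete is an out-of-time evaluation: train on an older period, measure the error rate on a later period the model has never seen, and compare it against the acceptance criterion agreed up front. The sketch below assumes a pandas DataFrame with an order-date column and a scikit-learn classifier; the cut-off date, column names, model choice and maximum error rate are purely illustrative.

```python
# Out-of-time evaluation sketch: train on an older period, test on a
# later, completely unseen period (assumed: pandas DataFrame with an
# 'order_date' timestamp column and a 'label' target column).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

MAX_ERROR_RATE = 0.10  # acceptance criterion defined before modelling starts

data = pd.read_parquet("historical_data.parquet")   # illustrative path
cutoff = pd.Timestamp("2023-01-01")                 # illustrative split date

train = data[data["order_date"] < cutoff]
test = data[data["order_date"] >= cutoff]           # period unseen during training

features = [c for c in data.columns if c not in ("label", "order_date")]

model = RandomForestClassifier(random_state=42)
model.fit(train[features], train["label"])

error_rate = 1 - accuracy_score(test["label"], model.predict(test[features]))
print(f"Error rate on unseen period: {error_rate:.3f}")
print("Production-ready" if error_rate <= MAX_ERROR_RATE else "Not good enough yet")
```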
Containers and APIs
Once the hurdle of a production-ready model has been cleared, the next step in AI operationalization is to package it as a container and deploy it as an API. Containers encapsulate the dependencies of the machine learning model so that it can run on its own, which allows for easy integration and distribution. The model should then be exposed via an API, which can subsequently be integrated into end-user applications anywhere across the company. Another advantage of this approach is that a container can run anywhere: you can host it as a cloud-based service, but also run it on the edge, close to your production environment.
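To make this concrete, here is a minimal sketch of what such an API could look like, using FastAPI to expose a scikit-learn-style model loaded with joblib. The framework choice, endpoint name, feature fields and model path are assumptions for illustration, not the only way to do it; in practice this service would be built into a container image and deployed to the cloud or to an edge device.

```python
# Minimal scoring API sketch (assumed: a scikit-learn-style model saved
# with joblib; field names, paths and endpoint are illustrative only).
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Scoring API")
model = joblib.load("model.joblib")  # loaded once at container start-up


class OrderFeatures(BaseModel):
    # Example input schema - replace with the features your model expects.
    quantity: float
    material_width: float
    machine_id: int


@app.post("/predict")
def predict(features: OrderFeatures):
    # Turn the validated request body into the shape the model expects.
    row = [[features.quantity, features.material_width, features.machine_id]]
    prediction = model.predict(row)[0]
    return {"prediction": float(prediction)}  # assumes a numeric prediction
```

Saved as main.py, this can be run locally with `uvicorn main:app`, and the same code can be baked into a container image so that the scoring service runs identically in the cloud and on the edge.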
How to defeat the ‘data zoo’
Doesn’t sound too complex, does it? The reality, however, is often less straightforward. The main culprit? The presence of a ‘data zoo’ within many companies: an unruly menagerie of incompatible tools, frameworks and approaches used to collect and process data.
This situation often results from a disconnect between the underlying ‘boring’ data platform and the ‘hip’ data science applications. The fact is that one simply doesn’t work without the other. It also explains why companies that have invested in setting up a future-proof data platform tend to reap higher rewards from their AI experiments, are more successful in scaling up and industrializing those experiments, and show significantly faster time-to-market and greater agility.
The funny thing about AI
A prime example of one such company is BekaertDeslee, where past investments in company-wide standardization have enabled the quasi-seamless rollout of a waste-reduction algorithm across multiple production sites. Here, the model is exposed via an API that receives order information from the ERP system and calculates the waste production potential (see the figure above).
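Purely as an illustration of that integration pattern – the endpoint, fields and response below are assumptions for the sake of example, not BekaertDeslee’s actual interface – a consuming application could pass order information to the model’s API like this:

```python
# Hypothetical client-side call: send order information from the ERP to
# the model's API and read back the predicted waste. All names are assumed.
import requests

order = {
    "quantity": 1200.0,
    "material_width": 2.2,
    "machine_id": 7,
}

response = requests.post("https://scoring.example.com/predict", json=order, timeout=10)
response.raise_for_status()

print("Predicted waste potential:", response.json()["prediction"])
```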
All of this makes it clear that AI operationalization is nothing like a normal deployment. While all it takes is a laptop and some creativity to start experimenting, reaping AI’s true benefits requires a clear business case, a solid and future-proof data platform, and a long-term strategy. The latter might not be as hip as the latest data science tool, but it’s a lot more reliable.
At delaware, we have both the technical skills to build your future-proof data platform and the adventurous spirit required to embark on outside-the-box data science experiments.