Knowledge mining: unlocking the value of unstructured information
Every day in workplaces around the globe, people produce millions of Word documents, PDFs, e-mails, Yammer posts, audio messages, etc. Hidden within these ‘unstructured’ piles of data lie crucial insights that can significantly improve everyday operations or lead to ground-breaking innovations. But how can companies unlock this hidden potential in a time- and cost-effective way? Enter ‘knowledge mining’ – one of the most promising domains in the field of AI.
“Organizations have always been very adept at generating information,” says Frank Devliegher, lead expert digital workplace at delaware. “The difference in the digital era is that the chaos is often invisible. In the past, when you walked into an accounting department and there were files scattered everywhere, you knew something was wrong and that you had to take action. Now, the mess exists on an abstract level that’s hidden behind the clean desks.
“In addition, digital information is coming to us via a plethora of channels: e-mail, private messages, Word docs, shared drives, Yammer posts, chat… That makes it really hard to get a comprehensive view of what piece of information can be found where, and thus makes it impossible to reach a common understanding or reveal genuine insight.”
The high cost of value generation
According to Frank, today’s digital reality and the absence of visual cues make it even more important for organizations to keep “hidden” chaos at bay and extract value from unstructured information. “Up until recently, however, this ‘value extraction’ was a costly and time-intensive task. There was a lot of manual work involved: back-office data entry, scanning documents, archiving them, assigning the right labels and governance policies… Office workers spent hours upon hours rummaging through and classifying documents – all with the prospect of possibly gaining value from that data at some later date.
“Advanced AI technology like natural language processing (NLP) makes labeling pieces of information far less cost- and resource-intensive. The ultimate vision is to create software that understands content the way humans do, known as ‘natural language understanding’ (NLU). Major recent breakthroughs like OpenAI’s famous GPT-3 language generator, no-code tools like Microsoft Cortex and Syntex, and Azure Machine Learning combined with Python libraries such as spaCy and PyCaret bring advanced NLP within reach for everyone. These tools are becoming increasingly easy to use, lowering the threshold for business users and software developers alike. This opens the door to a host of applications that can make employees’ lives easier, from scanning contracts for possible legal problems to automatically assigning service tickets to the right person.”
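To make the ticket-assignment idea concrete, here is a minimal sketch in plain Python. It is a deliberately simple keyword-scoring stand-in, not how GPT-3, Syntex, spaCy or PyCaret work internally, and the team names, keyword lists and the `route_ticket` helper are all hypothetical, invented for illustration. A real solution would replace the hand-written keywords with a trained NLP model.

```python
import re
from collections import Counter

# Hypothetical teams and trigger words, invented for illustration only.
TEAM_KEYWORDS = {
    "finance": {"invoice", "payment", "refund", "billing"},
    "it_support": {"password", "laptop", "login", "vpn"},
    "hr": {"leave", "payroll", "contract", "onboarding"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def route_ticket(text, default="triage"):
    """Return the team whose keywords occur most often in the ticket text."""
    counts = Counter(tokenize(text))
    scores = {
        team: sum(counts[word] for word in words)
        for team, words in TEAM_KEYWORDS.items()
    }
    best_team = max(scores, key=scores.get)
    # Fall back to a human triage queue when no keyword matched at all.
    return best_team if scores[best_team] > 0 else default


if __name__ == "__main__":
    print(route_ticket("I forgot my password and cannot login to the VPN"))
    print(route_ticket("Please send a refund for the duplicate invoice"))
    print(route_ticket("The coffee machine is broken"))
```

The fallback to a `triage` queue is a design choice worth keeping even with a far better model: when the software is unsure, a person decides.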
Boosting the ‘find’ experience with tech
NLP also paves the way to solving one of the biggest pain points for today’s organizations: rendering the huge pile of unstructured information they are sitting on easily searchable for employees. “In our private lives, Google has significantly raised our expectations when it comes to searching for information. We expect search engines to more or less accurately interpret our intentions and give us highly relevant results quickly. In many organizations, however, the search experience is underwhelming.”
So how can we optimize the accessibility of knowledge within the organization? “Today, we are seeing more and more organizations adopt semantic search: the ability of machines to understand information in context and in relation to people. This results in what you can call ‘guided search on steroids’, where my query is supported through keywords and even graphs to guide me to the best possible answer.”
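The difference between keyword search and searching by meaning can be illustrated with a toy example. The sketch below is an assumption-laden simplification: it maps words to shared concepts through a small hand-written dictionary (`CONCEPTS`, hypothetical) and ranks documents by cosine similarity. Production semantic search relies on learned language models rather than a lookup table, and the document titles and texts here are made up.

```python
import math
import re
from collections import Counter

# Hypothetical concept map: different words that refer to the same idea.
CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "holiday": "leave", "vacation": "leave", "leave": "leave",
}

# Made-up documents: title -> text.
DOCUMENTS = {
    "Fleet policy": "Rules for leasing a company automobile",
    "Time off guide": "How to request vacation days",
    "Canteen menu": "Soup and sandwiches this week",
}


def to_concepts(text):
    """Count the concepts in a text; unknown words stand for themselves."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS.get(word, word) for word in words)


def cosine(a, b):
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, documents=DOCUMENTS):
    """Return titles of matching documents, best match first."""
    query_vector = to_concepts(query)
    scored = [
        (cosine(query_vector, to_concepts(text)), title)
        for title, text in documents.items()
    ]
    return [title for score, title in sorted(scored, reverse=True) if score > 0]
```

With these sample documents, `search("car")` returns the fleet policy even though the word "car" appears nowhere in it, which an exact keyword match would miss.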
Azure Cognitive Search
The impact of this can be quite significant: less duplication of documents, less time spent searching for the right piece of information, and an easier way to find the right person within the organization to help you with a specific topic.
Wouter Labeeuw, data and AI expert at delaware, is working on a key knowledge-mining project to optimize research at an international institution. “Our main goal was to make it easier for people to find out what files and information already existed within the organization,” he explains. “To do this, we fed the data to Azure Cognitive Search and trained a semantic search model. Azure Cognitive Search extracts huge amounts of unstructured data and automatically interprets text and images. The semantic model ensures context-dependent search. This, in turn, translates into decreased replication of documents and significant time savings.”