Problem
Inferess, a supply chain analytics startup based in Silicon Valley, wanted to use AI to detect customer-supplier relationships between companies in SEC filings. The goal was to help investors make informed decisions about a company together with its suppliers, instead of evaluating that company in isolation.
Approach
We used topic modeling, active learning, the teacher-student paradigm, and AI-assisted label correction to automate data labeling and improve accuracy.
Before
The client hired a data labeling company and gave it instructions on how to label the sentences. After 4 weeks, the client had a set of 4.5K labeled examples at its disposal. The iSemantics team used this labeled data to train BERT, an advanced AI model.
The model scored 80% accuracy.
After
Our team used automated data labeling techniques to improve labeling accuracy.
Topic modeling, active learning, and the teacher-student paradigm reduced the number of labeled examples needed from 4.5K to only 300 while using the same underlying model, BERT.
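To illustrate how active learning can cut the number of labels needed, here is a minimal sketch of uncertainty sampling, one common active learning strategy. The source does not say which selection strategy was used, so the entropy criterion, the function names, and the toy probabilities below are all assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, budget):
    """Return the indices of the `budget` examples the model is least sure about.

    `predictions` holds one class-probability list per unlabeled example.
    Hypothetical helper: not taken from the original project.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]

# Toy probabilities for four unlabeled sentences (illustrative values only).
predictions = [
    [0.98, 0.02],  # confident
    [0.55, 0.45],  # uncertain
    [0.10, 0.90],  # fairly confident
    [0.50, 0.50],  # maximally uncertain
]

chosen = select_for_labeling(predictions, budget=2)
print(chosen)  # [3, 1]
```

Only the selected examples are sent to a human annotator. The model is then retrained and the loop repeats, so labeling effort goes to the sentences that are most informative for the model.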
The AI-assisted label correction improved accuracy to 91%.
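One way AI-assisted label correction can work is to flag examples where a trained model confidently disagrees with the human-provided label, and then send only those examples back for review. The sketch below assumes that approach. The source does not describe the actual correction procedure, so the function name, the 0.9 threshold, and the toy data are hypothetical.

```python
def flag_suspect_labels(labels, predictions, threshold=0.9):
    """Return (index, given_label, predicted_label) for likely labeling errors.

    An example is flagged when the model's top class differs from the given
    label and the model's confidence in that class is at least `threshold`.
    Hypothetical helper: not taken from the original project.
    """
    suspects = []
    for i, (label, probs) in enumerate(zip(labels, predictions)):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != label and probs[predicted] >= threshold:
            suspects.append((i, label, predicted))
    return suspects

# Toy data: 1 = supplier relationship, 0 = no relationship (illustrative only).
labels = [1, 0, 1, 0]
predictions = [
    [0.05, 0.95],  # agrees with label 1
    [0.03, 0.97],  # confidently disagrees with label 0
    [0.60, 0.40],  # disagrees with label 1, but not confidently
    [0.92, 0.08],  # agrees with label 0
]

suspects = flag_suspect_labels(labels, predictions)
print(suspects)  # [(1, 0, 1)]
```

A reviewer then confirms or overrides each flagged label, which keeps the human effort focused on a small subset of the data rather than the full set.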
Results
An 11-percentage-point accuracy improvement (from 80% to 91%) and 15x faster development of a production-grade model.
The improvement came from better data, not a better model.
To learn more about automated data labeling techniques, read the full article here.