Topic modeling engine for debt collections

In debt collections, the transcript of a call is not enough on its own. We need to go beyond speech-to-text conversion and make sense of the mountains of unstructured data we hold. Topic Modeling is one way to navigate this data and generate insights. In machine learning and natural language processing, Topic Modeling is an approach for discovering the recurring themes, or “topics”, in a collection of documents (a corpus). These topics are derived from the text of the documents themselves.

So how exactly does topic modeling work? Without going too deep into the statistical aspects of it, there are two main things one needs to understand about Topic Modeling.

  1. In our setting, a document is the transcript of a single call, and the corpus is the full collection of these transcripts. A “topic”, as the name suggests, is a recurring pattern of co-occurring words in the corpus. For example, “Stop calling”, “Do Not Call” and “Please stop calling” would ideally all fall under the DNC (Do Not Call) topic.
  2. Topic Modeling works under the assumption that each document is a statistical mixture of topics. Topic modeling methods try to work out which topics are present in each document and how strong that presence is. (In widely used methods such as Latent Dirichlet Allocation, the total number of topics is set in advance.) In simpler terms, each document can be decomposed into multiple topics, with each topic contributing to a different degree.
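To make the "mixture of topics" idea concrete, here is a minimal sketch of one common topic modeling method, Latent Dirichlet Allocation (LDA), fitted with a toy collapsed Gibbs sampler in plain Python. It is an illustration only and not a description of Prodigal's engine: the transcripts are invented, `fit_lda` and its parameters (`num_topics`, `alpha`, `beta`) are names chosen for this example, and a real system would rely on a tested library and far more data.

```python
import random
from collections import Counter

def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not production code).

    docs is a list of token lists. Returns (doc_topic, topic_word_counts).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables the sampler keeps up to date.
    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [Counter() for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start by assigning every word occurrence to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assign = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word] += 1
            topic_totals[k] += 1
        assignments.append(doc_assign)

    # Multiple passes: resample each word's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = assignments[d][i]
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][word] -= 1
                topic_totals[k] -= 1

                # Weight = (how much doc d uses topic t) * (how much topic t uses word).
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][word] + beta)
                    / (topic_totals[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][word] += 1
                topic_totals[k] += 1

    # Smoothed share of each topic in each document (each row sums to 1).
    doc_topic = []
    for d, doc in enumerate(docs):
        denom = len(doc) + alpha * num_topics
        doc_topic.append([(c + alpha) / denom for c in doc_topic_counts[d]])
    return doc_topic, topic_word_counts

# Invented toy transcripts, one "document" per call.
transcripts = [
    "please stop calling me do not call this number",
    "stop calling do not call again",
    "i want to set up a payment plan with monthly payment",
    "can we set up a payment plan",
    "stop calling me i will set up a payment plan",
]
docs = [t.split() for t in transcripts]
doc_topic, topic_words = fit_lda(docs, num_topics=2)

for text, mix in zip(transcripts, doc_topic):
    print([round(p, 2) for p in mix], text)
```

Each printed row is one call's topic mixture. Note that the number of topics is an input here, and that the topics come back unnamed: it is up to a person to inspect the most frequent words per topic and decide that one of them looks like "DNC".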

What sets Topic Modeling apart from RegEx (Regular Expression) based or dictionary-based search is that it is unsupervised: the data needs no labeling or annotation. We feed data to the model without telling it anything about the kind of topics we want or the kind of documents we have. Over multiple passes through the data, the model learns to identify phrases that tend to occur together and groups them into topics. It is a way to cluster similar documents by analyzing their word phrases and semantic patterns.
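For contrast, this is roughly what the rule-based alternative looks like. The sketch below is a hypothetical hand-written pattern for "do not call" requests, using Python's standard `re` module. The pattern and the function name `is_dnc_request` are assumptions made for illustration.

```python
import re

# Hypothetical, hand-maintained list of phrases that signal a DNC request.
DNC_PATTERN = re.compile(r"\b(stop calling|do not call|don't call)\b", re.IGNORECASE)

def is_dnc_request(transcript: str) -> bool:
    """Return True if the transcript contains one of the listed DNC phrases."""
    return DNC_PATTERN.search(transcript) is not None

print(is_dnc_request("Please stop calling me"))        # True
print(is_dnc_request("Take my number off your list"))  # False
```

The second call expresses the same intent but goes undetected, because nobody anticipated that phrasing. A rule-based search only finds what its author thought to list, whereas a topic model groups phrasings by how they co-occur in the data.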

Topic Modeling Schematic (Source: Medium)

Topic Modeling is widely used for information retrieval from unstructured data, for organizing large blocks of text and for document clustering. The New York Times, for example, has used topic models to augment its article recommendation engine.

With our Topic Modeling Engine, we can identify interesting calls at scale and zoom into them to garner actionable insights. For instance, we can separate borrowers who want a settlement from those who wish to set up a payment plan. By offering such nuanced insights, the Topic Modeling Engine ties in closely with our Speech AI. Because every call is analyzed, we can recognize patterns that a human reviewer would likely miss.
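One simple way to turn per-call topic mixtures into groups is to bucket each call by its dominant topic. The sketch below assumes a model has already produced the mixtures. The call IDs, the numbers, the topic labels and the `min_share` threshold are all made up for illustration and are not taken from Prodigal's system.

```python
from collections import defaultdict

# Hypothetical labels an analyst gave each topic after inspecting its top words.
TOPIC_LABELS = ["settlement", "payment_plan"]

# Hypothetical per-call topic mixtures (each row sums to 1).
call_topic_mix = {
    "call_001": [0.85, 0.15],
    "call_002": [0.10, 0.90],
    "call_003": [0.55, 0.45],
}

def group_by_dominant_topic(mixes, labels, min_share=0.6):
    """Group call IDs by strongest topic; weakly dominated calls go to 'mixed'."""
    groups = defaultdict(list)
    for call_id, mix in mixes.items():
        top = max(range(len(mix)), key=lambda k: mix[k])
        label = labels[top] if mix[top] >= min_share else "mixed"
        groups[label].append(call_id)
    return dict(groups)

groups = group_by_dominant_topic(call_topic_mix, TOPIC_LABELS)
print(groups)
# {'settlement': ['call_001'], 'payment_plan': ['call_002'], 'mixed': ['call_003']}
```

Keeping a "mixed" bucket is a design choice: calls where no single topic clearly dominates are often the ones worth a closer look.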

At Prodigal, our mission is to take each debt to its most logical conclusion, and Topic Modeling is just one of the tools in our arsenal.

This post was originally written by Prodigal’s data scientist Akshat Vaidya. Reach out to us if you’re interested in the application of machine learning and AI in debt collections.