Collections

Call redaction, a silent foundation of the consumer finance industry

Background

As debt travels from origination to default to collections and then to maturity or charge-off, it accumulates a massive amount of baggage, mostly in the form of audio recordings and transcripts of borrower interactions.

Packed inside that baggage is a great deal of Personally Identifiable Information (PII).

Over its life, a debt may be owned or serviced by several companies, and every one of those companies inherits the risks and responsibilities of all that PII.

That makes PII redaction critically important in ensuring a seamless flow of information across the debt lifecycle while protecting customer information and staying compliant.

Redaction in action

Redaction might not be the most exciting topic, but its importance means we need to recognize its role as a silent foundation of our entire industry.

Applications for redaction cut across all fronts in debt collection, ranging from sharing recordings of past debtor conversations with contact centers to masking data from third-party software used for internal productivity.

Examples of language that should trigger redaction:
  • “Card number”
  • “CVV code”
  • “Code on the back/front”
  • “three/four-digit code”
  • “Go ahead with that number”
  • “Give me that number”
  • “Bank account number”
  • “Routing number”
  • “Expiration date”
  • “Repeat that card number”
  • “Sixteen digits”
  • “last four of your SSN”
  • “last four of your social”

Numerical data includes:
  • Spoken digits (e.g. “one”, “two”, “nineteen”, “twenty”, “thirty-one”, etc.)
  • Dates
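
As a rough illustration of the two lists above, the following Python sketch spots trigger phrases and numerical words in a transcript. The phrase list and number vocabulary are abbreviated and hypothetical, and the helper names (tokenize, is_numeric, find_indicators) are assumptions made for this example; it is not Prodigal's actual implementation.

```python
import re

# Abbreviated, hypothetical phrase list based on the examples above.
INDICATOR_PHRASES = [
    "card number",
    "cvv code",
    "bank account number",
    "routing number",
    "expiration date",
    "last four of your social",
]

# Hypothetical vocabulary of spoken-number words; a real list would be larger.
NUMBER_WORDS = {
    "zero", "oh", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
    "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_numeric(token):
    """True for digit strings and for spoken-number words."""
    return token.isdigit() or token in NUMBER_WORDS


def find_indicators(tokens):
    """Return sorted (start, end) token spans where an indicator phrase occurs."""
    hits = []
    for phrase in INDICATOR_PHRASES:
        words = phrase.split()
        n = len(words)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == words:
                hits.append((i, i + n))
    return sorted(hits)


tokens = tokenize("Can you repeat that card number? It's four one one one.")
print(find_indicators(tokens))                  # [(4, 6)]
print([t for t in tokens if is_numeric(t)])     # ['four', 'one', 'one', 'one']
```

Because the tokenizer splits on hyphens, a compound such as “thirty-one” becomes two number words, each of which is recognized on its own.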

The specific data types targeted for redaction depend on the level of redaction the customer chooses. Sensitive data redacted from call transcripts is replaced with a special token (typically “*”). Redacted audio in call recordings is replaced with a soft beep.

When redaction is enabled, data is redacted from transcripts and audio in a fully automated manner and is permanently and irreversibly destroyed; no un-redacted data is stored anywhere in the Prodigal system.

What are the redaction levels available?

Prodigal offers three levels of redaction:

PCI Data Redaction (Level 1)

The system removes information such as debit and credit card numbers (15 or 16 digits), expiration dates, CVV codes, and PINs, in accordance with the Payment Card Industry Data Security Standard (PCI DSS).

PCI and PII Data Redaction (Level 2)

This level allows customers to choose from a list of supported entities. Most often, we see our customers opt to redact PCI in addition to Social Security numbers and addresses.

Numeric Data Redaction (Level 3)

For extreme safety, Prodigal offers a Level 3 redaction option, in which all numeric data is redacted. In addition to the data redacted at Level 2, this may include information such as balance due, payment amounts, settlement offers, phone numbers, and specific dates mentioned in the conversation. With this information redacted, the audio and call transcripts are free of sensitive data and can be viewed or heard freely without a security risk.
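
One way to picture how the three levels build on one another is as a small configuration helper. The Python sketch below is only an illustration: the entity labels, the ANY_NUMERIC marker, and the entities_to_redact function are hypothetical names chosen for this example, not Prodigal's configuration format.

```python
from enum import IntEnum


class RedactionLevel(IntEnum):
    PCI = 1            # Level 1: payment card data only
    PCI_AND_PII = 2    # Level 2: PCI plus customer-chosen PII entities
    ALL_NUMERIC = 3    # Level 3: everything above plus all numeric data


# Hypothetical entity labels for the PCI items named in Level 1.
PCI_ENTITIES = frozenset({"card_number", "expiration_date", "cvv", "pin"})

# Hypothetical marker meaning "redact every numeric token".
ANY_NUMERIC = "any_numeric"


def entities_to_redact(level, chosen_pii=()):
    """Return the entity labels covered by a level and the customer's PII choices."""
    entities = set(PCI_ENTITIES)
    if level >= RedactionLevel.PCI_AND_PII:
        entities |= set(chosen_pii)
    if level >= RedactionLevel.ALL_NUMERIC:
        entities.add(ANY_NUMERIC)
    return frozenset(entities)


# A customer on Level 2 who opts to redact Social Security numbers and addresses.
print(sorted(entities_to_redact(RedactionLevel.PCI_AND_PII, ["ssn", "address"])))
```

Modeling the levels as ordered values keeps the "each level includes the previous one" relationship explicit, which matches the way Level 3 is described as an addition to Level 2.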

Choosing this option might affect some workflows but ensures the highest level of security.

How does redaction work?

Prodigal’s machine learning team explored some of the most popular approaches used across industries to identify the right blend for the consumer finance industry. 

The Prodigal redaction process searches call transcripts for topics, phrases, and keywords (collectively called Redaction Indicator Language) that indicate the likely presence of sensitive data nearby. When Redaction Indicator Language is found, nearby “numerical” words and phrases are redacted by being removed from both transcripts and audio.

Redaction is therefore targeted and proximity-based. This approach allows data that is potentially valuable for analytics or call review to remain intact in transcripts and audio.
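
A minimal Python sketch of the proximity idea follows. It assumes a fixed window measured in tokens on either side of an indicator phrase, a tiny hypothetical phrase list, and a small number vocabulary. The redact function and its window parameter are illustrative assumptions, not Prodigal's algorithm.

```python
import re

# Abbreviated, hypothetical indicator phrases and number vocabulary.
INDICATORS = ["card number", "routing number", "cvv code"]
NUMBER_WORDS = {"zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine"}


def redact(text, window=6, mask="*"):
    """Mask numeric tokens within `window` tokens of any indicator phrase."""
    tokens = text.split()
    # Normalized copy used for matching; the originals are kept for output.
    norm = [re.sub(r"[^a-z0-9]", "", t.lower()) for t in tokens]
    to_mask = set()
    for phrase in INDICATORS:
        words = phrase.split()
        n = len(words)
        for i in range(len(norm) - n + 1):
            if norm[i:i + n] != words:
                continue
            lo = max(0, i - window)
            hi = min(len(norm), i + n + window)
            for j in range(lo, hi):
                if i <= j < i + n:
                    continue  # never mask the indicator phrase itself
                if norm[j].isdigit() or norm[j] in NUMBER_WORDS:
                    to_mask.add(j)
    return " ".join(mask if j in to_mask else t for j, t in enumerate(tokens))


call = "the card number is four one one one and I called two days ago"
print(redact(call, window=6))
# the card number is * * * * and I called two days ago
print(redact(call, window=10))
# the card number is * * * * and I called * days ago
print(redact(call, window=2))
# the card number is * one one one and I called two days ago
```

In this toy setup the window size is the tuning knob: widening it masks the unrelated “two” in “two days ago”, while narrowing it leaves most of the card digits exposed.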

Upon customer request, the redaction algorithms can be relaxed or tightened, leading to less or more aggressive data redaction respectively.

Removal of audio data is achieved by noting the time ranges at which redacted words occur (as time-stamped in the transcript), and writing segments of silence to the audio file for those time ranges.
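
The time-range step described above can be sketched in Python as follows, assuming mono 16-bit PCM samples already loaded into memory and redaction ranges given in seconds. The silence_ranges function is a hypothetical helper written for this illustration; reading and writing the actual audio file is left out.

```python
import math
from array import array


def silence_ranges(samples, sample_rate, ranges):
    """Return a copy of mono PCM samples with each (start_s, end_s) range zeroed.

    Start times are rounded down and end times rounded up, so the silenced
    span never ends up shorter than the time-stamped word.
    """
    out = array(samples.typecode, samples)
    for start_s, end_s in ranges:
        lo = max(0, math.floor(start_s * sample_rate))
        hi = min(len(out), math.ceil(end_s * sample_rate))
        for i in range(lo, hi):
            out[i] = 0
    return out


# Two seconds of constant (non-silent) 8 kHz audio, with one word to remove.
sample_rate = 8000
audio = array("h", [100] * (2 * sample_rate))
clean = silence_ranges(audio, sample_rate, [(0.5, 1.0)])
print(clean[3999], clean[4000], clean[7999], clean[8000])  # 100 0 0 100
```

The function returns a new array rather than editing in place, which keeps the sketch easy to test. A system that must not retain un-redacted audio would overwrite or discard the original afterwards.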


Because Prodigal’s redaction uses a targeted, proximity-based approach to finding numerical data near Redaction Indicator Language, it is possible for “over-redaction” (redaction applied to non-sensitive data) or “under-redaction” (redaction not applied to sensitive data) to occur.


For example, if a credit card number is spoken outside of any typical “payment” context, it may not be detected and redacted at Level 1. Such occurrences are extremely rare at Level 1, even rarer at Level 2, and impossible at Level 3, where all numeric data is redacted.

Likewise, if numerical data occurs near Redaction Indicator Language, it may be redacted even though the data is not sensitive.

In general, however, Prodigal’s redaction can be appropriately “tuned” to suit customer needs and balance the security requirements (redact more aggressively) with usability of transcripts and audio for analytics (redact less aggressively).
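
To make the trade-off concrete, over- and under-redaction can be counted by comparing what was redacted against a hand-labeled reference. The Python sketch below assumes both are available as sets of token positions. The redaction_errors function and the labeled data are hypothetical and serve only to illustrate the idea.

```python
def redaction_errors(redacted, sensitive):
    """Compare redacted token positions with hand-labeled sensitive positions."""
    redacted, sensitive = set(redacted), set(sensitive)
    correct = redacted & sensitive
    return {
        # Redacted although not sensitive: hurts usability for analytics.
        "over": sorted(redacted - sensitive),
        # Sensitive but left in place: hurts security.
        "under": sorted(sensitive - redacted),
        "precision": len(correct) / len(redacted) if redacted else 1.0,
        "recall": len(correct) / len(sensitive) if sensitive else 1.0,
    }


# Hypothetical call: tokens 4-7 hold a card number; the system masked 4, 5, 6 and 11.
report = redaction_errors(redacted={4, 5, 6, 11}, sensitive={4, 5, 6, 7})
print(report)
# {'over': [11], 'under': [7], 'precision': 0.75, 'recall': 0.75}
```

Redacting more aggressively tends to raise recall at the cost of precision, and redacting less aggressively does the reverse, which is the balance the tuning described above is meant to strike.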

"We found Prodigal while looking for solutions to reinforce our call recording safeguards and further protect our customers’ personal information.
We looked at multiple vendors but ultimately trusted Prodigal and their industry-trained AI models to get the job done. They were great to work with and further customized their outputs to meet our specific needs. We’d recommend them to any team looking to effectively protect their consumer data and strengthen compliance." -VP of InfoSec, Policygenius