Redaction of private data: A financial services requirement


Every day, borrowers the world over interact with agencies that must follow strict compliance mandates. Every day, those agencies record thousands, even millions, of calls to make sure they’re doing everything cleanly, fairly, and in accordance with all federal, local, and organizational guidelines.

Each of those calls is a consumer finance interaction, and with consumer finance interactions come CFPB regulations. First up: verifying the right party before a debt is disclosed. Next, let’s say a payment is going to be collected. The consumer might share their payment card information (PCI) and their personally identifiable information (PII).

Uh-oh! If you’re an agency or any other financial services group that handles personally identifiable information, you’ve got a problem. You’re getting payment, but you’re also recording that call. Handling the data safely and securely is critical to your reputation and your bottom line. 


Consequently, many agencies end up redacting the data from the recordings. Doing so accurately and efficiently remains a challenge. 

How does Prodigal think about redaction?

Reliability and accuracy are the most important qualities in a redaction partner. Prodigal has built a state-of-the-art Redaction AI model to handle PCI and PII data, offered now through Prodigal ProRedaaS (Redaction as a Service).

To understand our model’s accuracy, we evaluated it on a sample dataset. We also compared it with other services such as AWS Comprehend, whose output format lends itself to accuracy benchmarking.

How accurate is Prodigal ProRedaaS?

The big reveal: both models showed about 98% token-level accuracy, but Prodigal’s model performs better on PCI, with higher recall than Comprehend. In many cases, Comprehend redacted collections account numbers as though they were bank account numbers, which is not ideal for call review!
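For readers who want to see how token-level accuracy and per-entity recall can diverge, here is a minimal sketch. It is not Prodigal’s evaluation code: the label names (`ACCOUNT`, `BANK`, `CARD`, `O`) and the tiny example sequences are invented for illustration, and the functions assume one gold label and one predicted label per token.

```python
from collections import Counter


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_entity_scores(gold, pred, outside="O"):
    """Token-level precision and recall for every label except `outside`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != outside:
                tp[g] += 1
        else:
            if p != outside:
                fp[p] += 1  # predicted an entity that is not there
            if g != outside:
                fn[g] += 1  # missed (or mislabeled) a real entity
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        scores[label] = {
            "precision": tp[label] / predicted if predicted else 0.0,
            "recall": tp[label] / actual if actual else 0.0,
        }
    return scores


# Hypothetical labels: a collections account number tagged as a bank
# account number, and one card-number token missed.
gold = ["O", "O", "ACCOUNT", "ACCOUNT", "O", "CARD", "CARD", "CARD", "O", "O"]
pred = ["O", "O", "BANK", "BANK", "O", "CARD", "CARD", "O", "O", "O"]

accuracy = token_accuracy(gold, pred)
scores = per_entity_scores(gold, pred)
```

In this toy case `BANK` has zero precision because every `BANK` prediction was really a collections account number, and `CARD` recall is two out of three. In real call transcripts most tokens are non-sensitive, so overall accuracy can stay high even when an individual entity type is handled poorly, which is why per-entity precision and recall are worth reporting alongside it.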

Let’s see a test example of Prodigal’s performance. 

Redaction experiments and examples

The following example is completely artificial and is used for demonstration purposes only.

Transcription services often make mistakes when transcribing a call. Our model appears to be robust to that kind of noise.

Changes made to introduce noise:

social → vocal

phone number missed

card → car

cvv missed
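The four changes above can be reproduced with a few substitution rules. The sketch below is only an illustration: the transcript is invented (with obviously fake numbers), and it assumes that "missed" means the cue word or phrase was dropped from the transcript while the digits stayed in place.

```python
import re

# Hypothetical transcript, invented for illustration only.
CLEAN = (
    "my social is 000 00 0000 my phone number is 555 010 0199 "
    "the card number is 4111 1111 1111 1111 and the cvv is 123"
)

# (pattern, replacement) pairs mirroring the four listed changes.
NOISE_RULES = [
    (r"\bsocial\b", "vocal"),    # social -> vocal
    (r"\bphone number\b", ""),   # cue phrase dropped (assumed meaning of "missed")
    (r"\bcard\b", "car"),        # card -> car
    (r"\bcvv\b", ""),            # cue word dropped (assumed meaning of "missed")
]


def add_noise(text, rules=NOISE_RULES):
    """Apply each substitution rule, then collapse leftover whitespace."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return re.sub(r"\s+", " ", text).strip()


noisy = add_noise(CLEAN)
```

The digits are untouched; only the surrounding cue words change. That makes this a useful stress test, because a redaction model has to recognize the sensitive numbers with less supporting context.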

As expected, the confidence scores for those labels drop as important context is removed or altered. The drop shows how much a redaction model depends on context to make the correct decision.

SSN (0.99→0.39)

phone number (1→0.96)

card number (1→0.91)

cvv (1→0.99)
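To put those numbers in perspective, the sketch below computes the relative drop for each entity and checks which entities would slip under a confidence cutoff after the noise is introduced. The cutoff is a hypothetical parameter chosen for illustration; nothing here describes how ProRedaaS actually uses its scores.

```python
# Confidence scores reported above: (clean transcript, noisy transcript).
SCORES = {
    "SSN": (0.99, 0.39),
    "phone number": (1.00, 0.96),
    "card number": (1.00, 0.91),
    "cvv": (1.00, 0.99),
}


def relative_drop(before, after):
    """Share of the original confidence that was lost."""
    return (before - after) / before


def missed_at(threshold, scores=SCORES):
    """Entities that clear a hypothetical cutoff on the clean transcript
    but fall below it on the noisy one."""
    return [
        name
        for name, (clean, noisy) in scores.items()
        if clean >= threshold > noisy
    ]


drops = {name: relative_drop(*pair) for name, pair in SCORES.items()}
missed_half = missed_at(0.5)     # only the SSN falls below 0.5
missed_strict = missed_at(0.95)  # SSN and card number fall below 0.95
```

The SSN loses roughly 60% of its confidence once "social" becomes "vocal", while the other three entities lose under 10%. A system that redacted only above a fixed cutoff would therefore be most exposed on the entity whose cue word was corrupted.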

Why use redaction built on accuracy?

As you can see, context and accuracy count when it comes to compliance. Prodigal’s ProRedaaS offers: 

  • Better accuracy than AWS Comprehend for most entities (and higher precision on bank account numbers)
  • More complete coverage of the PII entity set, plus date of birth redaction
  • Robustness to transcription errors

Redaction counts when you’re dealing with financials. The bottom line: You need to be able to trust it. If you want to learn more about how Prodigal can help your agency stay compliant with data privacy laws and more, reach out to our team below.