Building Credit Karma’s Unclaimed Money Feature

May 09, 2017

At Credit Karma, we’re constantly looking for new ways to provide timely, personalized financial assistance to our members: we help them correct credit report inaccuracies, find the best interest rate on a loan, and even file their taxes for free. This vision to become the best personal financial assistant drove us to build our latest feature: a tool to help people find their unclaimed money.

What’s unclaimed money, anyway?

Unclaimed money sounds too good to be true, which is part of the reason there’s so much of it sitting around. When a business owes you money but is unable to contact you, it may be required by law to transfer that money to the state government. More than $40 billion of unclaimed money is sitting with state governments in the United States!

With over 60 million members, Credit Karma has the unique ability to help get this money back to its original owners, to tap them on the shoulder and point them to their long-lost dollars. In 2016, Credit Karma acquired Claimdog, a company Manu Lakkur and I founded together that searched unclaimed money databases in different states. Since then, we’ve been hard at work building Unclaimed Money and making it part of the suite of Credit Karma features. We’re excited to offer this feature and better fulfill our role as a financial assistant.

The technical challenge

With a broad vision of matching people to their unclaimed money, we needed to build the software to support it. In particular, we needed to answer the following question: how could we reliably determine when one of our members has unclaimed money?

The key lies in our member data. In the states where we’ve obtained unclaimed money data, we compare that list against our members’ data. When we find what we think is a match, we let the member know.

A likely match means money may be owed to that member. When we took a preliminary look at the data, we found as much as $75 million in potential matches for over half a million members in California alone!

Loading the data

Given a positive signal that many of our members were owed money, we set about building the backend. Of course, Credit Karma already has a rich dataset for our members. But we also needed data from state unclaimed money databases in order to perform the match. States typically provide public data through their online portals, allowing users to look up their name and file a claim if they find money. We’ve worked with a growing list of states to get full data dumps so we can quickly perform the match. The formats and frequency of state data updates vary: many states send us their data on physical DVDs and CDs (one state even sent us a CD with the data contained in a PDF). Rather than wait for states to offer more convenient APIs, we wrote our own scripts to transform whatever they gave us into a standard format that could be reliably surfaced to our users.
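To make the idea of a standard format concrete, here is a minimal sketch of what such a transform script might look like. It is not our production code: the CSV input, the column names in CA_COLUMNS, and the helper names are all assumptions made for illustration. The output shape mirrors the sample record shown later in this post.

```python
import csv
import io
from decimal import Decimal

# Hypothetical mapping from one state's column headers to a standard schema.
# Real state files differ; these header names are assumptions.
CA_COLUMNS = {
    "PROPERTY_ID": "propertyId",
    "OWNER_NAME": "owner",
    "OWNER_STREET": "lineOne",
    "OWNER_CITY": "city",
    "OWNER_STATE": "state",
    "OWNER_ZIP": "zip",
    "CASH_REPORTED": "amount",
}


def normalize_row(row, columns):
    """Convert one raw row (a dict of strings) into the standard record shape."""
    flat = {std: (row.get(raw) or "").strip() for raw, std in columns.items()}
    return {
        "propertyId": flat["propertyId"],
        "owners": [flat["owner"]] if flat["owner"] else [],
        "address": {key: flat[key] for key in ("lineOne", "city", "state", "zip")},
        # Decimal avoids floating-point rounding on dollar amounts.
        "amount": Decimal(flat["amount"] or "0"),
    }


def load_state_file(text, columns):
    """Parse CSV text with a header row and normalize every row."""
    reader = csv.DictReader(io.StringIO(text))
    return [normalize_row(row, columns) for row in reader]
```

Each new state would then need only its own column mapping (and, for formats like PDF, an extraction step before this one), while everything downstream sees a single record shape.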

Building the claim microservice

As Credit Karma continues to scale, the engineering team has moved towards a service-oriented architecture, away from a monolithic one in which logic resides in a single application. The Unclaimed Money feature was a perfect opportunity for us to build a new microservice. The service we built handles any interactions with unclaimed money data: retrieving individual claims, searching for claims, and computing which claims are owed to which members.

Building the claim backend as a microservice had several large advantages. First, we weren’t constrained by the deploy cycle of the core monolithic application; our team owned our own continuous integration pipeline and deploy cycles. This made it easy to test new changes to our claim-matching and search algorithms. Second, we could leverage Finagle, a powerful, high-concurrency RPC framework that we have adopted here at Credit Karma. This helped us build a highly performant application from the get-go.

Making the match

Claim Service is backed by Elasticsearch, which houses the unclaimed money records we receive from government databases. Elasticsearch is an increasingly popular datastore most commonly used for full-text document search, but it can work quite well for human name search, too.

Whenever we receive potentially fresh address data for a user, Claim Service triggers a bulk Elasticsearch query covering each name and address pairing. We cache these matches so they can be quickly retrieved by any Claim Service clients.
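As a rough sketch of what a bulk query like this could look like, the snippet below builds a request body for Elasticsearch’s multi-search (_msearch) endpoint, with one search per name and address pairing. The index name "claims", the field names, and the choice of a bool query with fuzziness set to "AUTO" are assumptions for illustration, not a description of Claim Service’s actual queries.

```python
import json
from itertools import product


def build_msearch_body(names, addresses, index="claims"):
    """Return newline-delimited JSON for the _msearch endpoint.

    Each search is two lines: a header naming the index, then the query.
    The index and field names here are hypothetical.
    """
    lines = []
    for name, address in product(names, addresses):
        header = {"index": index}
        query = {
            "query": {
                "bool": {
                    "must": [
                        # Every token of the member's name must appear.
                        {"match": {"owners": {"query": name, "operator": "and"}}},
                        # Tolerate small typos in the address.
                        {"match": {"address.lineOne": {"query": address, "fuzziness": "AUTO"}}},
                    ]
                }
            }
        }
        lines.append(json.dumps(header))
        lines.append(json.dumps(query))
    # The multi-search body must end with a newline.
    return "\n".join(lines) + "\n"
```

Sending all pairings in one request keeps the number of round trips constant no matter how many names and addresses a member has on file.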

Improving the match

Data is often messy, and unclaimed money data is no exception (in fact, the messiness is often the reason the money stays unclaimed in the first place). Street names or cities are frequently misspelled, information is missing, or street formats are inconsistent.

To account for this, we added some simple matching features. We canonicalize addresses with a few simple rules (for example, standardizing “street” as “st”) and allow some fuzziness in the comparison. When comparing credit report data with unclaimed money data, we calculate the Levenshtein distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other, to account for small typos in the data. If two strings have a relatively low Levenshtein distance, we still count them as a match.
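Here is a minimal sketch of those two steps, canonicalization followed by an edit-distance check. The abbreviation table, the function names, and the threshold of 2 are assumptions chosen for illustration; a real system would tune all of them.

```python
import re

# Hypothetical abbreviation table; a production list would be much longer.
ABBREVIATIONS = {"street": "st", "drive": "dr", "avenue": "ave", "road": "rd"}


def canonicalize(text):
    """Lowercase, replace punctuation with spaces, and abbreviate street words."""
    words = re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def levenshtein(a, b):
    """Dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def is_fuzzy_match(a, b, max_distance=2):
    """True if the canonical forms are within max_distance edits of each other."""
    return levenshtein(canonicalize(a), canonicalize(b)) <= max_distance
```

With these rules, "1234 Jefferson Street" and "1234 Jefferson St" canonicalize to the same string, and "San Francsco" is one edit away from "San Francisco", so both pairs count as matches.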

For example, suppose the credit report data for a user is the following:

{
  "names": ["Dan Johnson"],
  "addresses": ["432 Jamestown Drive, Houston, TX", "1234 Jefferson St, San Francisco, CA"]
}

We could then match this to an unclaimed money record from the state’s database, despite a few minor differences in the data:

{
  "propertyId": "123457",
  "owners": ["Dan F Johnson"],
  "address": {
    "lineOne": "1234 Jefferson Street",
    "city": "San Francsco",
    "state": "CA",
    "zip": "94182"
  },
  "amount": 234.96
}
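One of the differences in this example is the middle initial in the owner’s name. A simple way to tolerate that is sketched below, under the assumption (ours for this illustration, not necessarily what Claim Service does) that two names match when their first and last tokens agree and anything in between is ignored. All function names are hypothetical.

```python
def name_tokens(name):
    """Lowercase the name and split it into tokens, dropping stray punctuation."""
    tokens = (token.strip(".,").lower() for token in name.split())
    return [token for token in tokens if token]


def names_match(member_name, owner_name):
    """Hypothetical rule: first and last tokens must agree; middle tokens are ignored."""
    a, b = name_tokens(member_name), name_tokens(owner_name)
    if len(a) < 2 or len(b) < 2:
        # Single-token names must match exactly.
        return bool(a) and a == b
    return a[0] == b[0] and a[-1] == b[-1]


def record_names_match(member, record):
    """True if any member name matches any owner listed on the record."""
    return any(
        names_match(member_name, owner)
        for member_name in member["names"]
        for owner in record["owners"]
    )


member = {"names": ["Dan Johnson"]}
record = {"propertyId": "123457", "owners": ["Dan F Johnson"]}
print(record_names_match(member, record))  # True
```

A rule this loose would also pair "Dan Johnson" with "Dan Xavier Johnson", which is why a name match on its own is not enough and the address comparison still has to agree.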

How do we set the right fuzziness level for our match? Aside from internal automated tests, we can also monitor user engagement as we change the matching algorithm. Suppose we make a change to the match logic to increase the number of matches we’ve found for our members. If they’re still clicking through on these matches to claim their money, we know our algorithms are getting people to the right claims. If the click-through rate declines, we may be showing poor matches and can then adjust accordingly.
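The engagement check described above boils down to comparing click-through rates before and after a change. A minimal sketch, assuming we can count how many matches were shown and how many were clicked in each period (the function names and the 2-point tolerance are illustrative assumptions):

```python
def click_through_rate(clicks, impressions):
    """Fraction of shown matches that members clicked through on."""
    return clicks / impressions if impressions else 0.0


def ctr_dropped(before, after, tolerance=0.02):
    """True if CTR fell by more than `tolerance` (absolute) after a change.

    `before` and `after` are (clicks, impressions) pairs.
    """
    return click_through_rate(*before) - click_through_rate(*after) > tolerance
```

If a looser match setting shows more matches but ctr_dropped returns True, the extra matches are probably poor ones and the fuzziness should be tightened again.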

Adding search

Sometimes we’re not able to make a match for a user: we simply don’t have all of their past names and addresses, or the government data is so sparse that it doesn’t match our records. For these cases, we’ve layered on a simple end-user search feature that lets users type in their names and sift through a set of results manually. This also lets users find money for their friends and family. Our search feature is currently available in seven states, and we’ll be adding more as we obtain more states’ unclaimed money data.

Scaling up

Building Credit Karma’s Unclaimed Money feature was an exciting opportunity to scale out a useful feature to tens of millions of people. It was also an opportunity to build a robust backend system that could scale to additional states and new sources of unclaimed money. The architecture of Claim Service allowed us to move fast and iterate on our feature set. We’re looking forward to building out more as we see members dig into the feature. If you’re interested in working with us on any of these systems, check out our openings at creditkarma.com/careers.

About the Author

Devin Finzer

Devin is a Software Engineer working on member engagement at Credit Karma, and is co-founder of Claimdog, the company behind our Unclaimed Money feature.