Fuzzy Matching in Sanctions Screening: Avoiding False Positives Without Missing True Hits
How fuzzy name matching actually works, why naive substring search produces dangerous false positives, and how to tune your match thresholds.
Sanctions screening is fundamentally a fuzzy-matching problem. The names that arrive at your onboarding desk are spelled however the customer typed them: with or without diacritics, in native script or transliterated into Latin characters, with middle names reordered, with honorifics attached. The names on OFAC's Specially Designated Nationals (SDN) list are stored as the U.S. Treasury normalized them, and the EU and UK lists follow different normalization conventions.
The simplest approach, substring matching, is dangerous. Searching for "Ali" against the OFAC list returns hundreds of designated individuals named Ali, plus every entry whose name merely contains the letters a-l-i (e.g. "Somalia", "Salim"). Substring matching produces massive false-positive rates and, paradoxically, real misses as well: a query that carries extra tokens, such as a middle name the list entry lacks, is no longer a substring of the listed name and returns nothing.
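Both failure modes are easy to reproduce. The sketch below is illustrative only: the three watchlist entries are made up (they are not real list records), and the function names are hypothetical. It compares a naive substring search with a check that requires each query token to equal a whole token of the listed name.

```python
import re

# Hypothetical mini watchlist for illustration; not real sanctions entries.
WATCHLIST = ["Ali Hassan", "Salim Trading Co", "Somalia Shipping Ltd"]


def substring_hits(query, names):
    """Naive approach: hit whenever the query appears anywhere in the name."""
    q = query.lower()
    return [n for n in names if q in n.lower()]


def whole_token_hits(query, names):
    """Hit only when every query token equals a whole token of the name."""
    q_tokens = set(re.findall(r"\w+", query.lower()))
    if not q_tokens:
        return []
    return [n for n in names
            if q_tokens <= set(re.findall(r"\w+", n.lower()))]


# False positives: "ali" is buried inside "Salim" and "Somalia".
print(substring_hits("Ali", WATCHLIST))      # all three entries
print(whole_token_hits("Ali", WATCHLIST))    # ['Ali Hassan']

# False negative: the extra middle name breaks the substring test.
print(substring_hits("Ali Mohammed Hassan", WATCHLIST))  # []
```

The whole-token check removes the false positives but is still exact matching; it does nothing for spelling variants or reordered and extra tokens.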
Modern screening uses several techniques together: token normalization (lowercasing, accent stripping, punctuation removal), token sorting (so "John Smith" matches "Smith John"), trigram similarity (Postgres' pg_trgm extension is well-suited to this), and word-boundary constraints to avoid partial matches inside longer tokens. Phonetic algorithms like Soundex or Double Metaphone help with transliteration variants.
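A minimal sketch of the first three techniques, using only the Python standard library, might look like the following. It is an approximation written for illustration, not pg_trgm itself: the padding follows the scheme pg_trgm documents (two spaces before each word, one after), but the function names are hypothetical and the scores should not be assumed to match Postgres output exactly. The accent stripping handles only characters that decompose into a base letter plus combining marks; letters such as "ø" and non-Latin scripts would need a real transliteration step.

```python
import re
import unicodedata


def normalize(name):
    """Lowercase, strip combining accents, and turn punctuation into spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def token_sort(name):
    """Sort tokens so that word order no longer affects the comparison."""
    return " ".join(sorted(normalize(name).split()))


def trigrams(text):
    """Character trigrams per token, padded so word boundaries are preserved."""
    grams = set()
    for token in text.split():
        padded = "  " + token + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Jaccard overlap of the trigram sets of two normalized names."""
    ga, gb = trigrams(token_sort(a)), trigrams(token_sort(b))
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


print(token_sort("Smith, John"))                  # john smith
print(similarity("John Smith", "Smith John"))     # 1.0
print(similarity("José Müller", "Jose Muller"))   # 1.0
print(similarity("Mohammed Ali", "Mohamed Ali"))  # roughly 0.79
```

Because trigrams are built per token with boundary padding, a short name like "ali" shares few trigrams with "salim", which is the word-boundary behavior described above. Phonetic keys (Soundex, Double Metaphone) are left out of this sketch.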
Match scoring matters as much as the algorithm. A score of 1.0 is an exact normalized match; scores above 0.92 are typically high-confidence; the 0.80 to 0.92 range needs analyst review; below 0.80 is best treated as a "possible match" requiring corroborating evidence like a date of birth or document number. Treat these cutoffs as starting points: the right values depend on the similarity measure you use and should be calibrated against your own reviewed alerts. Alias-only matches should be capped lower than primary-name matches, because aliases are noisier.
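The score bands above translate directly into a small triage function. This is a sketch under stated assumptions: the band labels and function names are hypothetical, and the alias cap of 0.90 is an illustrative value chosen here, since the text only says alias matches should be capped lower.

```python
def effective_score(raw_score, matched_on_alias, alias_cap=0.90):
    """Cap alias-only matches; alias_cap=0.90 is an assumed example value."""
    return min(raw_score, alias_cap) if matched_on_alias else raw_score


def triage(raw_score, matched_on_alias=False):
    """Map a similarity score to one of the review bands described above."""
    score = effective_score(raw_score, matched_on_alias)
    if score >= 1.0:
        return "exact"
    if score > 0.92:
        return "high_confidence"
    if score >= 0.80:
        return "analyst_review"
    return "possible_match_needs_corroboration"


print(triage(0.95))                         # high_confidence
print(triage(0.95, matched_on_alias=True))  # analyst_review (capped to 0.90)
print(triage(0.60))                         # possible_match_needs_corroboration
```

With the cap in place, even a perfect alias match lands in the analyst-review band rather than being treated as an exact hit.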
Finally, every screening event should be logged with the input, the matches considered, and the analyst's disposition. That audit trail is what regulators look for when they ask whether your program is risk-based and reasonable — not whether you found every theoretically possible match.
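One way to capture such a record is a structured log line per screening event. The sketch below is only an illustration of the three elements named above (input, matches considered, disposition); the field names, the disposition values, and the analyst identifier are hypothetical, and a production system would add whatever its own retention and review policies require.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class ScreeningEvent:
    input_name: str    # the name exactly as submitted
    candidates: list   # every match considered, with its score
    disposition: str   # hypothetical values: "false_positive", "true_match"
    analyst_id: str    # who made the call
    screened_at: str = field(default_factory=_utc_now)


def to_log_line(event):
    """Serialize one event as a single JSON line for an append-only log."""
    return json.dumps(asdict(event), sort_keys=True, ensure_ascii=False)


event = ScreeningEvent(
    input_name="Mohamed Ali",
    candidates=[{"list_name": "Mohammed Ali", "score": 0.79, "alias": False}],
    disposition="false_positive",
    analyst_id="analyst-001",
)
print(to_log_line(event))
```

Recording the candidates that were considered and dismissed, not only the confirmed hits, is what lets a reviewer reconstruct why a decision was reasonable at the time.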