Greetings,

I have two tables that are populated using large datasets from disparate 
external systems, and I am trying to match records by customer name between 
these two tables. I do not have any authoritative key, such as customerID or 
nationalID, by which I can match them up, and I have found many cases where the 
same customer has different first names in the two datasets. A sampling of the 
differences is as follows:

Michael <=> Mike
Tom <=> Thomas
Liz <=> Elizabeth
Margaret <=> Maggie

How can I build a query in PostgreSQL (v. 9.6) that will find possible matches 
like these on nicknames? My initial guess is that I would have to either find 
or build some sort of intermediary table that contains associated names like 
those above. Sometimes, though, a group will contain more than two names, like:

Jim <=> James <=> Jimmy <=> Jimmie
Bill <=> Will <=> Willie <=> William

and so forth.
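
To make the intermediary-table idea concrete, here is a rough, untested sketch 
of what I have in mind. All table and column names (nickname_group, 
customers_a, customers_b, first_name, last_name) are placeholders I made up 
for illustration, not my real schema, and I am assuming a match on last name 
is acceptable as the other join condition:

```sql
-- Hypothetical lookup table: every name variant in a group shares one group_id.
-- A name may appear in more than one group (e.g. 'will').
CREATE TABLE nickname_group (
    group_id integer NOT NULL,
    name     text    NOT NULL,   -- stored in lower case
    PRIMARY KEY (group_id, name)
);

INSERT INTO nickname_group (group_id, name) VALUES
    (1, 'michael'), (1, 'mike'),
    (2, 'thomas'),  (2, 'tom'),
    (3, 'james'),   (3, 'jim'),  (3, 'jimmy'), (3, 'jimmie'),
    (4, 'william'), (4, 'bill'), (4, 'will'),  (4, 'willie');

-- Placeholder tables customers_a and customers_b, each assumed to have
-- first_name and last_name columns.
SELECT a.first_name AS first_name_a,
       b.first_name AS first_name_b,
       a.last_name
FROM customers_a AS a
JOIN customers_b AS b
  ON lower(a.last_name) = lower(b.last_name)
WHERE lower(a.first_name) = lower(b.first_name)   -- exact first-name match
   OR EXISTS (                                    -- or both names share a group
        SELECT 1
        FROM nickname_group AS ga
        JOIN nickname_group AS gb USING (group_id)
        WHERE ga.name = lower(a.first_name)
          AND gb.name = lower(b.first_name)
      );
```

Using a shared group_id rather than a table of name pairs is meant to cover 
the larger groups above without listing every combination, but I do not know 
whether this is the approach people actually use.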

Has anyone used or developed PostgreSQL queries that will find matches like 
these? I am running all my database queries on my local laptops (Win7 and 
macOS), so performance and uptime are not a concern here. I am curious to see how 
others in this community have creatively solved this common problem.

One of the PostgreSQL dictionaries (synonym, thesaurus etc.) might work here, 
but honestly I am clueless as to how to set this up or use it in queries 
successfully.

Thanks,
Michael (aka Mike, aka Mikey)
