
> Relevant fields could be name, street, zip, city, phone.
> Is there a way to do something like this with PostgreSQL?
> I fear this will still need a lot of manual sorting and searching even when
> potential peers get automatically identified.

One of the techniques I use to increase the odds of detecting
duplicates is to trim each column, remove all internal whitespace,
combine the columns into a single string, and calculate an MD5 hash of
the result (some other hash function may be better).  Rows that produce
the same hash are candidate duplicates.  It's not perfect (we are
dealing with humans, after all), but it helps.
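
To make the steps concrete, here is a minimal sketch in Python of the
normalize-and-hash idea. It is only an illustration under stated
assumptions: the field names come from the question, while the
lowercasing step, the "|" separator, and the helper names fingerprint
and duplicate_groups are my own additions and not part of any library.

```python
import hashlib
from collections import defaultdict

# Field names taken from the question; adjust to the real table.
FIELDS = ("name", "street", "zip", "city", "phone")


def fingerprint(row):
    """Return an MD5 hex digest of the normalized fields of one row."""
    parts = []
    for field in FIELDS:
        value = row.get(field) or ""  # treat missing/None as empty
        # split() with no arguments drops leading, trailing, and
        # internal whitespace, so joining the pieces removes it all.
        # Lowercasing is an extra assumption, not in the original tip.
        parts.append("".join(value.split()).lower())
    # The separator (an assumption) keeps field boundaries distinct.
    joined = "|".join(parts)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def duplicate_groups(rows):
    """Group rows by fingerprint; return groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[fingerprint(row)].append(row)
    return [group for group in groups.values() if len(group) > 1]
```

Rows that differ only in spacing or letter case end up in the same
group. Misspellings and abbreviations ("St" versus "Street") still
produce different hashes, so some manual review remains.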

-- Gary Chambers

/* Nothing fancy and nothing Microsoft! */
