Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-03 Thread Michael Nolan
Have you considered using a soundex function to sort names into similarity groups? In my experience it works fairly well with Western European names, not quite as well with names from other parts of the world. It also doesn't deal well with many nicknames (Mike instead of Michael, etc.) -- Mike
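To make the soundex suggestion concrete, here is a minimal sketch of the classic American Soundex coding in plain Python. It is an illustration only, not the code anyone in the thread used; PostgreSQL's fuzzystrmatch extension provides its own `soundex()` function. The function name and the choice to ignore non-letter characters are assumptions of this sketch.

```python
def soundex(name: str) -> str:
    """Return a 4-character Soundex code (first letter + three digits)."""
    # Consonant classes of the classic algorithm; vowels, H, W, Y have no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    # Assumption: digits and punctuation are simply dropped.
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]


if __name__ == "__main__":
    for n in ("Alex", "Aleex1", "Michael", "Mike"):
        print(n, soundex(n))
```

With this sketch, "Alex" and "Aleex1" receive the same code, while "Michael" and "Mike" do not, which matches the nickname limitation described above.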

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-03 Thread mongoose
Hi Mike, I was planning to do something like David suggested: I would sort the rows by name and then use a window (e.g. 100 rows) to compare each name to the previous 100. All I want to do is create groups of similar rows based on some criteria.
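The sort-then-window idea can be sketched outside the database as follows. This is a hedged illustration in plain Python, not a PostgreSQL query: the function names, the default window of 100, and the threshold of 2 (i.e. "distance less than 3", as stated elsewhere in the thread) are assumptions of the sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def similar_pairs(names, window=100, max_distance=2):
    """Sort names, then compare each one with up to `window` predecessors."""
    ordered = sorted(names)
    pairs = []
    for i, name in enumerate(ordered):
        for other in ordered[max(0, i - window):i]:
            if levenshtein(other, name) <= max_distance:
                pairs.append((other, name))
    return pairs


if __name__ == "__main__":
    print(similar_pairs(["Bob", "Alex", "Booob", "Aleex1"]))
```

One limitation of this design: because the window follows sort order, two similar names that differ in their first letters (for example "Bob" and "Rob") can be sorted far apart and never compared.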

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-03 Thread mongoose
Thanks for the help. I will give your code a try. Btw, I know how to solve this in a different language, but unfortunately I am very much a rookie with databases.

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-03 Thread Michael Nolan
I don't think you've defined your problem very clearly. Suppose you have 1000 names in your database. Are you planning to compare each name to the other 999 names to see which is closest? What if two names are equally close to a third name but not to each other, how do you decide which is better
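The ambiguity raised here is easy to reproduce. The sketch below, in plain Python, uses three made-up names (not from the thread) and a threshold of 2, both of which are assumptions: one name is within the threshold of two others, yet those two are not within the threshold of each other.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical names chosen only to show that "similar" is not transitive.
    short, middle, long_ = "Al", "Alex", "Alexis"
    print(short, middle, levenshtein(short, middle))   # within 2
    print(middle, long_, levenshtein(middle, long_))   # within 2
    print(short, long_, levenshtein(short, long_))     # not within 2
```

Any grouping rule therefore has to decide whether "Al" and "Alexis" belong together because both are close to "Alex", or apart because they are not close to each other.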

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-03 Thread David G Johnston
On Wed, Dec 3, 2014 at 9:14 AM, pinker [via PostgreSQL] <ml-node+s1045698n5829111...@n5.nabble.com> wrote:
> There is a nice extension in postgres: fuzzystrmatch
> I have used it to calculate the distance. From the documentation:
>
> SELECT l

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-03 Thread David G Johnston
I'll play with it when I get a chance, but you should at least try to code something.
Dave
On Wed, Dec 3, 2014 at 11:08 AM, mongoose [via PostgreSQL] <ml-node+s1045698n5829132...@n5.nabble.com> wrote:
> David,
>
> Thanks for the useful feedback. Since I am not an experienced developer it is too

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-03 Thread mongoose
David, Thanks for the useful feedback. Since I am not an experienced developer, it is too complicated for me to come up with the queries. Besides, I wonder whether it would be efficient to do this processing in PostgreSQL.

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-03 Thread pinker
There is a nice extension in postgres: fuzzystrmatch. I have used it to calculate the distance. From the documentation: SELECT levenshtein_less_equal('extensive', 'exhaustive', 2); You can then use it with your GROUP BY query.
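For readers who want to see what the distance in that example works out to, here is a plain-Python sketch of the Levenshtein distance with a simple threshold check. It does not call PostgreSQL; `within` is a hypothetical helper name, and it only mimics the "is the distance at most max_d?" question rather than the extension's exact return values.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def within(a: str, b: str, max_d: int) -> bool:
    """Hypothetical helper: True when the edit distance is at most max_d."""
    return levenshtein(a, b) <= max_d


if __name__ == "__main__":
    print(levenshtein("extensive", "exhaustive"))   # full distance
    print(within("extensive", "exhaustive", 2))     # threshold check
```

The two words are four edits apart (three substitutions and one insertion), so they do not fall within a threshold of 2.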

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-02 Thread David G Johnston
On Tuesday, December 2, 2014, mongoose [via PostgreSQL] <ml-node+s1045698n5829030...@n5.nabble.com> wrote:
> David,
>
> Thank you for your prompt reply. I believe your answer helped a lot but it seems I was not clear enough on my description. Basically I want a counter (id) to show if two or

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-02 Thread mongoose
David, Thank you for your prompt reply. I believe your answer helped a lot, but it seems I was not clear enough in my description. Basically I want a counter (id) to show whether two or more names are similar (i.e. Levenshtein distance less than 3). So in the previous example: From this table: Name,
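The "counter (id)" requirement can be sketched in plain Python as below. This is one possible reading, not the solution from the thread: it assumes that names are grouped transitively (if A is close to B and B is close to C, all three share an id), it compares every pair of names, and the function names are hypothetical. The threshold of 2 corresponds to "distance less than 3".

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def assign_group_ids(names, max_distance=2):
    """Return (id, name) pairs; similar names share an id (union-find)."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All-pairs comparison: quadratic in the number of names.
    for i in range(len(names)):
        for j in range(i):
            if levenshtein(names[i], names[j]) <= max_distance:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    ids = {}      # root index -> group id, numbered in order of first appearance
    result = []
    for i, name in enumerate(names):
        gid = ids.setdefault(find(i), len(ids) + 1)
        result.append((gid, name))
    return result


if __name__ == "__main__":
    for gid, name in assign_group_ids(["Alex", "Aleex1", "Bob", "Booob"]):
        print(gid, name)
```

Because every name is compared with every other, this sketch is only practical for small tables; the sorted-window approach discussed elsewhere in the thread trades completeness for speed.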

Re: [GENERAL] Merge rows based on Levenshtein distance

2014-12-01 Thread David G Johnston
mongoose wrote
> I am new to PostgreSQL and I have the following table:
>
> Name, City
> "Alex", "Washington"
> "Aleex1", "Washington"
> "Bob", "NYC"
> "Booob", "NYC"
>
> I want to "merge" similar rows based on levenshtein distance between names so that I have the following table:
>
> id, Name

[GENERAL] Merge rows based on Levenshtein distance

2014-12-01 Thread mongoose
I am new to PostgreSQL and I have the following table:
Name, City
"Alex", "Washington"
"Aleex1", "Washington"
"Bob", "NYC"
"Booob", "NYC"
I want to "merge" similar rows based on levenshtein distance between names so that I have the following table:
id, Name, City
1,"Alex", "Washington"
1,"Aleex1