Re: [GENERAL] Merge rows based on Levenshtein distance

David G Johnston Wed, 03 Dec 2014 10:46:44 -0800

On Wed, Dec 3, 2014 at 9:14 AM, pinker [via PostgreSQL] <
ml-node+s1045698n5829111...@n5.nabble.com> wrote:


> There is nice extension in postgres: fuzzystrmatch
> <http://www.postgresql.org/docs/9.3/static/fuzzystrmatch.html> I have
> used to calculate the distance. From documetation:
>
> SELECT levenshtein_less_equal('extensive', 'exhaustive',2);
>
> You can use it then with your group by query.
>
>
Something like this - replace the substring(...) comparison with
legenshtein_less_equal(...) or whatever comparison you find applicable.

In the case below new groups are started whenever the first letter of the
value changes.

The first group would be NULL so I add a COALESCE() call to make it 0 -
subsequent groups start with 1 and increment properly.

WITH src (val) AS (
VALUES ('A1'::varchar),('A2'),('B1'),('B2'),('B3'),('C1'),('D1')
)
, grp AS (
SELECT val
, CASE WHEN
            substring(val,1,1) <> substring(lag(val) OVER (ORDER BY
val),1,1)
       THEN 1
       ELSE NULL
       END AS changed
, ROW_NUMBER() OVER (ORDER BY val) AS val_idx
FROM src
)
SELECT val, COALESCE(sum(changed) OVER (ORDER BY val_idx), 0) AS group_id
FROM grp
;

David J.




--
View this message in context: 
http://postgresql.nabble.com/Merge-rows-based-on-Levenshtein-distance-tp5828841p5829143.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: [GENERAL] Merge rows based on Levenshtein distance

Reply via email to