On 29.01.2016 18:39, Alvaro Herrera wrote:
Teodor Sigaev wrote:
The behavior of this function is surprising to me.

select substring_similarity('dog' ,  'hotdogpound') ;

  substring_similarity
----------------------
                  0.25

Substring search was designed to search for a similar word in a string:
contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
  substring_similarity
----------------------
                  0.75

contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
  substring_similarity
----------------------
                     1
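Why the space matters can be seen from how pg_trgm builds trigrams: it splits the text into words on non-alphanumeric characters and pads each word with two spaces in front and one behind. The following is a minimal Python sketch under a simplifying assumption: the score is taken as the fraction of the first argument's padded trigrams that also occur among the second argument's trigrams. The real C code searches for the best-matching extent, which this sketch ignores. The function names are hypothetical, but the sketch reproduces the three results quoted above.

```python
import re


def word_trigrams(text: str) -> set[str]:
    """Trigrams of each word, padded pg_trgm-style: two spaces before, one after."""
    trigrams = set()
    # Assumption: words are runs of ASCII letters/digits after lowercasing.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            trigrams.add(padded[i:i + 3])
    return trigrams


def approx_word_similarity(needle: str, haystack: str) -> float:
    """Fraction of the needle's padded trigrams also present in the haystack."""
    needle_trgms = word_trigrams(needle)
    if not needle_trgms:
        return 0.0
    shared = needle_trgms & word_trigrams(haystack)
    return len(shared) / len(needle_trgms)


for hay in ("hotdogpound", "hot dogpound", "hot dog pound"):
    print(hay, approx_word_similarity("dog", hay))
```

'dog' yields four trigrams ("  d", " do", "dog", "og "). In 'hotdogpound' only "dog" matches (0.25). Once a space starts a new word at "dog...", the two leading-pad trigrams match as well (0.75). A standalone word "dog" matches all four (1.0).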

Hmm, this behavior looks too much like magic to me.  I mean, a substring
is a substring -- why are we treating the space as a special character
here?


I think I can rename this function to subword_similarity() and correct the documentation.

The current behavior is designed to find the most similar word in a text. For example, if we searched for just a substring (not a word), we would get the following result:

select substring_similarity('dog', 'dogmatist');
 substring_similarity
----------------------
                    1
(1 row)

But I think this is wrong: they are completely different words.
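To make the contrast concrete, here is a sketch of a hypothetical pure-substring variant that ignores word boundaries entirely. It takes trigrams of the raw lowercased string with no word splitting and no padding, and scores the fraction of the first argument's trigrams found in the second. `substring_only_similarity` is an illustrative name, not an existing pg_trgm function. Under this assumption 'dog' scores 1 against 'dogmatist', as in the result above, and equally 1 against 'hot dog pound', so it cannot tell a real word match from a prefix of an unrelated word.

```python
def plain_trigrams(text: str) -> set[str]:
    """Trigrams of the raw lowercased string: no word splitting, no padding."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def substring_only_similarity(needle: str, haystack: str) -> float:
    """Hypothetical: fraction of the needle's raw trigrams found in the haystack."""
    needle_trgms = plain_trigrams(needle)
    if not needle_trgms:
        return 0.0  # needles shorter than three characters have no trigrams
    return len(needle_trgms & plain_trigrams(haystack)) / len(needle_trgms)


for hay in ("dogmatist", "hot dog pound", "hotdogpound"):
    print(hay, substring_only_similarity("dog", hay))
```

Every line prints 1.0. This is why the word padding in the current implementation is deliberate: it is what separates 'dog' from 'dogmatist'.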

Perhaps another function should be added for searching for a similar substring (not a word) in a text?

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers