Re: [HACKERS] pg_trgm version 1.2

Alexander Korotkov Mon, 20 Jul 2015 07:18:54 -0700

On Wed, Jul 15, 2015 at 12:31 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:


> On Tue, Jul 7, 2015 at 6:33 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>>
>>
>> See Tom Lane's comment about downgrade scripts. I think just remove it is
>> a right solution.
>>
>
> The new patch removes the downgrade path and the ability to install the
> old version.
>
> (If anyone wants an easy downgrade path for testing, they can keep using
> the prior patch--no functional changes)
>
> It also added a comment before the trigramsMatchGraph call.
>
> I retained the palloc and the loop to promote the ternary array to a
> binary array.  While I also think it is tempting to get rid of that by
> abusing the type system and would do it that way in my own standalone code,
> it seems contrary to way the project usually does things.  And I couldn't
> measure a difference in performance.
>

Ok!


> ....
>
>
>
>> Let's consider '^(?!.*def).*abc' regular expression as an example. It
>> matches strings which contains 'abc' and don't contains 'def'.
>>
>> # SELECT 'abc' ~ '^(?!.*def).*abc';
>>  ?column?
>> ----------
>>  t
>> (1 row)
>>
>> # SELECT 'def abc' ~ '^(?!.*def).*abc';
>>  ?column?
>> ----------
>>  f
>> (1 row)
>>
>> # SELECT 'abc def' ~ '^(?!.*def).*abc';
>>  ?column?
>> ----------
>>  f
>> (1 row)
>>
>> Theoretically, our trigram regex processing could support negative
>> matching of 'def' trigram, i.e. trigramsMatchGraph(abc = true, def = false)
>> = true but trigramsMatchGraph(abc = true, def = true) = false. Actually, it
>> doesn't because trigramsMatchGraph() implements a monotonic function. I
>> just think it should be stated explicitly.
>>
>
> Do you think it is likely to change to stop being monotonic and so support
> the (def=GIN_TRUE) => false case?
>
> ^(?!.*def)   seems like a profoundly niche situation.  (Although one that
> I might actually start using myself now that I know it isn't just a
> Perl-ism).
>
> It doesn't make any difference to this patch, other than perhaps how to
> word the comments.
>

Yes, it was just about the comments.

I also run few tests on real-life dataset: set of dblp paper titles. As it
was expected, there is huge speedup when pattern is long enough.

# SELECT count(*) FROM dblp_titles WHERE s ILIKE '%Multidimensional Data%';
 count
-------
   318
(1 row)

Time: 29,114 ms

# SELECT count(*) FROM dblp_titles WHERE s ~* '.*Multidimensional Data.*';
 count
-------
   318
(1 row)

Time: 26,273 ms

# SELECT count(*) FROM dblp_titles WHERE s ILIKE '%Multidimensional Data%';
 count
-------
   318
(1 row)

Time: 249,725 ms

# SELECT count(*) FROM dblp_titles WHERE s ~* '.*Multidimensional Data.*';
 count
-------
   318
(1 row)

Time: 218,627 ms

For me, patch is in the pretty good shape. I'm going to mark it "Ready for
committer".

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] pg_trgm version 1.2

Reply via email to