On Tue, May 11, 2021 at 11:31 PM Bruce Momjian <br...@momjian.us> wrote:
> On Tue, May 11, 2021 at 01:16:38PM +0300, Alexander Korotkov wrote:
> > > OK, what symbols trigger this change?  Underscore?  What else?
> >
> > Any symbol that is recognized as a separator by the full-text parser,
> > but not by the tsquery parser.  Full-text search is extensible and allows
> > pluggable parsers.  In principle, we could dig out the exact set of
> > symbols, but I'm not sure this is worth the effort.
> >
> > > You are
> > > saying the previous code allowed 'pg' and 'class' anywhere in the
> > > string, while the new code requires them to be adjacent, which more
> > > closely matches the pattern.
> >
> > Yes, that's it.
> >
> > > >     * Fix extra distance in phrase operators for quoted text in
> > > >       websearch_to_tsquery() (Alexander Korotkov)
> > > >       For example, websearch_to_tsquery('english', '"aaa: bbb"') becomes
> > > >       'aaa <-> bbb' instead of 'aaa <2> bbb'.
> > >
> > > So colon and space were considered to be two tokens between 'aaa' and
> > > 'bbb', while there is really only one because both tokens are discarded?
> > > Is this true of any discarded tokens, e.g. '"aaa ?:, bbb"'?
> >
> > Yes, that's true for any discarded tokens.
>
> I came up with this text for these two items.  I think it still needs to
> be more specific:
>
>         <listitem>
> <!--
> Author: Alexander Korotkov <akorot...@postgresql.org>
> 2021-01-31 [0c4f355c6] Fix parsing of complex morphs to tsquery
> -->
>
>         <para>
>         Fix to_tsquery() and websearch_to_tsquery() to properly parse
>         certain discarded tokens in quotes (Alexander Korotkov)
>         </para>

This relates not just to quotes.  The original problem relates to quotes in
websearch_to_tsquery() and the phrase operator in to_tsquery().  But the
solution changes the output for all query operands containing discarded
tokens.

Could we try this?

Make to_tsquery() and websearch_to_tsquery() produce stricter output for
query parts containing discarded tokens.  In particular, this makes
to_tsquery() and websearch_to_tsquery() properly parse the discarded tokens
in phrase search operands and quotes, respectively.

>         <para>
>         Certain discarded tokens, like underscore, caused the output
>         of these functions to produce incorrect tsquery output, e.g.,
>         websearch_to_tsquery('"pg_class pg"') used to output '( pg &
>         class ) <-> pg', but now outputs 'pg <-> class <-> pg'.
>         </para>
>         </listitem>

This part looks good to me.  I'd just suggest extending the example to
to_tsquery() as well.

Certain discarded tokens, like underscore, caused the output
of these functions to produce incorrect tsquery output, e.g.,
both websearch_to_tsquery('"pg_class pg"') and
to_tsquery('pg_class <-> pg') used to output '( pg & class ) <-> pg',
but now both output 'pg <-> class <-> pg'.

>         <listitem>
> <!--
> Author: Alexander Korotkov <akorot...@postgresql.org>
> 2021-05-03 [eb086056f] Make websearch_to_tsquery() parse text in
> quotes as a si
> -->
>
>         <para>
>         Fix websearch_to_tsquery() to properly parse multiple adjacent
>         discarded tokens in quotes (Alexander Korotkov)
>         </para>
>
>         <para>
>         Previously, quoted text that contained multiple adjacent discarded
>         tokens was treated as multiple tokens, causing incorrect tsquery
>         output, e.g., websearch_to_tsquery('"aaa: bbb"') used to output
>         'aaa <2> bbb', but now outputs 'aaa <-> bbb'.
>         </para>
>         </listitem>

This item looks good to me.

------
Regards,
Alexander Korotkov
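
For anyone who wants to check the examples discussed above, the queries can be
run as in the sketch below.  It assumes a PostgreSQL session in which the
default text search configuration splits 'pg_class' at the underscore, as the
thread describes.  The results given in the comments are the ones quoted in
this thread, not independently verified output, and psql prints lexemes with
quotes around them, so the exact formatting may differ.

```sql
-- Item 1: an operand containing a discarded token (the underscore).
-- Per the thread: used to give '( pg & class ) <-> pg',
-- now gives 'pg <-> class <-> pg' for both calls.
SELECT websearch_to_tsquery('"pg_class pg"');
SELECT to_tsquery('pg_class <-> pg');

-- Item 2: multiple adjacent discarded tokens (colon and space) in quotes.
-- Per the thread: used to give 'aaa <2> bbb', now gives 'aaa <-> bbb'.
SELECT websearch_to_tsquery('english', '"aaa: bbb"');
```

Running the same statements on a server without the two commits and on one
with them should show the difference the proposed release note text describes.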