Re: [HACKERS] Patch: add conversion from pg_wchar to multibyte

Alexander Korotkov Wed, 02 May 2012 06:58:18 -0700

On Wed, May 2, 2012 at 5:48 PM, Robert Haas <[email protected]> wrote:


> On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov <[email protected]>
> wrote:
> > Imagine we have two queries:
> > 1) SELECT * FROM tbl WHERE col LIKE '%abcd%';
> > 2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%';
> >
> > The first query requires reading the posting lists of trigrams "abc" and
> > "bcd".
> > The second query requires reading the posting lists of trigrams "abc",
> > "bcd", "cde", "def", "efg", "fgh", "ghi", "hij" and "ijk".
> > We could decide to use an index scan for the first query and a sequential
> > scan for the second one because the number of posting lists to read is
> > high. But that is unreasonable, because the second query is actually
> > narrower than the first one. We can use the same index scan for it; the
> > recheck will remove all false positives.
> > When the number of trigrams is high we can just exclude some of them from
> > the index scan. That would be better than simply deciding to do a
> > sequential scan. But the question is which trigrams to exclude? Ideally
> > we would keep the rarest trigrams to make the index scan cheaper.
>
> True.  I guess I was thinking more of the case where you've got
> abc|def|ghi|jkl|mno|pqr|stu|vwx|yza|....  There's probably some point
> at which it becomes silly to think about using the index.


Yes, such situations are also possible.
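To make the trade-off concrete, here is a minimal C sketch of the two steps described above: extracting the overlapping trigrams of a LIKE substring such as '%abcd%' ("abc", "bcd"), and keeping only the k rarest of them for the index scan. It assumes single-byte text and ignores pg_trgm's word padding, case folding and multibyte handling. The `Trigram` type, its `est_count` field and both functions are hypothetical illustrations, not pg_trgm's API, and the frequency estimates are assumed to come from some statistics source the caller supplies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A trigram plus an estimated posting-list length (hypothetical). */
typedef struct
{
    char trg[4];     /* three bytes plus NUL terminator */
    long est_count;  /* assumed frequency estimate, filled in by the caller */
} Trigram;

/*
 * Store every overlapping 3-byte window of s in out (at most max_out of
 * them) and return how many were stored. Single-byte text only.
 */
static int
extract_trigrams(const char *s, Trigram *out, int max_out)
{
    size_t len = strlen(s);
    int    n = 0;

    for (size_t i = 0; i + 3 <= len && n < max_out; i++)
    {
        memcpy(out[n].trg, s + i, 3);
        out[n].trg[3] = '\0';
        out[n].est_count = 0;
        n++;
    }
    return n;
}

/* qsort comparator: ascending by estimated count. */
static int
cmp_by_count(const void *a, const void *b)
{
    long ca = ((const Trigram *) a)->est_count;
    long cb = ((const Trigram *) b)->est_count;

    return (ca > cb) - (ca < cb);
}

/*
 * Sort trgs by estimated count and return how many of the rarest ones to
 * keep (at most k). Dropped trigrams only add false positives, which the
 * recheck removes.
 */
static int
keep_rarest(Trigram *trgs, int n, int k)
{
    assert(n >= 0 && k >= 0);
    qsort(trgs, (size_t) n, sizeof(Trigram), cmp_by_count);
    return n < k ? n : k;
}
```

With made-up counts for the nine trigrams of 'abcdefghijk', keep_rarest(t, 9, 3) leaves the three lowest-count trigrams in t[0..2], so the scan reads three short posting lists instead of nine; the excluded trigrams only widen the candidate set that the recheck then filters.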

> >> Well, I'm not an expert on encodings, but it seems like a logical
> >> extension of what we're doing right now, so I don't really see why
> >> not.  I'm confused by the diff hunks in pg_mule2wchar_with_len,
> >> though.  Presumably either the old code is right (in which case, don't
> >> change it) or the new code is right (in which case, there's a bug fix
> >> needed here that ought to be discussed and committed separately from
> >> the rest of the patch).  Maybe I am missing something.
> >
> > Unfortunately I didn't understand the original logic of
> > pg_mule2wchar_with_len. I just made a proposal about how it could be. I
> > hope somebody more familiar with this code will clarify this situation.
>
> Well, do you think the current code is buggy, or not?


Probably, but I'm not sure. The conversion seems lossy to me unless I'm
missing something about mule encoding.

------
With best regards,
Alexander Korotkov.