On Wed, May 2, 2012 at 5:48 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, May 2, 2012 at 9:35 AM, Alexander Korotkov <aekorot...@gmail.com> wrote:
> > Imagine we've two queries:
> > 1) SELECT * FROM tbl WHERE col LIKE '%abcd%';
> > 2) SELECT * FROM tbl WHERE col LIKE '%abcdefghijk%';
> >
> > The first query require reading posting lists of trigrams "abc" and "bcd".
> > The second query require reading posting lists of trigrams "abc", "bcd",
> > "cde", "def", "efg", "fgh", "ghi", "hij" and "ijk".
> > We could decide to use index scan for first query and sequential scan for
> > second query because number of posting list to read is high. But it is
> > unreasonable because actually second query is narrower than the first one.
> > We can use same index scan for it, recheck will remove all false positives.
> > When number of trigrams is high we can just exclude some of them from index
> > scan. It would be better than just decide to do sequential scan. But the
> > question is what trigrams to exclude? Ideally we would leave most rare
> > trigrams to make index scan cheaper.
>
> True. I guess I was thinking more of the case where you've got
> abc|def|ghi|jkl|mno|pqr|stu|vwx|yza|.... There's probably some point
> at which it becomes silly to think about using the index.

Yes, such situations are also possible.

> >> Well, I'm not an expert on encodings, but it seems like a logical
> >> extension of what we're doing right now, so I don't really see why
> >> not. I'm confused by the diff hunks in pg_mule2wchar_with_len,
> >> though. Presumably either the old code is right (in which case, don't
> >> change it) or the new code is right (in which case, there's a bug fix
> >> needed here that ought to be discussed and committed separately from
> >> the rest of the patch). Maybe I am missing something.
> >
> > Unfortunately I didn't understand original logic of pg_mule2wchar_with_len.
> > I just did proposal about how it could be. I hope somebody more familiar
> > with this code would clarify this situation.
>
> Well, do you think the current code is buggy, or not?

Probably, but I'm not sure. The conversion seems lossy to me unless I'm
missing something about mule encoding.

------
With best regards,
Alexander Korotkov.
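
A minimal sketch of the trigram-exclusion idea discussed above, written in Python for brevity rather than as pg_trgm code. The function names, the `frequency` table and the `max_trigrams` cutoff are all hypothetical; the extraction is simplified (no escape handling, no word padding) and no such per-trigram statistics are claimed to exist in the patch.

```python
def pattern_trigrams(pattern):
    """Collect trigrams from the literal runs of a LIKE pattern.

    Simplified assumption: '%' and '_' are always wildcards (no escapes),
    and no blank padding is added at word boundaries.
    """
    trigrams = []
    run = ""
    # A trailing '%' acts as a sentinel so the last literal run is flushed.
    for ch in pattern + "%":
        if ch in "%_":
            trigrams.extend(run[i:i + 3] for i in range(len(run) - 2))
            run = ""
        else:
            run += ch
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(trigrams))


def choose_trigrams(trigrams, frequency, max_trigrams):
    """Keep at most max_trigrams trigrams, preferring the rarest ones.

    `frequency` is a hypothetical mapping from trigram to posting-list size;
    trigrams missing from it are treated as having an empty posting list.
    """
    ranked = sorted(trigrams, key=lambda t: frequency.get(t, 0))
    return ranked[:max_trigrams]


# Made-up posting-list sizes, for illustration only.
frequency = {"abc": 500, "bcd": 300, "cde": 40, "def": 900, "efg": 10,
             "fgh": 25, "ghi": 700, "hij": 5, "ijk": 60}

short_query = pattern_trigrams("%abcd%")
long_query = pattern_trigrams("%abcdefghijk%")
selected = choose_trigrams(long_query, frequency, 3)
```

With these made-up numbers the long pattern would scan only the posting lists for "hij", "efg" and "fgh", and the recheck step would still remove false positives, so the narrower query need not fall back to a sequential scan.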