On Wed, 15 Jan 2025 14:40:19 -0800 Jeff Davis <pg...@j-davis.com> wrote:
> My apologies, I sent the previous email prematurely. Let me try again: > > On Wed, 2025-01-15 at 14:34 -0800, Jeff Davis wrote: > > On Wed, 2025-01-15 at 01:40 +0900, Yugo NAGATA wrote: > > > > > For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' > > > > > AND > > > > > t < '123fop')" > > > > > and index scan can be used for this condition. On the other > > > > > hand, > > > > > "t ~~* '123foo'" > > > > > cannot be converted and sequential scan is used. > > > > > > > > > > Even in this case, we can use a bitmap index scan for the > > > > > condition > > > > > "(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')" > > > > > followed by > > > > > recheck by the original condition "t ~~* '123foo'", and this > > > > > could be faster > > > > > than seqscan. > > In theory, there could be many OR clauses: > > (t >= '123foo' AND t < '123fop') OR > (t >= '123Foo' AND t < '123Fop') OR > (t >= '123fOo' AND t < '123fOp') OR > (t >= '123FOo' AND t < '123FOp') OR > ... > > How should that be limited? Instead of generating complete patterns considering every case-varying characters, two clauses considering only the first case-varying character are generated. For example, for the condition "t ILIKE '123foo%'", the generated condition is "(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')". Rows meeting "(t >= '123f' AND t < '123g')" includes those whose "t" start with '123f', that is meeting the following; > (t >= '123foo' AND t < '123fop') OR > (t >= '123fOo' AND t < '123fOp') OR > ... , and rows meeting "(t >= '123F' AND t < '123G')" includes those whose "t" start with '123F', that is meeting the following; > (t >= '123Foo' AND t < '123Fop') OR > (t >= '123FOo' AND t < '123FOp') OR > ... It is required that the second case-varying character and later are checked after the index scan, and some rows are filtered, but it will be still faster than full sequential scan. Regards, Yugo Nagata -- Yugo NAGATA <nag...@sraoss.co.jp>