On 12/22/18, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 14/12/2018 20:20, John Naylor wrote:
> I'm afraid that script doesn't work as a performance test. The
> position() function is immutable, so the result gets cached in the plan
> cache. All you're measuring is the speed to get the constant from the
> plan cache :-(.

That makes perfect sense now. I should have been more skeptical about the small and medium sizes having similar times. :/

> I rewrote the same tests with a little C helper function, attached, to
> fix that, and to eliminate any PL/pgSQL overhead.

Thanks for that, I'll probably have occasion to do something like this for other tests.

> You chose interesting characters for the UTF-8 test. The haystack is a
> repeating pattern of byte sequence EC 99 84, and the needle is a
> repeating pattern of EC 84 B1. In the 'long' test, the lengths in the
> skip table are '2', '1' and '250'. But the search bounces between the
> '2' and '1' cases, and never hits the byte that would allow it to skip
> 250 bytes. Interesting case, I had not realized that that can happen.

Me neither, that was unintentional.

> But I don't think we need to put much weight on that, you could come up
> with other scenarios where the current code has skip table collisions, too.

Okay.

> So overall, I think it's still a worthwhile tradeoff, given that that is
> a worst case scenario. If you choose less pathological UTF-8 codepoints,
> or there is no match or the match is closer to the beginning of the
> string, the patch wins.

On 12/23/18, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> So, what is the expected speedup in a "good/average" case? Do we have
> some reasonable real-world workload mixing these cases that could be
> used as a realistic benchmark?

I'll investigate some "better" cases.

-John Naylor