On Sun, Dec 23, 2018 at 9:33 AM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > So, what is the expected speedup in a "good/average" case? Do we have > some reasonable real-world workload mixing these cases that could be > used as a realistic benchmark?
Not sure about a realistic mix, but I investigated the tradeoffs. First, using the test function Heikki posted upthread [1], I tried a 2 character needle needle with different haystack sizes, and looked for the position where master and patch were roughly equal (average of 3, within 1%). Beyond this point, the patch regresses. To keep it simple I used UTF-8 only. haystack position <=23 (patch always faster) 30 23 100 58 1000 ~550 1000000 ~550000 For very short haystacks, the patch is always faster or regresses only when the needle is close to the end. For longer haystacks, the patch will be faster if the position searched for is less than ~55% of the way to the end, slower if after. Next, I tested finding the position of a 2 character needle in a couple positions towards the front of a 1000 character haystack, plus not present and at the end for comparison. As seen earlier [1], the worst-case regression for this haystack length was 4.4x slower for UTF-8, but that had a skip table collision, and this time I used characters with no bytes in common. Average of 3, with a loop count of 1,000,000: UTF-8 pos. master patch diff * 3880ms 144ms -96% 100 4440ms 1410ms -68% 333 5700ms 4280ms -25% 999 9370ms 12500ms 34% EUC-KR pos. master patch diff * 3900ms 112ms -97% 100 4410ms 1289ms -71% 333 5680ms 3970ms -30% 999 9370ms 11600ms 24% The patch is much faster than master when the needle is near the front of a large haystack or not there at all. The majority of cases are measurably faster, and the best case is at least 20x faster. On the whole I'd say this patch is a performance win even without further optimization. I'm marking it ready for committer. [1] https://www.postgresql.org/message-id/05d95c5c-1fb7-29ea-1c5d-e7e941a0a14d%40iki.fi -- -John Naylor