On Wed, Jun 19, 2019 at 8:51 PM Binguo Bao <djydew...@gmail.com> wrote:

> [v4 patch]
Hi Binguo,

I can verify I get no warnings with the v4 patch. I've done some
additional performance testing. First, to sum up your results:

> insert into detoast_c (a) select
> repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from
> generate_series(1,100);

When the search pattern was at the beginning, the patch was several
times faster, and when the pattern was at the end, it was 3% slower
when uncompressed and 9% slower when compressed.

First, I'd like to advocate for caution when using synthetic
benchmarks involving compression. Consider this test:

insert into detoast_c (a)
select 'abc' ||
       repeat((SELECT string_agg(md5(chr(i)), '')
               FROM generate_series(1,127) i), 10000)
       || 'xyz'
from generate_series(1,100);

The results for the uncompressed case were not much different than in
your test. However, in the compressed case the iterator doesn't buy us
much with beginning searches, since full decompression is already fast:

               master    patch
comp. beg.      869ms    837ms
comp. end     14100ms  16100ms
uncomp. beg.   6360ms    800ms
uncomp. end   21100ms  21400ms

and with compression it's 14% slower searching to the end. This is
pretty contrived, but I include it for demonstration.

To test something hopefully a bit more realistic, I loaded 100 records
each containing the 1995 CIA fact book (~3MB of ascii) with a pattern
string put at the beginning and end. For the end search, I used a
longer needle to speed up the consumption of text, hoping to put more
stress on the detoasting algorithms, for example:

select max(position('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' in a))
from detoast_*;

               master    patch
comp. beg.      836ms     22ms
comp. end      1510ms   1700ms
uncomp. beg.    185ms     12ms
uncomp. end     851ms    903ms

Here, the "beginning" case is ~15-35x faster, which is very impressive
and much faster than with your generated contents. The "end" case is up
to 13% slower. It would be interesting to see where the break-even
point is, where the results are the same.

Reading the thread where you're working on optimizing partial
decompression [1], it seems you have two separate solutions for the two
problems. Maybe this is fine, but I'd like to bring up the possibility
of using the same approach for both kinds of callers. I'm not an expert
on TOAST, but maybe one way to solve both problems is to work at the
level of whole TOAST chunks. In that case, the current patch would look
like this:

1. The caller requests more of the attribute value from the de-TOAST
iterator.
2. The iterator gets the next chunk and either copies or decompresses
the whole chunk into the buffer. (If inline, just decompress the whole
thing.)

(A rough sketch of what such an interface could look like is at the end
of this mail.)

This seems simpler and also easy to adapt to callers that do know how
big a slice they want. I also suspect this way would be easier to adapt
to future TOAST formats not tied to heap, or to a certain compression
algorithm. With less bookkeeping overhead, maybe there'll be less
worst-case performance degradation, while not giving up much in the
best case.

(Note also that commit 9556aa01c6 already introduced some performance
degradation in near-end searches when using multibyte strings. This
patch would add to that.) The regression doesn't seem large, but I see
more than your test showed, and it would be nice to avoid it.

Thoughts, anyone?

[1] https://www.postgresql.org/message-id/flat/CAL-OGkux7%2BBm_J%3Dt5VpH7fJGGSm%2BPxWJtgs1%2BWU2g6cmLru%3D%3DA%40mail.gmail.com#705d074aa4ae305ed3d992b7e5b7af3c

--
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
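P.S. To make the chunk-level idea above a bit more concrete, here's a
minimal sketch of what such an iterator could look like, under the
assumption that whole chunks are always expanded into a single output
buffer. All of the names here (DetoastIterator, fetch_next_toast_chunk,
pglz_decompress_iterate) are made up for illustration and are not from
the v4 patch; a real version would also need to handle inline values,
short varlenas, and error cases.

#include "postgres.h"
#include "lib/stringinfo.h"

typedef struct DetoastIterator
{
	StringInfoData buf;			/* de-TOASTed bytes produced so far */
	bool		compressed;		/* is the stored value compressed? */
	bool		done;			/* no more chunks to fetch? */
	/* ... plus whatever state is needed to walk the chunks in order */
} DetoastIterator;

/* hypothetical helpers, assumed to exist for this sketch */
extern bool fetch_next_toast_chunk(DetoastIterator *iter,
								   char **chunk, int32 *chunklen);
extern void pglz_decompress_iterate(StringInfo dst,
									const char *src, int32 srclen);

/*
 * Make at least "needed" bytes of the value available in iter->buf,
 * fetching and expanding one whole TOAST chunk per loop iteration.
 */
static void
detoast_iterate(DetoastIterator *iter, int32 needed)
{
	char	   *chunk;
	int32		chunklen;

	while (iter->buf.len < needed && !iter->done)
	{
		/* 1. caller asked for more: get the next chunk */
		if (!fetch_next_toast_chunk(iter, &chunk, &chunklen))
		{
			iter->done = true;
			break;
		}

		/* 2. copy or decompress the whole chunk into the buffer */
		if (iter->compressed)
			pglz_decompress_iterate(&iter->buf, chunk, chunklen);
		else
			appendBinaryStringInfo(&iter->buf, chunk, chunklen);
	}
}

A caller like text search could then just call detoast_iterate() with a
growing "needed" value and scan iter->buf as it fills, stopping as soon
as it finds a match.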