Hi!

> On 12 March 2019, at 10:22, Andrey Borodin <x4...@yandex-team.ru> wrote:
>
> 3. And I'd use memmove despite the comment why we do not do that. It is
> SSE-optimized and cache-optimized nowadays.

So, I've pushed the idea a little further and showed the byte-copy loop in decompression to Vladimir Leskov:

    while (len--)
    {
        *dp = dp[-off];
        dp++;
    }

He advised me to use an algorithm that splits the copied region into smaller non-overlapping subregions of exponentially increasing size:

    while (off <= len)
    {
        memcpy(dp, dp - off, off);
        len -= off;
        dp += off;
        off *= 2;
    }
    memcpy(dp, dp - off, len);

On Paul's original test, without the patch from this thread, this optimization gave about a 2.5x speedup.
I've composed more detailed tests[0] and tested against current master. Now it only gives a 20%-25% decompression speedup, but I think it is still useful.

Best regards, Andrey Borodin.

[0] Here's the test

    create table if not exists slicingtest1 as select repeat('0', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest2 as select repeat('01', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest3 as select repeat('012', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest4 as select repeat('0123', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest5 as select repeat('01234', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest6 as select repeat('012345', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest7 as select repeat('0123456', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest8 as select repeat('01234567', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest16 as select repeat('0123456789ABCDEF', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest32 as select repeat('0x1x2x3x4x5x6x7x8x9xAxBxCxDxExFx', 10000) as a from generate_series(1,10000);
    create table if not exists slicingtest64 as select repeat('0xyz1xyz2xyz3xyz4xyz5xyz6xyz7xyz8xyz9xyzAxyzBxyzCxyzDxyzExyzFxyz', 10000) as a from generate_series(1,10000);

    \timing off
    select sum(length(a)) from slicingtest1;
    -- do for every stride length
    \timing on
    select sum(length(a)) from slicingtest1;

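
For anyone who wants to check the doubling copy in isolation, here is a minimal standalone sketch in C. It is not the code from the attached patch: the function names copy_match_bytewise and copy_match_doubling are hypothetical, and the sketch assumes off >= 1 and that at least off bytes of already-decompressed output sit before dp. The reasoning is that every memcpy call has a source and destination that do not overlap, and after each step the bytes behind dp repeat with the original period, so doubling off still points at the right data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference behaviour: copy one byte at a time, so an overlapping
 * match (off < len) repeats the last off bytes of the output. */
static void
copy_match_bytewise(unsigned char *dp, size_t off, size_t len)
{
    assert(off > 0);            /* assumption: a match never has zero offset */
    while (len--)
    {
        *dp = *(dp - off);
        dp++;
    }
}

/* Same result using memcpy on non-overlapping chunks whose size
 * doubles each round: off, 2*off, 4*off, ... */
static void
copy_match_doubling(unsigned char *dp, size_t off, size_t len)
{
    assert(off > 0);
    while (off <= len)
    {
        /* source [dp - off, dp) and destination [dp, dp + off) are disjoint */
        memcpy(dp, dp - off, off);
        len -= off;
        dp += off;
        off *= 2;               /* the region behind dp is now twice as long */
    }
    /* here len < off, so the final chunk does not overlap its source either */
    memcpy(dp, dp - off, len);
}
```

With a 3-byte seed "abc" and a match of off = 3, len = 20, both functions should leave the same 23 bytes in the buffer; the doubling version gets there with four memcpy calls (3, 6 and 11 bytes, plus a zero-length tail) instead of 20 single-byte stores.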
0001-Use-fast-memcpy-in-pglz-decompression.patch
Description: Binary data