On Wed, 18 Dec 2024 at 23:42, John Naylor <johncnaylo...@gmail.com> wrote:
> The difference is small enough that normally I'd say it's likely
> unrelated to the patch, but on the other hand it's consistent with
> saving (3 * 10 * 10 million) cycles because of 1 less multiplication
> each, which is not nothing, but for shoving bytes into /dev/null it's
> not exciting either. The lookup for the 64-bit case has grown to 1024
> bytes, which will compete for cache space. I don't have a strong
> reason to be either for or against this patch. Anyone else want to
> test?

I tried it out too on my Zen4 machine. I don't doubt David saw a
speedup when testing the performance in isolation, but I can't detect
anything going faster when using it in Postgres.

Maybe we can revisit if we make COPY TO faster someday. As of today,
it's a pretty inefficient lump of code.

My results:

$ echo master && ./intbench.sh
master
NOTICE:  relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 246.294 ms
latency average = 243.167 ms
latency average = 245.620 ms
latency average = 247.135 ms
latency average = 248.206 ms
latency average = 253.433 ms
latency average = 259.296 ms
latency average = 248.856 ms
latency average = 247.518 ms
latency average = 259.581 ms
latency average = 244.426 ms
latency average = 244.553 ms
latency average = 249.909 ms
latency average = 244.079 ms
latency average = 246.422 ms
latency average = 248.763 ms
latency average = 247.318 ms
latency average = 249.675 ms
latency average = 245.192 ms
latency average = 253.975 ms

$ echo patched && ./intbench.sh
patched
NOTICE:  relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 253.964 ms
latency average = 257.463 ms
latency average = 250.506 ms
latency average = 252.401 ms
latency average = 260.806 ms
latency average = 250.120 ms
latency average = 251.539 ms
latency average = 262.180 ms
latency average = 252.349 ms
latency average = 251.332 ms
latency average = 249.490 ms
latency average = 252.696 ms
latency average = 251.895 ms
latency average = 248.466 ms
latency average = 255.839 ms
latency average = 253.334 ms
latency average = 250.548 ms
latency average = 288.164 ms
latency average = 252.587 ms
latency average = 256.059 ms

perf top:

master:
  16.59%  postgres   [.] CopyAttributeOutText
  15.63%  libc.so.6  [.] __memmove_avx512_unaligned_erms
  12.94%  postgres   [.] pg_ltoa
   9.85%  postgres   [.] CopyOneRowTo
   6.86%  postgres   [.] AllocSetAlloc
   6.73%  postgres   [.] tts_buffer_heap_getsomeattrs

patched:
  19.53%  libc.so.6  [.] __memmove_avx512_unaligned_erms
  12.52%  postgres   [.] pg_ltoa
  11.76%  postgres   [.] CopyAttributeOutText
  11.40%  postgres   [.] CopyOneRowTo
   6.96%  postgres   [.] tts_buffer_heap_getsomeattrs
   6.35%  postgres   [.] AllocSetAlloc

I can't think of what we have that exercises pg_ltoa() or
pg_ultoa_n() more. timestamp_out() might, but that's lots of small
ints.

David

#!/bin/bash

dbname=postgres

psql -c "create table if not exists tmp as select random(1,1e9) c1, random(1,1e9) c2,random(1,1e9) c3,random(1,1e9) c4,random(1,1e9) c5,random(1,1e9) c6,random(1,1e9) c7,random(1,1e9) c8,random(1,1e9) c9,random(1,1e9) c10 from generate_series(1,1000000);" $dbname

for i in {1..20}
do
    sleep 10
    echo "copy tmp to '/dev/null'" > bench.sql
    pgbench -n -f bench.sql -T 10 $dbname | grep latency
done;
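
Twenty latency lines per branch are hard to compare by eye, so a small filter that reduces them to a count, mean, minimum and maximum may help. What follows is only a sketch: summarize_latency is a hypothetical helper that is not part of the script above, and it assumes the input lines have exactly the "latency average = N ms" shape that pgbench printed in the runs shown.

```shell
#!/bin/bash

# Hypothetical helper (an assumption, not part of intbench.sh):
# reads pgbench "latency average = N ms" lines on stdin and prints
# the number of runs plus their mean, minimum and maximum in ms.
# Lines that do not match (NOTICE, CREATE TABLE AS, ...) are ignored.
summarize_latency() {
    awk '/latency average =/ {
             v = $4 + 0                      # 4th field is the value in ms
             sum += v; n++
             if (n == 1 || v < min) min = v
             if (n == 1 || v > max) max = v
         }
         END {
             if (n > 0)
                 printf "n=%d mean=%.3f min=%.3f max=%.3f\n", n, sum / n, min, max
         }'
}
```

It would be used as "./intbench.sh | summarize_latency" on each branch. Printing the minimum and maximum next to the mean makes a single slow run, such as the 288.164 ms one in the patched results, easy to spot before reading anything into a difference of a few milliseconds.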