On Wed, 18 Dec 2024 at 23:42, John Naylor <johncnaylo...@gmail.com> wrote:
> The difference is small enough that normally I'd say it's likely
> unrelated to the patch, but on the other hand it's consistent with
> saving (3 * 10 * 10 million) cycles because of 1 less multiplication
> each, which is not nothing, but for shoving bytes into /dev/null it's
> not exciting either. The lookup for the 64-bit case has grown to 1024
> bytes, which will compete for cache space. I don't have a strong
> reason to be either for or against this patch. Anyone else want to
> test?

I tried it out too on my Zen4 machine. I don't doubt David saw a
speedup when testing the performance in isolation, but I can't detect
anything going faster when using it in Postgres.

Maybe we can revisit this if we make COPY TO faster someday. As of
today, it's a pretty inefficient lump of code.

My results:

$ echo master && ./intbench.sh
master
NOTICE:  relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 246.294 ms
latency average = 243.167 ms
latency average = 245.620 ms
latency average = 247.135 ms
latency average = 248.206 ms
latency average = 253.433 ms
latency average = 259.296 ms
latency average = 248.856 ms
latency average = 247.518 ms
latency average = 259.581 ms
latency average = 244.426 ms
latency average = 244.553 ms
latency average = 249.909 ms
latency average = 244.079 ms
latency average = 246.422 ms
latency average = 248.763 ms
latency average = 247.318 ms
latency average = 249.675 ms
latency average = 245.192 ms
latency average = 253.975 ms

$ echo patched && ./intbench.sh
patched
NOTICE:  relation "tmp" already exists, skipping
CREATE TABLE AS
latency average = 253.964 ms
latency average = 257.463 ms
latency average = 250.506 ms
latency average = 252.401 ms
latency average = 260.806 ms
latency average = 250.120 ms
latency average = 251.539 ms
latency average = 262.180 ms
latency average = 252.349 ms
latency average = 251.332 ms
latency average = 249.490 ms
latency average = 252.696 ms
latency average = 251.895 ms
latency average = 248.466 ms
latency average = 255.839 ms
latency average = 253.334 ms
latency average = 250.548 ms
latency average = 288.164 ms
latency average = 252.587 ms
latency average = 256.059 ms
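
To compare the two runs without eyeballing 20 numbers each, something
like the following could be used. It's only a minimal sketch: the
function name summarize_latency is made up for illustration and is not
part of intbench.sh. It assumes the input lines look like the pgbench
"latency average = N ms" lines above and ignores everything else.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original intbench.sh): reads
# pgbench "latency average = N ms" lines on stdin and prints the sample
# count, the mean and the median.
summarize_latency() {
    # Field 4 of "latency average = 246.294 ms" is the number itself.
    # LC_ALL=C keeps "." as the decimal separator for sort and awk.
    LC_ALL=C awk '/latency average/ { print $4 }' |
    LC_ALL=C sort -n |
    LC_ALL=C awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            # Median: middle value, or mean of the two middle values.
            if (NR % 2) median = v[(NR + 1) / 2]
            else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            printf "n=%d mean=%.3f median=%.3f\n", NR, sum / NR, median
        }'
}
```

Usage would be along the lines of "./intbench.sh | summarize_latency".
Fed the numbers above, this should come out at a mean of roughly
248.7 ms for master and 255.1 ms for patched (medians roughly 247.4 ms
and 252.5 ms), so the runs show no sign of the patched build being
faster here.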

perf top:

master:
    16.59%  postgres       [.] CopyAttributeOutText
    15.63%  libc.so.6      [.] __memmove_avx512_unaligned_erms
    12.94%  postgres       [.] pg_ltoa
     9.85%  postgres       [.] CopyOneRowTo
     6.86%  postgres       [.] AllocSetAlloc
     6.73%  postgres       [.] tts_buffer_heap_getsomeattrs

patched:
    19.53%  libc.so.6      [.] __memmove_avx512_unaligned_erms
    12.52%  postgres       [.] pg_ltoa
    11.76%  postgres       [.] CopyAttributeOutText
    11.40%  postgres       [.] CopyOneRowTo
     6.96%  postgres       [.] tts_buffer_heap_getsomeattrs
     6.35%  postgres       [.] AllocSetAlloc

I can't think of anything we have that exercises pg_ltoa() or
pg_ultoa_n() more than this.  timestamp_out() might, but that's lots of
small ints.

David
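
intbench.sh:

#!/bin/bash
# Benchmark COPY TO of a 10-column, 1-million-row table of random values.
dbname=postgres

# Create the test table once; later runs skip this with a NOTICE.
psql -c "create table if not exists tmp as
select random(1,1e9) c1, random(1,1e9) c2, random(1,1e9) c3,
       random(1,1e9) c4, random(1,1e9) c5, random(1,1e9) c6,
       random(1,1e9) c7, random(1,1e9) c8, random(1,1e9) c9,
       random(1,1e9) c10
from generate_series(1,1000000);" "$dbname"

# The pgbench script is a single COPY to /dev/null.
echo "copy tmp to '/dev/null'" > bench.sql

# 20 runs of 10 seconds each, with a 10-second pause before each run.
for i in {1..20}
do
        sleep 10
        pgbench -n -f bench.sql -T 10 "$dbname" | grep latency
done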
