On 28.08.2012 20:30, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
>> Drilling into the profile, I came up with three little optimizations:
>>
>> 1. Within spgdoinsert, a significant portion of the CPU time is spent on
>> line 2033 in spgdoinsert.c:
>>
>> memset(&out, 0, sizeof(out));
>>
>> That zeroes out a small struct allocated on the stack. Replacing that
>> with MemSet() makes it faster, reducing the time spent on zeroing that
>> struct from 10% to 1.5% of the time spent in spgdoinsert(). That's not
>> very much in the grand scheme of things, but it's a trivial change so
>> seems worth it.
>
> Fascinating. I'd been of the opinion that modern compilers would inline
> memset() for themselves and MemSet was probably not better than what the
> compiler could do these days. What platform are you testing on?

x64, gcc 4.7.1, running Debian.

The assembly generated for the MemSet is:

        .loc 1 2033 0 discriminator 3
        movq $0, -432(%rbp)
.LVL166:
        movq $0, -424(%rbp)
.LVL167:
        movq $0, -416(%rbp)
.LVL168:
        movq $0, -408(%rbp)
.LVL169:
        movq $0, -400(%rbp)
.LVL170:
        movq $0, -392(%rbp)

while the corresponding memset code is:

        .loc 1 2040 0 discriminator 6
        xorl %eax, %eax
        .loc 1 2042 0 discriminator 6
        cmpb $0, -669(%rbp)
        .loc 1 2040 0 discriminator 6
        movq -584(%rbp), %rdi
        movl $6, %ecx
        rep stosq

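To make the comparison concrete, here is a minimal C sketch of the two source-level forms. It is an illustration only. SketchOut, SKETCH_MEMSET_ZERO, zero_with_loop and zero_with_memset are hypothetical names. The struct is just six long words (48 bytes on an LP64 platform such as x86-64 Linux), chosen to match the six stores in the first listing. The macro is a simplified stand-in for the idea behind MemSet, not the actual definition in PostgreSQL's c.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the small on-stack struct (assumption:
 * six long-sized words, i.e. 48 bytes where long is 8 bytes). */
typedef struct SketchOut
{
    long words[6];
} SketchOut;

/*
 * Simplified illustration of the MemSet idea (NOT PostgreSQL's macro):
 * if the target is long-aligned and the length is a multiple of
 * sizeof(long), zero it one word at a time with a plain loop;
 * otherwise fall back to memset().
 */
#define SKETCH_MEMSET_ZERO(start, len) \
    do { \
        void   *vstart_ = (void *) (start); \
        size_t  len_ = (len); \
        if (((uintptr_t) vstart_ % sizeof(long)) == 0 && \
            (len_ % sizeof(long)) == 0) \
        { \
            long *p_ = (long *) vstart_; \
            long *stop_ = (long *) ((char *) vstart_ + len_); \
            while (p_ < stop_) \
                *p_++ = 0; \
        } \
        else \
            memset(vstart_, 0, len_); \
    } while (0)

/* Word-at-a-time variant; with a constant length the compiler can
 * unroll the loop into individual stores. */
static void
zero_with_loop(SketchOut *out)
{
    SKETCH_MEMSET_ZERO(out, sizeof(*out));
}

/* Plain memset variant; the compiler decides how to expand it. */
static void
zero_with_memset(SketchOut *out)
{
    memset(out, 0, sizeof(*out));
}
```

Compiling both functions with gcc -O2 -S and comparing the output is one way to check which expansion a given compiler version picks. The result may well differ from the listings here.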
In fact, with -mstringop-strategy=unrolled_loop, I can coerce gcc to produce code
similar to the MemSet version:

        movq %rax, -440(%rbp)
        .loc 1 2040 0 discriminator 6
        xorl %eax, %eax
.L254:
        movl %eax, %edx
        addl $32, %eax
        cmpl $32, %eax
        movq $0, -432(%rbp,%rdx)
        movq $0, -424(%rbp,%rdx)
        movq $0, -416(%rbp,%rdx)
        movq $0, -408(%rbp,%rdx)
        jb .L254
        leaq -432(%rbp), %r9
        addq %r9, %rax
        .loc 1 2042 0 discriminator 6
        cmpb $0, -665(%rbp)
        .loc 1 2040 0 discriminator 6
        movq $0, (%rax)
        movq $0, 8(%rax)

I'm not sure why gcc doesn't choose that by default. Perhaps which variant
is faster is CPU-specific; I was quite surprised that MemSet was such a
clear win on my laptop. Or maybe it's a speed-space tradeoff and gcc
chooses the more compact version, although using -O3 instead of -O2
made no difference.
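
Since the answer may depend on the CPU, a rough way to check on a given machine is a small timing harness along these lines. This is only a sketch under stated assumptions: BenchOut, bench_zero_memset, bench_zero_loop and time_zeroing are hypothetical names, the 6-word struct merely mimics the 48-byte case discussed here, and clock() measures CPU time coarsely. The call through a function pointer adds overhead to both variants, and an optimizing compiler is free to turn the loop into a memset (or the reverse), so the generated assembly should be inspected before trusting any numbers.

```c
#include <string.h>
#include <time.h>

/* Hypothetical 6-word struct (48 bytes where long is 8 bytes). */
typedef struct BenchOut
{
    long words[6];
} BenchOut;

typedef void (*zero_fn) (BenchOut *);

/* Variant 1: let the compiler expand memset() as it sees fit. */
static void
bench_zero_memset(BenchOut *out)
{
    memset(out, 0, sizeof(*out));
}

/* Variant 2: explicit word-at-a-time stores. */
static void
bench_zero_loop(BenchOut *out)
{
    long *p = out->words;
    long *stop = p + 6;

    while (p < stop)
        *p++ = 0;
}

/* Returns the CPU seconds consumed by 'iters' calls to 'fn'. */
static double
time_zeroing(zero_fn fn, long iters)
{
    BenchOut out;
    volatile long sink = 0;     /* keeps the stores observable */
    clock_t start = clock();

    for (long i = 0; i < iters; i++)
    {
        out.words[0] = i;       /* dirty the struct before zeroing it */
        fn(&out);
        sink += out.words[0];   /* read back so the work is not dropped */
    }

    clock_t end = clock();

    (void) sink;
    return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling time_zeroing(bench_zero_memset, n) and time_zeroing(bench_zero_loop, n) with the same large n, once per compiler flag of interest, gives a first-order comparison. A profiler run against the real workload remains the more reliable measurement.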
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com