On 05/04/2025 21:29, Srirama Kucherlapati wrote:
- WRT the MEMSET_LOOP_LIMIT flag, this is set to “0”, which would
internally use
Yes, I understand what it does. But why? Whatever benchmarking was done
back in 2006 is no longer relevant.
We ran the program mentioned in the link below and collected the
benchmark stats on our node (POWER_10).
https://postgrespro.com/list/thread-id/1673194
The native AIX memset() seems to perform better. The benchmark seems to
still be relevant, so I think we should continue to use the existing
optimization for AIX.
At least it needs to be updated to match what MemSet() looks like
nowadays. The changes may be just cosmetic, but better check. Should
also check the effect on MemSetAligned(). That might matter more for
performance in practice.
A third thing to check is the performance of MemSet() when the pointer
is, in fact, aligned.
The other question is what do the results look like on other platforms?
How much difference does the libc implementation make, vs. the compiler
and CPU architecture? If the difference is related to compiler or CPU
architecture, then this doesn't belong in the AIX template, but
somewhere else.
Below are the stats (64bit Object mode).
./memset-aix
sizeof(int) = 4
sizeof(long) = 8
MemSet() uses 'long', so the int tests are not relevant. I have omitted
them below.
memset by int (size=8) : 0.280301
Loop by long (size=8) : 0.202650
memset by int (size=16) : 0.280979
Loop by long (size=16) : 0.246879
memset by int (size=32) : 0.331691
Loop by long (size=32) : 0.422261
Ok, MemSet() is faster with very small sizes, the crossover is somewhere
between 16 and 32 bytes.
I'm actually surprised the compiler doesn't replace the memset() call
with a few store instructions with these sizes.
memset by int (size=1024) : 0.904048
Loop by long (size=1024) : 24.149871
So with larger sizes, memset() wins hands down.
I'm surprised how big the difference is, because I actually expected the
compiler to detect the memory-zeroing loop and replace it with some
fancy vector instructions (does powerpc have any?). Or a call to
memset(); I've seen compilers convert loops to memset() and vice versa.
My gut feeling is actually that we should remove the MemSet() macro
altogether and just use memset() everywhere. The compilers are much
better at optimizing it in year 2025 than they were back in 2002. I'd
love to see some rigorous benchmarks across different platforms and
compilers to demonstrate that, and then just get rid of MemSet().
MemSetAligned() might still be worth keeping. Sometimes we know that a
piece of memory is aligned, but the compiler does not. But maybe even
that should just assert and hint the compiler that the input is aligned,
and then call memset().
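The "assert and hint" idea could look roughly like the sketch below. It assumes a compiler that provides __builtin_assume_aligned() (GCC and Clang do); memset_aligned is a hypothetical name, not an existing PostgreSQL function:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical replacement for MemSetAligned(): verify the alignment in
 * assert-enabled builds, tell the compiler about it, and let memset()
 * do the actual work.
 */
static inline void
memset_aligned(void *start, int val, size_t len)
{
    assert(((uintptr_t) start % sizeof(long)) == 0);

    /* Assumption: GCC/Clang builtin; other compilers need another hint. */
    memset(__builtin_assume_aligned(start, sizeof(long)), val, len);
}
```

Whether the hint changes the generated code at all would need to be checked per compiler, which is the same benchmarking question as for MemSet() itself.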
If you'd like to help the community in general, it would be much
appreciated if you could do some more rigorous benchmarking along those
lines, not just for AIX, and start a new thread to discuss that. That
would be the best way to resolve this.
For the narrower question of what the AIX template should do, that
comes down to whether there's some *AIX-specific* performance
difference. The generated powerpc assembly code is presumably the same
on AIX and other operating systems, so it comes down to whether there's
some big difference in AIX's memset() implementation vs. glibc's.
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
Why is this change needed?
Yes, I know we've been over this many times already. I still don't
understand why it's needed. The onus is on you to explain it adequately,
in comments in the patch, so that I and others understand it. Or even
better, remove it if it's not necessary.
If you recall, we previously considered replacing this assembly code
with __sync_lock_test_and_set(). However, as you mentioned earlier,
this should be handled in a separate patch. For now, I'll make a
note and submit a separate patch for this later, as originally
planned. Below is the reference to older discussion.
Yes, I do recall. Please read again my comment above: this all needs to
be explained in comments in the code.
To be precise, I have these questions:
- Does GCC on AIX (still) use the IBM assembler?
- Does the IBM assembler still not understand the label syntax?
- Is there some other label syntax that would work on the IBM assembler?
- Is it possible to use the GNU assembler instead?
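For reference, the __sync_lock_test_and_set() alternative mentioned above avoids assembler label syntax entirely. A minimal sketch, assuming a GCC-compatible compiler and an int-sized lock word; the typedef and the function names here are illustrative only and are not taken from the patch:

```c
/* Assumption: the lock word is a plain int, 0 = free, 1 = held. */
typedef int slock_t;

/*
 * Test-and-set: atomically store 1 and return the previous value.
 * A return value of 0 means the caller acquired the lock.
 */
static inline int
tas(volatile slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Release the lock: stores 0 with release semantics. */
static inline void
s_unlock(volatile slock_t *lock)
{
    __sync_lock_release(lock);
}
```

With the builtin, the compiler emits the load-reserve/store-conditional loop itself, so the question of which assembler is in use no longer affects this code path.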
+# -blibpath must contain ALL directories where we should look for libraries
+libpath := $(shell echo $(subst -L,:,$(filter -L/%,$(LDFLAGS))) | sed -e's/ //g'):/usr/lib:/lib
Is this still sensible on modern AIX systems? What happens if you leave
it out?
This is required because it tells the runtime linker which non-default
directories to search for libraries. It is used along with
rpath. As suggested, I tested this by removing the libpath, but at
run time the linker is then not able to find the dependent libraries,
and as a result the binaries do not load. After doing some
research: AIX uses a stricter, more *explicit* approach. The runtime
linker expects to be told exactly where to look using -blibpath.
Ok, some comments would be in order to explain that, maybe with links to
the relevant AIX documentation.
--
Heikki Linnakangas
Neon (https://neon.tech)