Re: AIX support

Heikki Linnakangas Mon, 07 Apr 2025 10:13:23 -0700

On 05/04/2025 21:29, Srirama Kucherlapati wrote:

- WRT the MEMSET_LOOP_LIMIT flag, this is set to “0”, which would internally use
Yes, I understand what it does. But why? Whatever benchmarking was done back in 2006 is no longer relevant.
We ran the program mentioned in the link below and collected the
benchmark stats on our node (POWER_10).
https://postgrespro.com/list/thread-id/1673194
The native AIX memset() seems to perform better. The benchmark still seems to be relevant, so I think we should continue to use the existing optimization for AIX.

At least it needs to be updated to match what MemSet() looks like nowadays. The changes may be just cosmetic, but better check. Should also check the effect on MemSetAligned(). That might matter more for performance in practice.

A third thing to check is the performance of MemSet() when the pointer is, in fact, aligned.

The other question is what do the results look like on other platforms? How much difference does the libc implementation make, vs. the compiler and CPU architecture? If the difference is related to compiler or CPU architecture, then this doesn't belong in the AIX template, but somewhere else.
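For readers following along, the decision under discussion can be sketched roughly as below. This is a simplified, function-style approximation and not the actual macro in c.h; the names sketch_memset, SKETCH_LOOP_LIMIT and SKETCH_LONG_ALIGN_MASK are made up for illustration, and the return value exists only to show which path was taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-platform limit; 0 disables the loop. */
#ifndef SKETCH_LOOP_LIMIT
#define SKETCH_LOOP_LIMIT 1024
#endif

#define SKETCH_LONG_ALIGN_MASK (sizeof(long) - 1)

/*
 * Zero or fill a buffer. Returns 1 if the long-store loop was used,
 * 0 if the call fell through to the libc memset().
 */
static int
sketch_memset(void *start, int val, size_t len)
{
    assert(start != NULL);

    if (SKETCH_LOOP_LIMIT != 0 &&
        ((uintptr_t) start & SKETCH_LONG_ALIGN_MASK) == 0 &&
        (len & SKETCH_LONG_ALIGN_MASK) == 0 &&
        val == 0 &&
        len <= (size_t) SKETCH_LOOP_LIMIT)
    {
        /* Aligned, long-sized, zero fill, small enough: store longs. */
        long *p = (long *) start;
        long *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
        return 1;
    }

    /* Everything else goes to libc. */
    memset(start, val, len);
    return 0;
}
```

With the limit defined as 0, the first condition is false at compile time and every call goes to memset(), which is the behaviour the AIX template currently asks for.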

Below are the stats (64bit Object mode).

./memset-aix


         sizeof(int)  = 4
         sizeof(long) = 8

MemSet() uses 'long', so the int tests are not relevant. I have omitted them below.

         memset by int (size=8) : 0.280301
         Loop by long (size=8) : 0.202650

         memset by int (size=16) : 0.280979
         Loop by long (size=16) : 0.246879

         memset by int (size=32) : 0.331691
         Loop by long (size=32) : 0.422261

Ok, MemSet() is faster with very small sizes; the crossover is somewhere between 16 and 32 bytes.

I'm actually surprised the compiler doesn't replace the memset() call with a few store instructions with these sizes.

         memset by int (size=1024) : 0.904048
         Loop by long (size=1024) : 24.149871


So with larger sizes, memset() wins hands down.

I'm surprised how big the difference is, because I actually expected the compiler to detect the memory-zeroing loop and replace it with some fancy vector instructions (does powerpc have any?). Or a call to memset(); I've seen compilers convert loops to memset() and vice versa.

My gut feeling is actually that we should remove the MemSet() macro altogether and just use memset() everywhere. The compilers are much better at optimizing it in year 2025 than they were back in 2002. I'd love to see some rigorous benchmarks across different platforms and compilers to demonstrate that, and then just get rid of MemSet().

MemSetAligned() might still be worth keeping. Sometimes we know that a piece of memory is aligned, but the compiler does not. But maybe even that should just assert and hint the compiler that the input is aligned, and then call memset().
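A minimal sketch of that "assert, hint, then memset()" idea, assuming a GCC- or Clang-compatible compiler that provides __builtin_assume_aligned. The function name zero_aligned is hypothetical, and whether a given AIX compiler accepts the builtin would need checking; without it the code degrades to a plain memset().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero a buffer that the caller knows is long-aligned.
 * The assert catches misuse in debug builds; the builtin (when
 * available) passes the alignment knowledge on to the optimizer.
 */
static void
zero_aligned(void *start, size_t len)
{
    assert(((uintptr_t) start % sizeof(long)) == 0);

#if defined(__GNUC__) || defined(__clang__)
    /* Assumption: the compiler supports this GCC/Clang builtin. */
    start = __builtin_assume_aligned(start, sizeof(long));
#endif

    memset(start, 0, len);
}
```

The point of this shape is that the choice between inline stores and a libc call is left entirely to the compiler, which sees both the alignment and, at many call sites, a constant length.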

If you'd like to help the community in general, if you could do some more rigorous benchmarking along those lines, not just for AIX, and start a new thread to discuss that, that'd be much appreciated. That would be the best way to resolve this.
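As a starting point for such a benchmark, a portable harness might look something like the sketch below. It is not the program from the linked thread; the names zero_fn, zero_with_loop, zero_with_memset and time_zeroing are invented here, and clock() is used only because it is available everywhere. A serious run would also vary alignment, sizes, compilers and optimization levels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*zero_fn) (void *buf, size_t len);

/* Zero the buffer with long-sized stores, as the MemSet() loop does. */
static void
zero_with_loop(void *buf, size_t len)
{
    long *p = (long *) buf;
    long *stop = (long *) ((char *) buf + len);

    assert(len % sizeof(long) == 0);
    while (p < stop)
        *p++ = 0;
}

/* Zero the buffer with the libc memset(). */
static void
zero_with_memset(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/* Volatile sink so the compiler cannot discard the zeroing work. */
static volatile long sink;

/* Returns the CPU seconds spent on 'iterations' calls of fn. */
static double
time_zeroing(zero_fn fn, long *buf, size_t len, long iterations)
{
    clock_t begin = clock();

    for (long i = 0; i < iterations; i++)
    {
        buf[0] = i + 1;     /* dirty the buffer before each call */
        fn(buf, len);
        sink += buf[0];     /* read the result back */
    }
    return (double) (clock() - begin) / CLOCKS_PER_SEC;
}
```

Calling the functions through a pointer keeps the compiler from specializing either variant for a constant size, which makes the comparison fairer but also hides exactly the inlining effects mentioned above, so both styles are worth measuring.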

For the more narrow question of what the AIX template should do, that comes down to whether there's some *AIX-specific* performance difference. The generated powerpc assembly code is presumably the same on AIX and other operating systems, so it comes down to whether there's some big difference in AIX's memset() implementation vs. glibc's.

diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
Why is this change needed?
Yes, I know we've been over this many times already. I still don't understand why it's needed. The onus is on you to explain it adequately, in comments in the patch, so that I and others understand it. Or even better, remove it if it's not necessary.
If you recall, we previously considered replacing this assembly code with __sync_lock_test_and_set(). However, as you mentioned earlier, this should be handled in a separate patch. For now, I'll make a note and submit a separate patch for this later, as originally planned. Below is the reference to the older discussion.

Yes, I do recall. Please read again my comment above: this all needs to be explained in comments in the code.


To be precise, I have these questions:

- Does GCC on AIX (still) use the IBM assembler?
- Does the IBM assembler still not understand the label syntax?
- Is there some other label syntax that would work on the IBM assembler?
- Is it possible to use the GNU assembler instead?
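For context, the builtin-based alternative mentioned earlier sidesteps the assembler label question entirely. A rough sketch, assuming a compiler that provides the GCC __sync builtins; sketch_slock_t, sketch_tas and sketch_unlock are illustrative names, not the definitions in s_lock.h.

```c
#include <assert.h>

typedef volatile int sketch_slock_t;

/*
 * Test-and-set: atomically store 1 and return the previous value.
 * A result of 0 means the lock was free and is now held by the caller;
 * nonzero means somebody else already holds it.
 */
static inline int
sketch_tas(sketch_slock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/* Release the lock; the builtin stores 0 with release semantics. */
static inline void
sketch_unlock(sketch_slock_t *lock)
{
    assert(*lock != 0);     /* must currently be held */
    __sync_lock_release(lock);
}
```

No inline assembly means no dependence on which assembler GCC invokes, but whether the generated code is as good as the hand-written version on powerpc would still need to be checked.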

+# -blibpath must contain ALL directories where we should look for libraries
+libpath := $(shell echo $(subst -L,:,$(filter -L/%,$(LDFLAGS))) | sed -e's/ //g'):/usr/lib:/lib
Is this still sensible on modern AIX systems? What happens if you leave it out?
This is required because it lists the possible non-default directories that the linker searches at run time. It is used along with rpath. As suggested, I tested this by removing the libpath, but at run time the linker is not able to find the paths of the dependent libraries; as a result, the binaries are not getting loaded. After doing some research, I found that AIX uses a stricter, more *explicit* approach. The runtime linker expects you to tell it exactly where to look using -blibpath.

Ok, some comments would be in order to explain that, maybe with links to the relevant AIX documentation.


--
Heikki Linnakangas
Neon (https://neon.tech)
