Re: [HACKERS] SP-GiST micro-optimizations

Heikki Linnakangas Wed, 29 Aug 2012 01:29:11 -0700

On 28.08.2012 22:50, Ants Aasma wrote:

On Tue, Aug 28, 2012 at 9:42 PM, Tom Lane <[email protected]> wrote:

Seems like that's down to the CPU not doing "rep stosq" particularly
quickly, which might well be chip-specific.


AMD optimization manual[1] states the following:

     For repeat counts of less than 4k, expand REP string instructions
     into equivalent sequences of simple AMD64 instructions.

Intel optimization manual[2] doesn't provide equivalent guidelines,
but the graph associated with string instructions states about 30
cycles of startup latency. The mov based code on the other hand
executes in 6 cycles and can easily overlap with other non-store
instructions.

[1] http://support.amd.com/us/Processor_TechDocs/25112.PDF
[2] http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf

Hmm, sounds like gcc just isn't doing a very good job then. I also tried replacing the memset with a variable initializer: "spgChooseOut out = { 0 }" (and I moved that to where the memset was). In that case, gcc produced the same (fast) sequence of movq's that I got with -mstringop=unrolled_loop.
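For anyone who wants to compare the generated code themselves, here is a minimal sketch of the two variants. DemoChooseOut is a made-up stand-in, not the real spgChooseOut layout, and the function names are hypothetical. Note that the check looks at members only: as far as I know, "= { 0 }" zeroes the members but does not promise anything about padding bytes, whereas memset clears every byte.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for spgChooseOut. The real struct is defined in
 * PostgreSQL's headers and has a different layout; this one only needs
 * to be big enough that the compiler has a choice of zeroing strategy.
 */
typedef struct DemoChooseOut
{
    int     resultType;
    long    nodeN;
    long    level;
    void   *datum;
    long    filler[4];
} DemoChooseOut;

/* Variant 1: explicit memset over the whole struct, padding included. */
static DemoChooseOut
demo_make_with_memset(void)
{
    DemoChooseOut out;

    memset(&out, 0, sizeof(out));
    return out;
}

/* Variant 2: variable initializer; every member is zero-initialized. */
static DemoChooseOut
demo_make_with_initializer(void)
{
    DemoChooseOut out = {0};

    return out;
}

/* Returns 1 if every member is zero, 0 otherwise. Padding is not checked. */
static int
demo_is_zeroed(const DemoChooseOut *out)
{
    int     i;

    if (out->resultType != 0 || out->nodeN != 0 ||
        out->level != 0 || out->datum != NULL)
        return 0;
    for (i = 0; i < 4; i++)
    {
        if (out->filler[i] != 0)
            return 0;
    }
    return 1;
}

/* Runs both variants and asserts that they agree; returns 1 on success. */
int
demo_self_check(void)
{
    DemoChooseOut a = demo_make_with_memset();
    DemoChooseOut b = demo_make_with_initializer();

    assert(demo_is_zeroed(&a));
    assert(demo_is_zeroed(&b));
    return 1;
}
```

Compiling this with "gcc -O2 -S" and diffing the two functions should show whether a given compiler version picks different instruction sequences for them; the result will depend on the compiler, flags and target.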

Out of curiosity, I also tried this on clang. It produced this, regardless of whether I used MemSet, memset or a variable initializer:


        pxor    %xmm0, %xmm0
        .loc    1 2040 4                # spgdoinsert.c:2040:4
        movaps  %xmm0, -1280(%rbp)
        movaps  %xmm0, -1296(%rbp)
        movaps  %xmm0, -1312(%rbp)

So, it's using movaps to clear it in 16-byte chunks. perf annotate shows that this is comparable in speed to the code gcc produced for MemSet.
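To illustrate what that listing does, here is a rough C equivalent using SSE2 intrinsics. This is only a sketch: demo_clear16 is a hypothetical helper, it assumes a 16-byte-aligned buffer whose size is a multiple of 16, and it falls back to plain memset when SSE2 is not available. The 48-byte buffer matches the three 16-byte stores above; which store instruction the compiler actually emits (movaps or movdqa) is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Clear n bytes at dst using aligned 16-byte stores.
 * Assumptions: dst is 16-byte aligned and n is a multiple of 16.
 */
static void
demo_clear16(void *dst, size_t n)
{
#if defined(__SSE2__)
    char       *p = (char *) dst;
    __m128i     zero = _mm_setzero_si128();     /* like pxor %xmm0, %xmm0 */
    size_t      i;

    for (i = 0; i < n; i += 16)
        _mm_store_si128((__m128i *) (p + i), zero);     /* aligned store */
#else
    /* No SSE2 on this target: fall back to the library routine. */
    memset(dst, 0, n);
#endif
}

/* Fills a 48-byte buffer with junk, clears it, and verifies the result. */
int
demo_clear16_check(void)
{
    _Alignas(16) unsigned char buf[48];
    size_t      i;

    memset(buf, 0xAB, sizeof(buf));
    demo_clear16(buf, sizeof(buf));
    for (i = 0; i < sizeof(buf); i++)
        assert(buf[i] == 0);
    return 1;
}
```

With a constant size like this, an optimizing compiler will typically unroll the loop into straight-line stores, which is essentially what the clang output above looks like.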


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
