On Friday, 17 June 2016 09:23:18 UTC+7, Keith Randall wrote:
>
> Looks like something is wrong with immediate loading for the 1 << ... 
> operation.  Could you open a bug with repro instructions?  I can look at it 
> when 1.8 opens.
>>
>>
@Keith, now that I have found a combination that makes the original loop 
compile to an efficient assembly loop (avoiding the many discrepancies 
with the compiler), I have started working to make it even faster using 
extreme loop unrolling with unsafe.Pointer.  The code is working, but I am 
finding even more discrepancies, as follows:

1) Even when the Go code is written exactly that way, as a 
read/modify/write operation using the "|=" operator, the compiler 
refuses to generate the "ORL src, dstregWithIndexing" form, as in "ORL 
$2,(R12,R11*4)" or "ORL $2,$3(R12,R11*4)", which would be much more 
efficient.  Currently, in golang version 1.7beta2, it uses a read 
instruction to a register, followed by the modification of that register, 
followed by a write to the same destination; this is wasteful of a 
register and instructions.

2) Even when the base index is already in a register ('R12') that isn't 
modified by the operation, the compiler persists in copying that register 
to another register to use as the base index register; this is wasteful 
of registers and instructions.  I understood that SSA would mean that the 
compiler could recognize when a register is not changed and use it 
wherever appropriate.

3) When there is a lengthening conversion just outside a loop (or any kind 
of quick calculation, for that matter), instead of making sure that the 
conversion stays lifted outside the loop, the compiler will use a register 
for a MOVXZX instruction inside the loop, which is wasteful of registers 
and instructions.

4) While there are a few cases where using the LEA instruction can combine 
operations and save time, the compiler overuses it, both for simple 
calculations carried from immediately outside the loop to within the loop 
and for complex index register operations where a simple ADDX is all that 
is necessary and is faster.

Should I report another issue against 1.8 for this?  If so, should I report 
one issue per point, or combine them all as a single tight loop performance 
issue (seemingly not really related to using unsafe.Pointer, but possibly 
related for some of the points, as in assigning a special register to use 
as the base index pointer in 1. above)?


