On Friday, 17 June 2016 09:23:18 UTC+7, Keith Randall wrote:
>
> Looks like something is wrong with immediate loading for the 1 << ...
> operation. Could you open a bug with repro instructions? I can look at it
> when 1.8 opens.

@Keith, now that I have found a combination that makes the original loop compile to an efficient assembler loop (avoiding the compiler's many discrepancies), I have started working to make it even faster using extreme loop unrolling with unsafe.Pointer. The code is working, but I am finding even more discrepancies, as follows:

1) Even when the Go code is written exactly as a read/modify/write operation using the "|=" operator, the compiler refuses to generate the "ORL src, dstregWithIndexing" form, as in "ORL $2,(R12,R11*4)" or "ORL $2,$3(R12,R11*4)", which would be much more efficient. Currently, in golang version 1.7beta2, it reads the destination into a register, modifies that register, then writes it back to the same destination; this is wasteful of a register and instructions.

2) Even when the 'R12' base index value is already in a register and isn't modified by the operation, the compiler persists in copying that register to another register to use as the base index register; this is wasteful of registers and instructions. I understood that SSA would mean that the compiler could recognize when a register is not changed and use it wherever appropriate.

3) When there is a lengthening conversion just outside a loop (or any kind of quick calculation, for that matter), instead of making sure that the conversion stays lifted outside the loop, the compiler will use a register and issue a MOVXZX instruction inside the loop, which is wasteful of registers and instructions.

4) While there are a few cases where using the LEA instruction can combine operations and save time, the compiler overuses it: for simple calculations moved from immediately outside the loop to within the loop, and for complex index register operations where a simple ADDX is all that is necessary and is faster.

Should I report another issue against 1.8 for this? If so, should I report separate issues or combine them all into one tight-loop performance issue (seemingly not really related to using unsafe.Pointer, but possibly related for some of the points, as in assigning a special register to use as the base index pointer in 1. above)?
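
For reference, a minimal sketch of the kind of read/modify/write loop that point 1 describes. This is not the actual code from this thread: the function names setBitsSlice and setBitsUnsafe, the uint32 word size, and the start/step parameters are all assumptions made for illustration. The generated assembly for such a loop can be inspected with "go build -gcflags=-S", which is one way to check whether the "|=" compiles to a single ORL to memory or to a separate load, OR, and store.

```go
package main

import (
	"fmt"
	"unsafe"
)

// setBitsSlice sets every step-th bit of buf, starting at bit index start,
// using an ordinary bounds-checked slice and the |= operator.
// Assumes start >= 0 and step > 0 (hypothetical helper, for illustration).
func setBitsSlice(buf []uint32, start, step int) {
	limit := len(buf) * 32
	for i := start; i < limit; i += step {
		buf[i>>5] |= 1 << uint(i&31) // read/modify/write of one word
	}
}

// setBitsUnsafe does the same work through unsafe.Pointer arithmetic,
// which avoids the bounds check inside the loop.
// Assumes start >= 0 and step > 0 (hypothetical helper, for illustration).
func setBitsUnsafe(buf []uint32, start, step int) {
	if len(buf) == 0 {
		return
	}
	base := unsafe.Pointer(&buf[0])
	limit := uintptr(len(buf)) * 32
	for i := uintptr(start); i < limit; i += uintptr(step) {
		// Word index is i>>5; each uint32 word is 4 bytes wide.
		p := (*uint32)(unsafe.Pointer(uintptr(base) + (i>>5)*4))
		*p |= 1 << (i & 31) // the statement one would hope becomes a single ORL
	}
}

func main() {
	a := make([]uint32, 4)
	b := make([]uint32, 4)
	setBitsSlice(a, 3, 7)
	setBitsUnsafe(b, 3, 7)
	for i := range a {
		fmt.Printf("%032b %032b\n", a[i], b[i])
	}
}
```

Both functions should produce identical buffers; the only intended difference is how the word address is formed, which is what makes the pair useful for comparing the compiler's output side by side.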