On Mon, 18 Nov 2024 15:01:09 GMT, Emanuel Peter <epe...@openjdk.org> wrote:
>> @eme64 If you load a 32-byte (256-bit) vector, then the load is aligned if
>> the address is divisible by 32, otherwise the load is misaligned. That's why
>> [`vmovdqa`](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64)
>> requires 16-byte alignment for 16-byte loads/stores, 32-byte alignment for
>> 32-byte loads/stores, 64-byte alignment for 64-byte loads/stores.
>>
>> As a result, I don't see how you can align a vector load/store if the object
>> base is only guaranteed to align at 8-byte boundaries. I mean there is no
>> use trying to align an access if you cannot align it at the access size, the
>> access is going to be misaligned anyway.
>
> @merykitty I guess we can always use
> [vmovdqu](https://www.felixcloutier.com/x86/movdqu:vmovdqu8:vmovdqu16:vmovdqu32:vmovdqu64).
>
> And in fact that is exactly what we do:
>
>     public class Test {
>         static int RANGE = 1024*1024;
>
>         public static void main(String[] args) {
>             byte[] aB = new byte[RANGE];
>             byte[] bB = new byte[RANGE];
>             for (int i = 0; i < 100_000; i++) {
>                 test1(aB, bB);
>             }
>         }
>
>         static void test1(byte[] a, byte[] b) {
>             for (int i = 0; i < RANGE; i++) {
>                 a[i] = b[i];
>             }
>         }
>     }
>
> `../java -XX:CompileCommand=compileonly,Test::test*
> -XX:CompileCommand=printcompilation,Test::test* -XX:+TraceLoopOpts
> -XX:-TraceSuperWord -XX:+TraceNewVectors -Xbatch -XX:+AlignVector
> -XX:CompileCommand=compileonly,Test::test*
> -XX:CompileCommand=printassembly,Test::test* Test.java`
>
>
>     ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner main of N178 strip mined) Freq: 8.13586e+09
>     0x00007fc3a4bb0780: movslq %ebx,%rdi
>     0x00007fc3a4bb0783: movslq %ebx,%r14
>     0x00007fc3a4bb0786: vmovdqu32 0x10(%r13,%r14,1),%zmm1
>     0x00007fc3a4bb0791: vmovdqu32 %zmm1,0x10(%r9,%r14,1)
>     0x00007fc3a4bb079c: vmovdqu32 0x50(%r13,%rdi,1),%zmm1
>     0x00007fc3a4bb07a7: vmovdqu32 %zmm1,0x50(%r9,%rdi,1)
>     0x00007fc3a4bb07b2: vmovdqu32 0x90(%r13,%rdi,1),%zmm1
>     0x00007fc3a4bb07bd: vmovdqu32 %zmm1,0x90(%r9,%rdi,1)
>     0x00007fc3a4bb07c8: vmovdqu32 0xd0(%r13,%rdi,1),%zmm1
>     0x00007fc3a4bb07d3: vmovdqu32 %zmm1,0xd0(%r9,%rdi,1)
>     0x00007fc3a4bb07de: vmovdqu32 0x110(%r13,%rdi,1),%zmm1
>     0x00007fc3a4bb07e9: vmovdqu32 %zmm1,0x110(%r9,%rdi,1)
>     0x00007fc3a4bb07f4: vmovdqu32 0x150(%r13,%rdi,1),%zmm1
>     0x00007fc3a4bb07ff: vmovdqu32 %zmm1,0x150(%r9,%rdi,1)
>     0x00007fc3a4bb080a: vmovdqu32 0x190(%r13,%rdi,1),%zmm1
>     0x00007fc3a4bb0815: vmovdqu32 %zmm1,0x190(%r9,%rdi,1)
>     0x00007fc3a4bb0820: vmovdqu32 0x1d0(%r13,%rdi,1),%zmm1
>     0x00007fc3a4bb082b: vmovdqu32 %zmm1,0x1d0(%r9,%rdi,1)  ;*bastore {reexecute=0 rethrow=0 return_oop=0}
>                                                            ; - Test::test1@14 (line 14)
>     0x00007fc3a4bb0836: add $0x200,%ebx                    ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                                                            ; - Test::test1@15 (line 13)
>     0x00007fc3a4bb083c: c...

@eme64 What I mean here is that `AlignVector` seems useless because the accesses are going to be misaligned either way.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483356306
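
To make the alignment argument in this thread concrete, here is a minimal sketch in plain Java. It is not HotSpot code, and the class and method names (`AlignmentSketch`, `isAligned`, `alignedBases`) are hypothetical. It assumes object bases are only guaranteed to be 8-byte aligned, as stated above, and that the array payload starts at offset `0x10`, as the displacements in the disassembly suggest. It only looks at the address of element 0 and does not model any loop-index adjustment a compiler might perform.

```java
public class AlignmentSketch {
    // Assumed array payload offset, taken from the 0x10 displacement in the disassembly.
    static final long ARRAY_BASE_OFFSET = 0x10;

    /** An access is aligned if its address is a multiple of the access size in bytes. */
    public static boolean isAligned(long address, int accessBytes) {
        return address % accessBytes == 0;
    }

    /**
     * Walks the eight possible residues (mod 64) of an 8-byte-aligned object base
     * and counts those for which element 0 is aligned for the given access size.
     */
    public static int alignedBases(int accessBytes) {
        int count = 0;
        for (long base = 0; base < 64; base += 8) {
            if (isAligned(base + ARRAY_BASE_OFFSET, accessBytes)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (int size : new int[] {8, 16, 32, 64}) {
            System.out.println(size + "-byte access: " + alignedBases(size)
                    + " of 8 possible bases aligned");
        }
    }
}
```

Under these assumptions, all 8 base residues give an aligned 8-byte access at element 0, but only 1 of 8 gives an aligned 64-byte (`zmm`) access, which is the gap between the guaranteed object alignment and the access size that the comment above points at.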