Re: [Question] Allocations along 64 byte cache lines

2021-09-09 Jed Brown
Jorge Cardoso Leitão writes:
> Yes, I expect aligned SIMD loads to be faster.
>
> My understanding is that we do not need an alignment requirement for this,
> though: split the buffer in 3, [unaligned][aligned][unaligned], use aligned
> loads for the middle and un-aligned (or not even SIMD) for t

Re: [Question] Allocations along 64 byte cache lines

2021-09-09 Jorge Cardoso Leitão
Thanks Yibo,

Yes, I expect aligned SIMD loads to be faster. My understanding is that we do not need an alignment requirement for this, though: split the buffer in 3, [unaligned][aligned][unaligned], use aligned loads for the middle and un-aligned (or not even SIMD) for the prefix and suffix. This
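The [unaligned][aligned][unaligned] split maps directly onto Rust's `slice::align_to`, which returns a prefix, an aligned middle, and a suffix. A minimal sketch, assuming `u64` data and a hypothetical `Line` type standing for one 64-byte cache line; the scalar loop over the body is only a placeholder for an aligned SIMD kernel.

```rust
/// Hypothetical type: one 64-byte cache line holding eight u64 values.
/// `align(64)` is what forces the middle slice to start on a 64-byte boundary.
#[repr(C, align(64))]
struct Line([u64; 8]);

/// Sum `data` as [unaligned prefix][64-byte aligned body][unaligned suffix].
fn split_sum(data: &[u64]) -> u64 {
    // SAFETY: `Line` is exactly eight u64 values with no padding (size 64,
    // align 64), so any 64-byte aligned run of eight u64s is a valid `Line`.
    let (prefix, body, suffix) = unsafe { data.align_to::<Line>() };

    // Scalar loops for the unaligned head and tail.
    let head: u64 = prefix.iter().sum();
    let tail: u64 = suffix.iter().sum();

    // The body is guaranteed aligned, so a SIMD kernel could use aligned
    // loads here; a plain loop stands in for that kernel in this sketch.
    let mid: u64 = body.iter().map(|line| line.0.iter().sum::<u64>()).sum();

    head + mid + tail
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    // Start 3 elements (24 bytes) in, so the view is shifted relative to `data`.
    let view = &data[3..];
    let (prefix, body, suffix) = unsafe { view.align_to::<Line>() };
    println!(
        "prefix = {} values, body = {} lines, suffix = {} values",
        prefix.len(),
        body.len(),
        suffix.len()
    );
    assert_eq!(split_sum(view), view.iter().sum::<u64>());
}
```

`align_to` is allowed to return a shorter (even empty) middle slice than is theoretically possible, so only performance, never correctness, may depend on how the buffer gets split.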

Re: [Question] Allocations along 64 byte cache lines

2021-09-07 Yibo Cai
Thanks Jorge,

I'm wondering if the 64-byte alignment requirement is for the cache or for SIMD registers (AVX-512?). For SIMD, it looks like register-width alignment does help. E.g., _mm_load_si128 can only load 16-byte (128-bit) aligned data, and it performs better than _mm_loadu_si128, which supports unaligned loads.
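The difference between `_mm_load_si128` (address must be 16-byte aligned) and `_mm_loadu_si128` (any address) can be sketched in Rust through `std::arch::x86_64`. This is an illustration, not a benchmark: it takes the aligned load only when a chunk's address happens to be 16-byte aligned and the unaligned load otherwise. It assumes an x86_64 target, where SSE2 is part of the baseline; other targets fall back to a scalar loop.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{
    __m128i, _mm_add_epi32, _mm_load_si128, _mm_loadu_si128, _mm_setzero_si128, _mm_storeu_si128,
};

/// Sum i32 values four lanes at a time with SSE2 (always available on x86_64).
#[cfg(target_arch = "x86_64")]
fn sse_sum(data: &[i32]) -> i32 {
    let chunks = data.len() / 4;
    let mut acc = unsafe { _mm_setzero_si128() };
    for i in 0..chunks {
        let ptr = data[i * 4..].as_ptr() as *const __m128i;
        // SAFETY: `ptr` points at four readable i32 values (16 bytes).
        let v = unsafe {
            if (ptr as usize) % 16 == 0 {
                // Aligned load: the address must be 16-byte aligned.
                _mm_load_si128(ptr)
            } else {
                // Unaligned load: accepts any address.
                _mm_loadu_si128(ptr)
            }
        };
        acc = unsafe { _mm_add_epi32(acc, v) };
    }
    // Spill the four lane sums, then add the leftover (< 4) elements.
    let mut lanes = [0i32; 4];
    unsafe { _mm_storeu_si128(lanes.as_mut_ptr() as *mut __m128i, acc) };
    lanes.iter().sum::<i32>() + data[chunks * 4..].iter().sum::<i32>()
}

/// Portable fallback so the sketch still compiles on other targets.
#[cfg(not(target_arch = "x86_64"))]
fn sse_sum(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let data: Vec<i32> = (1..=100).collect();
    // `&data[1..]` starts 4 bytes later, so at most one of the two slices
    // can be 16-byte aligned.
    println!("{} {}", sse_sum(&data), sse_sum(&data[1..]));
}
```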

Re: [Question] Allocations along 64 byte cache lines

2021-09-07 Jorge Cardoso Leitão
Thanks,

I think that the alignment requirement in IPC is different from this one: we enforce 8/64-byte alignment when serializing for IPC, but we (only) recommend 64-byte alignment for in-memory addresses (at least this is my understanding from the above link). I did test adding two arrays and the re
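On the serialization side, honoring an 8- or 64-byte boundary mostly comes down to padding each buffer's length. The hypothetical helpers below compute that padding; they sketch the arithmetic only and are not Arrow's IPC writer.

```rust
/// Round `len` up to the next multiple of `align` (a power of two, e.g. 8 or 64).
fn padded_len(len: usize, align: usize) -> usize {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    // Adding align - 1 and clearing the low bits rounds up.
    (len + align - 1) & !(align - 1)
}

/// Number of zero bytes to append after a buffer of `len` bytes.
fn padding_bytes(len: usize, align: usize) -> usize {
    padded_len(len, align) - len
}

fn main() {
    // A 100-byte buffer written with 8-byte vs 64-byte padding.
    for align in [8, 64] {
        println!(
            "len=100 align={align}: padded={} padding={}",
            padded_len(100, align),
            padding_bytes(100, align)
        );
    }
}
```

For a 100-byte buffer this gives 104 bytes (4 bytes of padding) at 8-byte alignment and 128 bytes (28 bytes of padding) at 64-byte alignment.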

Re: [Question] Allocations along 64 byte cache lines

2021-09-06 Yibo Cai
I did a quick benchmark of accessing a long buffer that is not 8-byte aligned. Given enough conditions, it does look like unaligned access has some penalty over aligned access. But I don't think this is an issue in practice. Please be very skeptical of this benchmark. It's hard to get it right given the c
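The same caveats apply to any sketch like the one below. It is a hypothetical micro-benchmark, not the code from this message: it sums the same number of `u64` words once from an 8-byte aligned start and once from a start shifted by one byte, and prints both timings. Results will vary with the CPU, compiler flags (build with `--release`), and whether the data fits in cache.

```rust
use std::convert::TryInto;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Sum a byte slice as little-endian u64 words, ignoring trailing bytes.
/// `from_le_bytes` on an 8-byte chunk compiles to a plain load that does not
/// require the address to be 8-byte aligned.
fn sum_words(bytes: &[u8]) -> u64 {
    bytes
        .chunks_exact(8)
        .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
        .fold(0u64, |acc, w| acc.wrapping_add(w))
}

/// Run `sum_words` `reps` times; return the last sum and the elapsed time.
fn time_sum(bytes: &[u8], reps: u32) -> (u64, Duration) {
    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..reps {
        // black_box keeps the optimizer from hoisting the work out of the loop.
        sum = sum_words(black_box(bytes));
    }
    (sum, start.elapsed())
}

fn main() {
    const WORDS: usize = 1 << 20; // 8 MiB of u64 words per pass
    let buf: Vec<u8> = (0..WORDS * 8 + 16).map(|i| (i % 251) as u8).collect();

    // Offset of the first 8-byte aligned address inside `buf`.
    let addr = buf.as_ptr() as usize;
    let first = (8 - addr % 8) % 8;

    let aligned = &buf[first..first + WORDS * 8];
    let unaligned = &buf[first + 1..first + 1 + WORDS * 8]; // shifted by 1 byte

    let (s_a, t_a) = time_sum(aligned, 10);
    let (s_u, t_u) = time_sum(unaligned, 10);
    println!("aligned:   {t_a:?} (sum {s_a})");
    println!("unaligned: {t_u:?} (sum {s_u})");
}
```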

Re: [Question] Allocations along 64 byte cache lines

2021-09-06 Micah Kornfield
>
> My own impression is that the emphasis may be slightly exaggerated. But
> perhaps some other benchmarks would prove differently.

This is probably true. [1] is the original mailing list discussion. I think the lack of measurable differences and the high overhead of 64-byte alignment was the reason f

Re: [Question] Allocations along 64 byte cache lines

2021-09-06 Antoine Pitrou
On 06/09/2021 at 23:20, Jorge Cardoso Leitão wrote:
> Thanks a lot Antoine for the pointers. Much appreciated!
>
>> Generally, it should not hurt to align allocations to 64 bytes anyway,
>> since you are generally dealing with large enough data that the
>> (small) memory overhead doesn't matter.
>
> Not f

Re: [Question] Allocations along 64 byte cache lines

2021-09-06 Eduardo Ponce
To add to Antoine's points: besides data alignment being beneficial for reducing cache line reads/writes and, overall, using the cache more effectively, another key point is the use of vector (SIMD) registers. Although recent CPUs can load unaligned data into vector registers at similar speeds as aligne
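The cache-line argument is simple arithmetic: an access that does not start on a line boundary can straddle one more line than an aligned access of the same size. The helper below counts lines touched, assuming 64-byte cache lines (typical on current x86 CPUs, but an assumption of this sketch).

```rust
/// Assumed cache line size in bytes.
const CACHE_LINE: usize = 64;

/// Number of cache lines touched by an access of `len` bytes at address `addr`.
fn cache_lines_touched(addr: usize, len: usize) -> usize {
    if len == 0 {
        return 0;
    }
    let first = addr / CACHE_LINE;
    let last = (addr + len - 1) / CACHE_LINE;
    last - first + 1
}

fn main() {
    // A 64-byte (AVX-512 sized) load: aligned vs. starting mid-line.
    println!("aligned 64-byte load:   {} line(s)", cache_lines_touched(0, 64));
    println!("unaligned 64-byte load: {} line(s)", cache_lines_touched(32, 64));
    // A 32-byte (AVX2 sized) load that straddles a line boundary.
    println!("32-byte load at 48:     {} line(s)", cache_lines_touched(48, 32));
}
```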

Re: [Question] Allocations along 64 byte cache lines

2021-09-06 Jorge Cardoso Leitão
Thanks a lot Antoine for the pointers. Much appreciated!

> Generally, it should not hurt to align allocations to 64 bytes anyway,
> since you are generally dealing with large enough data that the
> (small) memory overhead doesn't matter.

Not for performance. However, 64 byte alignment in Rust req
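The message is cut off at the Rust-specific point, so the sketch below does not reconstruct that argument. It only shows one standard way to obtain a 64-byte aligned allocation in Rust, via `std::alloc::Layout::from_size_align` and `alloc_zeroed`. `AlignedBuf` is a hypothetical type for illustration, not the buffer type of any Arrow implementation. Rounding the capacity up to a multiple of 64 also makes the memory overhead visible: fewer than 64 bytes per non-empty buffer, 28 bytes for a 100-byte request.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::slice;

/// Hypothetical owned byte buffer whose first byte sits on a 64-byte boundary.
struct AlignedBuf {
    ptr: *mut u8,
    len: usize,
    layout: Layout,
}

impl AlignedBuf {
    const ALIGN: usize = 64;

    /// Allocate `len` zeroed bytes, rounding the capacity up to a multiple of
    /// 64 (minimum 64, so the allocation size is never zero).
    fn zeroed(len: usize) -> AlignedBuf {
        let cap = ((len + Self::ALIGN - 1) / Self::ALIGN * Self::ALIGN).max(Self::ALIGN);
        let layout = Layout::from_size_align(cap, Self::ALIGN).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        AlignedBuf { ptr, len, layout }
    }

    fn capacity(&self) -> usize {
        self.layout.size()
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` is valid for `len <= capacity()` zeroed bytes.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: as above; `&mut self` guarantees exclusive access.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.len) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated with exactly this `layout`.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

fn main() {
    let mut buf = AlignedBuf::zeroed(100);
    buf.as_mut_slice()[0] = 42;
    println!(
        "addr % 64 = {}, len = {}, capacity = {}, first = {}",
        buf.as_slice().as_ptr() as usize % 64,
        buf.as_slice().len(),
        buf.capacity(),
        buf.as_slice()[0]
    );
}
```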

Re: [Question] Allocations along 64 byte cache lines

2021-09-06 Antoine Pitrou
On 06/09/2021 at 19:45, Antoine Pitrou wrote:
Specifically, I performed two types of tests: a "random sum", where we compute the sum of the values taken at random indices, and "sum", where we sum all values of the array (buffer[1] of the primitive array), both for arrays ranging from 2^10 to
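The two kernels described here can be sketched as follows. This is an illustrative reconstruction, not the benchmark code from the thread: the xorshift generator, the seed, the `i64` element type, and the 2^20 upper bound are all assumptions, since the original size range is cut off after 2^10.

```rust
use std::hint::black_box;
use std::time::Instant;

/// xorshift64 step: a tiny PRNG so the sketch needs no external crates.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// "sum": add every value in the array.
fn sum(values: &[i64]) -> i64 {
    values.iter().fold(0i64, |acc, &v| acc.wrapping_add(v))
}

/// "random sum": add `n` values read at pseudo-random indices.
fn random_sum(values: &[i64], n: usize, seed: u64) -> i64 {
    assert!(seed != 0, "xorshift state must be non-zero");
    assert!(!values.is_empty(), "values must be non-empty");
    let mut state = seed;
    let mut acc = 0i64;
    for _ in 0..n {
        let i = (xorshift64(&mut state) % values.len() as u64) as usize;
        acc = acc.wrapping_add(values[i]);
    }
    acc
}

fn main() {
    // Sizes from 2^10 up to an arbitrary 2^20 elements for this sketch.
    for exp in (10..=20).step_by(2) {
        let len = 1usize << exp;
        let values: Vec<i64> = (0..len as i64).collect();
        let t0 = Instant::now();
        let s = sum(black_box(&values));
        let t_sum = t0.elapsed();
        let t1 = Instant::now();
        let r = random_sum(black_box(&values), len, 0x9E37_79B9_7F4A_7C15);
        let t_rand = t1.elapsed();
        println!("2^{exp}: sum {t_sum:?} ({s}), random sum {t_rand:?} ({r})");
    }
}
```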

Re: [Question] Allocations along 64 byte cache lines

2021-09-06 Antoine Pitrou
On Mon, 6 Sep 2021 18:09:31 +0100 Jorge Cardoso Leitão wrote:
> Hi,
>
> We have a whole section related to byte alignment (
> https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding)
> recommending 64 byte alignment and referring to Intel's manual.
>
> Do we have evidence

[Question] Allocations along 64 byte cache lines

2021-09-06 Jorge Cardoso Leitão
Hi,

We have a whole section related to byte alignment (https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding) recommending 64-byte alignment and referring to Intel's manual. Do we have evidence that this alignment helps (besides Intel's claims)? I am asking because going