Hi Gonzalo,

This is interesting, thank you. Do you have code available to reproduce these results?
- Wes

On Fri, Sep 15, 2017 at 9:28 AM, Gonzalo Ortiz Jaureguizar <golthir...@gmail.com> wrote:

> I forgot to say that the tests were executed on my Ubuntu 17.04 laptop on
> Oracle JDK 1.8.0_144-b01.
>
> 2017-09-15 13:21 GMT+02:00 Gonzalo Ortiz Jaureguizar <golthir...@gmail.com>:
>
>> Hi there,
>>
>> I have created a little JMH test to check Arrow's performance. You can
>> find it here. The idea is to test an API with implementations backed by
>> heap arrays, NIO buffers (that follow the Arrow format), and Arrow. At
>> the moment the API only supports nullable int buffers and contains
>> read-only methods.
>>
>> The benchmark runs on automatically generated vectors of 2^10, 2^20 and
>> 2^26 never-null integers, and it tests three different access patterns:
>>
>> - Random access: a random element is read
>> - Sequential access: a random index is chosen and then the following
>>   32 elements are read
>> - Sum access: similar to sequential, but instead of simply reading the
>>   elements, they are added into a long
>>
>> Disclaimer: microbenchmarks are error-prone, I'm not an expert on JMH,
>> and this benchmark was done in a couple of hours.
>>
>> Results
>> On all charts the Y axis is the ratio between the throughput of the
>> offheap versions and the heap version (so higher is better).
>>
>> TL;DR: it seems that the more complex structures of Arrow are preventing
>> some optimizations in the JVM.
>>
>> Random
>> Random access is quite good. The heap version is a little bit better,
>> but both offheap solutions seem pretty similar.
>>
>>         1K      1M      64M
>> Array   75.139  53.025  10.872
>> Arrow   67.399  43.491  10.42
>> Buf     82.877  38.092  10.753
>> [image: inline chart 1]
>>
>> Sequential
>> If you look at the absolute values, it is clear that JMH's blackhole is
>> preventing any JVM optimization of the loop. I think that's fine, as it
>> simulates several calls to the vector in a *not optimized* scenario.
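For reference, the "random" and "sequential" patterns described above might look roughly like the following sketch, comparing a heap int[] against a direct ByteBuffer holding the same 4-byte little-endian integers (the Arrow layout for non-null ints). Class and method names here are illustrative assumptions, not the benchmark's actual code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Random;

// Sketch of the "random" and "sequential" access patterns, against a heap
// int[] and a direct ByteBuffer with the same little-endian 4-byte ints.
// Names are illustrative only; the real benchmark goes through an API with
// three implementations (array, buffer, Arrow).
public class AccessPatternSketch {
    // Random access: read one element at a random index.
    static int randomRead(ByteBuffer buf, int idx) {
        return buf.getInt(idx * 4);
    }

    // Sequential access: pick a start index, then read the following
    // 32 elements (accumulated here so the reads are not dead code).
    static long sequentialRead(ByteBuffer buf, int start) {
        long acc = 0;
        for (int i = start; i < start + 32; i++) {
            acc ^= buf.getInt(i * 4);
        }
        return acc;
    }

    public static void main(String[] args) {
        int n = 1 << 10;
        int[] heap = new int[n];
        ByteBuffer offheap = ByteBuffer.allocateDirect(n * 4)
                                       .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < n; i++) {
            heap[i] = i * 3;
            offheap.putInt(i * 4, i * 3);
        }
        int idx = new Random(42).nextInt(n - 32);
        // Both representations must agree element-for-element.
        System.out.println(heap[idx] == randomRead(offheap, idx));
        long a = 0;
        for (int i = idx; i < idx + 32; i++) a ^= heap[i];
        System.out.println(a == sequentialRead(offheap, idx));
    }
}
```

In the real benchmark each read would be fed to JMH's Blackhole instead of being accumulated by hand.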
>> It seems that the JVM is not smart enough to optimize offheap sequential
>> access as much as it does with heap structures. Although both offheap
>> implementations are worse than the heap version, the one that uses Arrow
>> is noticeably worse than the one that directly uses ByteBuffers:
>>
>>         1K     1M     64M
>> Array   6.335  4.563  3.145
>> Arrow   2.664  2.453  1.989
>> Buf     4.456  3.971  3.018
>> [image: inline chart 2]
>>
>> Sum
>> The result is awful. It seems that the JVM is able to optimize (I guess
>> by vectorizing) the heap and ByteBuffer implementations (at least with
>> small vectors), but not the Arrow version. I guess this is due to the
>> indirections and deeper stack required to execute the same code with
>> Arrow.
>>
>>         1K      1M      64M
>> Array   44.833  26.617  9.787
>> Arrow   3.426   3.265   2.521
>> Buf     38.288  19.295  5.668
>> [image: inline chart 4]
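The "sum" comparison above comes down to loop shapes like the following sketch: the int[] loop is the classic form HotSpot's auto-vectorizer recognizes, and the ByteBuffer loop reads the same data off-heap. An Arrow vector would add further layers on top of the buffer read (accessor object, null check, ArrowBuf bounds checks), which is the suspected optimization barrier; that extra indirection is only described in the thread, not reproduced here. Names are illustrative, not the benchmark's actual code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of the two "sum" loop shapes the benchmark compares directly:
// a tight loop over a heap int[] versus the same loop reading a direct
// ByteBuffer via getInt. The Arrow variant routes each read through more
// call layers, which is the suspected inlining/vectorization barrier.
public class SumSketch {
    static long sumHeap(int[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i]; // simple counted loop: auto-vectorizer friendly
        }
        return sum;
    }

    static long sumBuffer(ByteBuffer buf, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getInt(i * 4); // same data, read off-heap
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1 << 10;
        int[] heap = new int[n];
        ByteBuffer buf = ByteBuffer.allocateDirect(n * 4)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < n; i++) {
            heap[i] = i;
            buf.putInt(i * 4, i);
        }
        System.out.println(sumHeap(heap));     // 0 + 1 + ... + 1023 = 523776
        System.out.println(sumBuffer(buf, n)); // same total off-heap
    }
}
```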