Hi there,

I have created a little JMH test to check Arrow performance. You can
find it here. The idea is to test an API with implementations on heap
arrays, NIO buffers (that follow the Arrow format) and Arrow. At the
moment the API only supports nullable int buffers and contains read-only
methods.
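To make the setup concrete, here is a minimal sketch of what such a read-only nullable-int API and two of its implementations could look like. All names are illustrative assumptions, not the actual classes from the benchmark; the off-heap version assumes the Arrow layout of a validity bitmap (one bit per element) plus little-endian 4-byte values.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical read-only API for a nullable int vector.
interface IntVector {
    boolean isNull(int index);
    int get(int index);
}

// Heap-backed implementation: a plain int[] plus an Arrow-style validity bitmap.
final class HeapIntVector implements IntVector {
    private final int[] values;
    private final byte[] validity; // 1 bit per element; bit set = non-null

    HeapIntVector(int[] values, byte[] validity) {
        this.values = values;
        this.validity = validity;
    }

    public boolean isNull(int index) {
        return (validity[index >>> 3] & (1 << (index & 7))) == 0;
    }

    public int get(int index) {
        return values[index];
    }
}

// Off-heap implementation: validity bitmap and values stored in ByteBuffers
// laid out as the Arrow format specifies (little-endian 4-byte values).
final class BufferIntVector implements IntVector {
    private final ByteBuffer validity;
    private final ByteBuffer values;

    BufferIntVector(ByteBuffer validity, ByteBuffer values) {
        this.validity = validity;
        this.values = values.order(ByteOrder.LITTLE_ENDIAN);
    }

    public boolean isNull(int index) {
        return (validity.get(index >>> 3) & (1 << (index & 7))) == 0;
    }

    public int get(int index) {
        return values.getInt(index << 2); // 4 bytes per int
    }
}
```

The Arrow-backed variant would expose the same interface but delegate to an Arrow vector, which is where the extra indirections discussed below come in.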

The benchmark runs on automatically generated vectors of 2^10, 2^20 and 2^26
never-null integers, and it tests three different access patterns:

   - Random access: a random element is read
   - Sequential access: a random index is chosen and then the
   following 32 elements are read
   - Sum access: similar to sequential, but instead of simply reading the
   elements, they are added into a long
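The three patterns above can be sketched roughly as follows. This is only an illustration over a plain heap array (the class and method names are mine, not the benchmark's); the real benchmark wraps these in JMH `@Benchmark` methods and feeds each read into a Blackhole.

```java
import java.util.Random;

// Illustrative sketch of the three access patterns described above.
final class AccessPatterns {
    static final int SEQ_LEN = 32;

    // Random access: read one element at a random index.
    static int randomAccess(int[] vector, Random rnd) {
        return vector[rnd.nextInt(vector.length)];
    }

    // Sequential access: pick a random start, then read the next 32 elements.
    static int sequentialAccess(int[] vector, Random rnd) {
        int start = rnd.nextInt(vector.length - SEQ_LEN);
        int last = 0;
        for (int i = start; i < start + SEQ_LEN; i++) {
            last = vector[i]; // in JMH each read would go to a Blackhole
        }
        return last;
    }

    // Sum access: like sequential, but accumulate the values into a long.
    static long sumAccess(int[] vector, Random rnd) {
        int start = rnd.nextInt(vector.length - SEQ_LEN);
        long sum = 0;
        for (int i = start; i < start + SEQ_LEN; i++) {
            sum += vector[i];
        }
        return sum;
    }
}
```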

Disclaimer: Microbenchmarks are error prone, I'm not an expert on JMH, and
this benchmark was put together in a couple of hours.

Results
On all charts the Y axis is the ratio between the throughput of the
offheap versions and the heap version (so higher is better).

TL;DR: It seems that the complex structures of Arrow are preventing some
optimizations in the JVM.

Random
Random access performance is quite good. The heap version is a little bit
better, but both offheap solutions look pretty similar.

        1K      1M      64M
Array   75.139  53.025  10.872
Arrow   67.399  43.491  10.42
Buf     82.877  38.092  10.753
[image: Inline image 1]

Sequential
Looking at the absolute values, it is clear that JMH's Blackhole is
preventing any JVM optimization of the loop. I think that's fine, as it
simulates several calls to the vector in a *not optimized* scenario.
It seems that the JVM is not smart enough to optimize offheap sequential
access as much as it does with heap structures. Although both offheap
implementations are worse than the heap version, the one that uses Arrow is
noticeably worse than the one that directly uses ByteBuffers:
        1K      1M      64M
Array   6.335   4.563   3.145
Arrow   2.664   2.453   1.989
Buf     4.456   3.971   3.018
[image: Inline image 2]

Sum
The result is awful. It seems that the JVM is able to optimize (I guess by
vectorizing) the heap and ByteBuffer implementations (at least with small
vectors), but not the Arrow version. I guess this is due to the
indirections and the deeper call stack required to execute the same code on
Arrow.

        1K      1M      64M
Array   44.833  26.617  9.787
Arrow   3.426   3.265   2.521
Buf     38.288  19.295  5.668
[image: Inline image 4]
