I recommend sending a PR to the benchmark repo clarifying that, although
the entry is presented as executing the query with the Arrow R/C++
library, the query is in fact handled primarily by dplyr and not by
Arrow at all. The benchmark is very misleading in its current form.

On Fri, Jun 25, 2021 at 11:55 AM Jorge Cardoso Leitão
<jorgecarlei...@gmail.com> wrote:
>
> Hi,
>
> H2O has a set of benchmarks comparing different query engines [1].
>
> There is currently an implementation named "Arrow", backed by the Arrow R
> implementation [2].
>
> This is one of the least performant implementations evaluated. I sense that
> this may negatively affect the Arrow format, as people will (even if
> unfairly) associate "Arrow" with "poor performance". In fact, polars and
> cuDF, the top performers, also use Arrow as their backing in-memory format.
>
> Would it make sense to avoid naming specific query engines as "Arrow" (e.g.
> like we do with DataFusion, Gandiva, etc.), so that these misunderstandings
> are avoided?
>
> Best,
> Jorge
>
> [1] https://h2oai.github.io/db-benchmark/
> [2] https://github.com/h2oai/db-benchmark/tree/master/arrow
