Some configs, like use_threads, default to true in Python but false in C++.

Maybe we can fill in all configs explicitly with the same values.

Best,
Xuwei Fu

On Thu, Jun 13, 2024 at 13:32, J N <jaynarale3...@gmail.com> wrote:

> Hello,
>     We all know that there is inherent overhead in Python, and we wanted to
> compare the performance of reading data using C++ Arrow against PyArrow for
> high throughput systems. Since I couldn't find any benchmarks online for
> this comparison, I decided to create my own. These programs read a Parquet
> file into arrow::Table in both C++ and Python, and are single threaded.
>
> C++ Arrow benchmark -
> https://gist.github.com/jaystarshot/9608bf4b9fdd399c1658d71328ce2c6d
> Pyarrow benchmark -
> https://gist.github.com/jaystarshot/451f97b75e9750b1f00d157e6b9b3530
>
> P.S. I am new to Arrow, so some things might be inefficient in both.
>
> They read a zstd compressed parquet file of around 300MB.
> The results were very different than what we expected.
> *Pyarrow*
> Total time: 5.347517251968384 seconds
>
> *C++ Arrow*
> Total time: 5.86806 seconds
>
> For smaller files (0.5MB), however, C++ Arrow was faster.
>
> *Pyarrow*
> gzip
> Total time: 0.013672113418579102 seconds
>
> *C++ Arrow*
> Total time: 0.00501744 seconds
> (C++ Arrow roughly 2.7x faster)
>
> So I have a question for the Arrow experts: is this expected in the Arrow
> world, or is there some error in my benchmark?
>
> Thank you!
>
>
> --
> Warm Regards,
>
> Jay Narale
>
