Some configs, like use_threads, would be true in Python but false in C++. Maybe we could set all configs explicitly to the same values in both programs.
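For example, here is a minimal sketch of pinning that setting on the Python side, so the PyArrow run is single-threaded like the C++ one. The names READ_OPTIONS, timed_read and read_fn are made up for illustration and are not taken from either gist. On the C++ side the matching knob should be parquet::ArrowReaderProperties::set_use_threads. Other reader options may also differ between the two and are not covered here.

```python
import time

# Set explicitly rather than relying on defaults: pyarrow.parquet.read_table
# uses use_threads=True unless told otherwise.
READ_OPTIONS = {"use_threads": False}


def timed_read(path, read_fn=None, **options):
    """Read a Parquet file and return (table, elapsed_seconds).

    read_fn is a hypothetical hook for swapping in another reader;
    it defaults to pyarrow.parquet.read_table.
    """
    if read_fn is None:
        # Imported lazily so the helper can be used with a custom reader.
        import pyarrow.parquet as pq
        read_fn = pq.read_table
    start = time.perf_counter()
    table = read_fn(path, **options)
    elapsed = time.perf_counter() - start
    return table, elapsed


# Usage (assumes pyarrow is installed and data.parquet exists):
#   table, elapsed = timed_read("data.parquet", **READ_OPTIONS)
#   print(f"Total time: {elapsed} seconds")
```

Running the same call once with use_threads=True and once with use_threads=False should show how much of the gap comes from threading alone.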
Best,
Xuwei Fu

J N <jaynarale3...@gmail.com> wrote on Thu, Jun 13, 2024 at 13:32:

> Hello,
> We all know that there is inherent overhead in Python, and we wanted to
> compare the performance of reading data using C++ Arrow against PyArrow for
> high throughput systems. Since I couldn't find any benchmarks online for
> this comparison, I decided to create my own. These programs read a Parquet
> file into arrow::Table in both C++ and Python, and are single threaded.
>
> C++ Arrow benchmark -
> https://gist.github.com/jaystarshot/9608bf4b9fdd399c1658d71328ce2c6d
> PyArrow benchmark -
> https://gist.github.com/jaystarshot/451f97b75e9750b1f00d157e6b9b3530
>
> PS: I am new to Arrow, so some things might be inefficient in both.
>
> They read a zstd-compressed Parquet file of around 300MB.
> The results were very different from what we expected.
>
> *PyArrow*
> Total time: 5.347517251968384 seconds
>
> *C++ Arrow*
> Total time: 5.86806 seconds
>
> For smaller files (0.5MB), however, C++ Arrow was better.
>
> *PyArrow*
> gzip
> Total time: 0.013672113418579102 seconds
>
> *C++ Arrow*
> Total time: 0.00501744 seconds
> (C++ Arrow about 2.7x faster)
>
> So I have a question for the Arrow experts: is this expected in the Arrow
> world, or is there some error in my benchmark?
>
> Thank you!
>
> --
> Warm Regards,
>
> Jay Narale