Thanks for raising the issue. Could you share a snippet of the code showing how you are reading the file? Does the performance drop also happen with different file sizes, or is it tied to this particular file size?
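For reference, below is a minimal sketch of the read path I would expect with Arrow 16.x and the settings you mention (use_threads, batch size, IO thread pool capacity). The bucket/key and the error-handling wrapper are placeholders of mine, not taken from your report, so please compare against your actual code:

#include <arrow/filesystem/s3fs.h>
#include <arrow/io/interfaces.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/table.h>
#include <parquet/arrow/reader.h>
#include <parquet/properties.h>

#include <iostream>
#include <memory>

arrow::Status RunMain() {
  // Initialize the S3 subsystem before creating any S3 filesystem.
  ARROW_RETURN_NOT_OK(arrow::fs::InitializeS3(arrow::fs::S3GlobalOptions{}));

  // Grow the IO thread pool (default capacity is 8) so column chunks
  // can be fetched from S3 concurrently.
  ARROW_RETURN_NOT_OK(arrow::io::SetIOThreadPoolCapacity(32));

  ARROW_ASSIGN_OR_RAISE(
      auto fs, arrow::fs::S3FileSystem::Make(arrow::fs::S3Options::Defaults()));
  // "my-bucket/path/file.parquet" is a placeholder, not your real key.
  ARROW_ASSIGN_OR_RAISE(auto input,
                        fs->OpenInputFile("my-bucket/path/file.parquet"));

  // Reader-level knobs: threads, batch size, and pre-buffering (which
  // coalesces column-chunk reads into larger concurrent S3 requests).
  parquet::ArrowReaderProperties props;
  props.set_use_threads(true);
  props.set_batch_size(1024 * 1024);
  props.set_pre_buffer(true);

  parquet::arrow::FileReaderBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Open(input));
  builder.properties(props);
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(builder.Build(&reader));

  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
  std::cout << "Read " << table->num_rows() << " rows" << std::endl;

  return arrow::fs::FinalizeS3();
}

int main() {
  arrow::Status st = RunMain();
  if (!st.ok()) {
    std::cerr << st.ToString() << std::endl;
    return 1;
  }
  return 0;
}

In particular, seeing whether your code sets pre_buffer would help: on a high-latency filesystem like S3 it makes a large difference, and a regression there would be consistent with the low CPU usage you observed.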
Thanks,
Raúl

On Thu, 28 Nov 2024, 13:58, Surya Kiran Gullapalli <suryakiran.gullapa...@gmail.com> wrote:

> Hello all,
> Trying to read a Parquet file from S3 (a 50 MB file), and it is taking much
> more time than with Arrow 12.0.1. I've enabled threads (use_threads=true),
> the batch size is set to 1024*1024, and I also set the IOThreadPoolCapacity
> to 32.
>
> When I time the Parquet read from S3, a boost timer shows CPU usage of only
> 2-5% during the file read, so I think multithreaded reading was not
> happening.
>
> Reading the same Parquet file from local disk is fine, and reading the same
> Parquet file from S3 with Arrow 12 is also fine. Am I missing a setting
> related to reading Parquet with threads, or an AWS setting?
>
> This is the setup:
> C++
> Apache Arrow 16.1
> Ubuntu Linux 22.04
> gcc-13.2
>
> Thanks,
> Surya