Thanks for raising the issue. Could you share a snippet of the code showing how you are reading the file? Does the performance drop also happen with different file sizes, or is it tied to this particular file size?
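For reference, below is a minimal sketch of the read path I would expect with Arrow 16.x and the settings you mention (use_threads, batch size, IO thread pool capacity). The bucket/key and the error-handling wrapper are placeholders of mine, not taken from your report, so please compare against your actual code:

#include <arrow/filesystem/s3fs.h>
#include <arrow/io/interfaces.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/table.h>
#include <parquet/arrow/reader.h>
#include <parquet/properties.h>

#include <iostream>
#include <memory>

arrow::Status RunMain() {
  // Initialize the S3 subsystem before creating any S3 filesystem.
  ARROW_RETURN_NOT_OK(arrow::fs::InitializeS3(arrow::fs::S3GlobalOptions{}));

  // Grow the IO thread pool (default capacity is 8) so column chunks
  // can be fetched from S3 concurrently.
  ARROW_RETURN_NOT_OK(arrow::io::SetIOThreadPoolCapacity(32));

  ARROW_ASSIGN_OR_RAISE(
      auto fs, arrow::fs::S3FileSystem::Make(arrow::fs::S3Options::Defaults()));
  // "my-bucket/path/file.parquet" is a placeholder, not your real key.
  ARROW_ASSIGN_OR_RAISE(auto input,
                        fs->OpenInputFile("my-bucket/path/file.parquet"));

  // Reader-level knobs: threads, batch size, and pre-buffering (which
  // coalesces column-chunk reads into larger concurrent S3 requests).
  parquet::ArrowReaderProperties props;
  props.set_use_threads(true);
  props.set_batch_size(1024 * 1024);
  props.set_pre_buffer(true);

  parquet::arrow::FileReaderBuilder builder;
  ARROW_RETURN_NOT_OK(builder.Open(input));
  builder.properties(props);
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(builder.Build(&reader));

  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
  std::cout << "Read " << table->num_rows() << " rows" << std::endl;

  return arrow::fs::FinalizeS3();
}

int main() {
  arrow::Status st = RunMain();
  if (!st.ok()) {
    std::cerr << st.ToString() << std::endl;
    return 1;
  }
  return 0;
}

In particular, seeing whether your code sets pre_buffer would help: on a high-latency filesystem like S3 it makes a large difference, and a regression there would be consistent with the low CPU usage you observed.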
Thanks,
Raúl

On Thu, 28 Nov 2024, 13:58, Surya Kiran Gullapalli <suryakiran.gullapa...@gmail.com> wrote:

> Hello all,
> Trying to read a Parquet file from S3 (a 50 MB file), and it is taking much
> more time than with Arrow 12.0.1. I've enabled threads (use_threads=true),
> the batch size is set to 1024*1024, and I also set the IOThreadPoolCapacity
> to 32.
>
> When I time the Parquet read from S3, a boost timer shows CPU usage of only
> 2-5% during the file read, so I think multithreaded reading was not
> happening.
>
> Reading the same Parquet file from local disk is fine, and reading the same
> Parquet file from S3 with Arrow 12 is also fine. Am I missing a setting
> related to reading Parquet with threads, or an AWS setting?
>
> This is the setup:
> C++
> Apache Arrow 16.1
> Ubuntu Linux 22.04
> gcc-13.2
>
> Thanks,
> Surya