Hello all,
I'm trying to read a Parquet file from S3 (a 50 MB file), and it is taking much
more time than with Arrow 12.0.1. I've enabled threads (use_threads=true), set the
batch size to 1024*1024, and set the IOThreadPoolCapacity to 32.
When I time the Parquet read from S3, the Boost timer output shows the CPU usage.
Thanks for raising the issue.
Could you share a snippet of the code showing how you are reading the file?
Does the performance decrease also happen with other file sizes, or is it
specific to this file size?
Thanks,
Raúl
On Thu, 28 Nov 2024, 13:58, Surya Kiran Gullapalli wrote:
Thanks for the quick response.
When the file sizes are small (less than 10 MB), I'm not seeing a noticeable
difference. Beyond that, the difference is clear. I'll send a snippet in due
course.
Surya
On Thu, Nov 28, 2024 at 6:37 PM Raúl Cumplido wrote:
> Thanks for raising the issue.
>
Severity: critical
Affected versions:
- Apache Arrow R package 4.0.0 through 16.1.0
Description:
Deserialization of untrusted data in IPC and Parquet readers in the Apache
Arrow R package versions 4.0.0 through 16.1.0 allows arbitrary code execution.
An application is vulnerable if it reads Arrow IPC, Feather, or Parquet data
from untrusted sources.