m09526 opened a new issue, #15656:
URL: https://github.com/apache/datafusion/issues/15656

   ### Is your feature request related to a problem or challenge?
   
   When running a DataFusion query that writes a single >100 GiB Parquet output file to S3, the query failed with an error returned from S3 via ObjectStore.
   
   Investigation showed this stems from how DataFusion uses ObjectStore's [BufWriter](https://docs.rs/object_store/latest/object_store/buffered/struct.BufWriter.html) to upload files to remote stores. BufWriter uses a [default buffer size](https://docs.rs/object_store/latest/src/object_store/buffered.rs.html#252) of 10 MiB. ObjectStore's AWS S3 implementation uses the "multipart upload" API, which uploads object data in chunks, is recommended for objects larger than 100 MiB, and [supports up to 10,000 parts](https://docs.aws.amazon.com/AmazonS3/latest/userguide/mpuoverview.html#:~:text=When%20uploading%20a%20part%2C%20you%20must%20specify%20a%20part%20number%20in%20addition%20to%20the%20upload%20ID.%20You%20can%20choose%20any%20part%20number%20between%201%20and%2010%2C000).
   
   With the default buffer size, 10,000 parts × 10 MiB ≈ 100 GiB maximum object size.
   
   BufWriter supports changing the buffer size using the 
[with_capacity](https://docs.rs/object_store/latest/object_store/buffered/struct.BufWriter.html#method.with_capacity)
 function.
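   
   For illustration, a minimal sketch of raising the capacity. This uses object_store's in-memory store in place of the real S3 store, and the path and payload are placeholders:
   
   ```rust
   use std::sync::Arc;
   
   use object_store::buffered::BufWriter;
   use object_store::memory::InMemory;
   use object_store::path::Path;
   use object_store::ObjectStore;
   use tokio::io::AsyncWriteExt;
   
   #[tokio::main]
   async fn main() -> std::io::Result<()> {
       // In-memory store stands in for the real AmazonS3 store here.
       let store: Arc<dyn ObjectStore> = Arc::new(InMemory::new());
       let path = Path::from("output/large.parquet"); // placeholder path
   
       // 100 MiB parts instead of the 10 MiB default:
       // 10,000 parts x 100 MiB allows objects up to ~1 TiB.
       let mut writer = BufWriter::with_capacity(store, path, 100 * 1024 * 1024);
       writer.write_all(b"...file bytes...").await?; // placeholder payload
       writer.shutdown().await?; // flushes and completes the multipart upload
       Ok(())
   }
   ```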
   
   ### Describe the solution you'd like
   
   Add a configuration option to DataFusion, probably an execution option: an `Option<usize>` that specifies a larger upload buffer size so that larger result files can be written to S3. This option would then be available via `TaskContext`. BufWriter is used in:
   * datafusion/datasource/src/write/mod.rs, in `create_writer`
   * datafusion/datasource-csv/src/source.rs
   * datafusion/datasource-json/src/source.rs
   * datafusion/datasource-parquet/src/writer.rs
   * datafusion/datasource-parquet/src/file_format.rs
   
   The relevant `TaskContext` is directly available in most of the files where the BufWriter is created, and `create_writer` is only called from places where a `TaskContext` is in scope. Since `create_writer` is public, we could add a new `create_writer_with_size` function that accepts a buffer size parameter; a sketch follows below.
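   
   A rough sketch of the core of such a helper. The function name and the `Option<usize>` plumbing are placeholders, not the final API:
   
   ```rust
   use std::sync::Arc;
   
   use object_store::buffered::BufWriter;
   use object_store::path::Path;
   use object_store::ObjectStore;
   
   /// Hypothetical helper (names are placeholders): build a BufWriter
   /// honouring an optional buffer size, e.g. read from the proposed
   /// execution option via `TaskContext`. `None` keeps the 10 MiB default.
   fn writer_with_optional_capacity(
       store: Arc<dyn ObjectStore>,
       location: &Path,
       buffer_size: Option<usize>,
   ) -> BufWriter {
       match buffer_size {
           Some(capacity) => BufWriter::with_capacity(store, location.clone(), capacity),
           None => BufWriter::new(store, location.clone()),
       }
   }
   ```
   
   `create_writer_with_size` would presumably wrap this and then apply the same `FileCompressionType` handling that `create_writer` performs today.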
   
   ### Describe alternatives you've considered
   
   Limit queries so they never produce more than 100 GiB of Parquet output, splitting larger queries into smaller workloads. This is not viable for our use case.
   
   Create an ObjectStore wrapper that implements extra buffering behaviour and register it with DataFusion before running a query.
   
   ### Additional context
   
   _No response_

