matko opened a new issue, #1005:
URL: https://github.com/apache/datafusion-python/issues/1005

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Right now datafusion-python exposes only a handful of the write options that 
DataFusion actually supports. Notably, `DataFrameWriteOptions` is always 
initialized with its defaults, which means the following features are not 
supported for any format:
   - partitioned (hive style) writes
   - sorted writes
   - insert option (though I suspect this is not actually supported for any of 
the three exposed formats)
   - single file output (though in my experiments I've not been able to make 
this have any effect in Rust)
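
   For readers unfamiliar with the term, "hive style" means that each 
partition column becomes a `column=value` directory level in the output path. 
Below is a minimal sketch of that layout in plain Python. The helper 
`hive_partition_path` and the file name `part-0.parquet` are hypothetical and 
only illustrate the directory convention; they are not how DataFusion names 
its output files.

```python
from pathlib import PurePosixPath


def hive_partition_path(base, partition_by, row, file_name="part-0.parquet"):
    """Return the hive-style path for one row: base/col1=val1/col2=val2/file.

    `file_name` is a placeholder; a real writer picks its own file names.
    """
    path = PurePosixPath(base)
    for column in partition_by:
        # Each partition column contributes one `column=value` directory level.
        path = path / f"{column}={row[column]}"
    return str(path / file_name)


row = {"year": 2024, "month": 1, "amount": 9.5}
print(hive_partition_path("out", ["year", "month"], row))
# out/year=2024/month=1/part-0.parquet
```

   Columns that are not listed in `partition_by` (here `amount`) do not 
affect the path; they stay in the data file itself.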
   
   Furthermore, each format has its own options, of which only some are 
currently exposed:
   - parquet: only global compression options are exposed, but the writer 
supports fine-grained per-column options that are currently unusable from 
Python.
   - csv: only header inclusion/exclusion is supported. The underlying writer 
supports many more options, such as delimiters, quote style, and null handling.
   - json: no options are supported at all right now. The underlying writer 
supports compression.
   
   **Describe the solution you'd like**
   Expose all supported write options in datafusion-python. I think we should 
support passing dictionaries of options, keyed by the names people would 
expect from the Rust documentation. The more important options could 
additionally be top-level keyword arguments in their own right, as is 
already the case for parquet global compression.
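
   To make the proposal concrete, here is a rough sketch of how an options 
dictionary and top-level keyword arguments could be combined. Everything in it 
is an assumption for illustration: `resolve_write_options` is a hypothetical 
helper, and the option names in `KNOWN_CSV_OPTIONS` are placeholders rather 
than the actual datafusion-python or Rust option names.

```python
# Hypothetical option names, for illustration only; not the real API.
KNOWN_CSV_OPTIONS = {"has_header", "delimiter", "quote", "null_value", "compression"}


def resolve_write_options(known, options=None, **keyword_options):
    """Merge an options dict with top-level keyword arguments.

    Keyword arguments take precedence over dictionary entries, and a keyword
    argument left as None counts as "not set". Unknown names raise an error,
    so typos fail loudly instead of being silently ignored.
    """
    merged = dict(options or {})
    merged.update({k: v for k, v in keyword_options.items() if v is not None})
    unknown = sorted(set(merged) - set(known))
    if unknown:
        raise ValueError(f"unsupported write options: {unknown}")
    return merged


opts = resolve_write_options(
    KNOWN_CSV_OPTIONS,
    {"delimiter": ";", "has_header": False},
    has_header=True,  # the keyword argument overrides the dictionary entry
)
print(opts)
# {'delimiter': ';', 'has_header': True}
```

   Rejecting unknown keys is one possible design choice; it keeps the 
dictionary form about as safe as explicit keyword arguments.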
   
   **Describe alternatives you've considered**
   One alternative is bypassing DataFusion and writing parquet directly from an 
Arrow stream. This rules out working with object stores, though, and even 
when object stores are not needed it is not very ergonomic.
   
   **Additional context**
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

