connec opened a new issue, #11472: URL: https://github.com/apache/datafusion/issues/11472
### Is your feature request related to a problem or challenge?

I'm trying to read CSVs that include newlines in (quoted) values.

### Describe the solution you'd like

Some googling revealed that this isn't supported currently by the `arrow-csv` crate, whereas that functionality does exist in the C++ ([`ParseOptions::newlines_in_values`](https://arrow.apache.org/docs/cpp/api/formats.html#_CPPv4N5arrow3csv12ParseOptions18newlines_in_valuesE)) and Python ([`ParseOptions.newlines_in_values`](https://arrow.apache.org/docs/python/generated/pyarrow.csv.ParseOptions.html#pyarrow.csv.ParseOptions.newlines_in_values)) implementations.

Ideally, a `newlines_in_values` field could be added to [`datafusion::common::config::CsvOptions`](https://docs.rs/datafusion/latest/datafusion/common/config/struct.CsvOptions.html) to support this functionality.

Note that the Python docs call out the performance implications of this:

> Setting this to True reduces the performance of multi-threaded CSV reading.

I haven't dug into the implementation, but I imagine it becomes harder to find the right split point for multi-threaded reading (though, it seems not dissimilar to finding the prev/next linebreak, so perhaps not insurmountable...).

### Describe alternatives you've considered

The only alternative I can see would be to preprocess the CSV before feeding it into DF. I haven't explored this option as I imagine it would take a lot of DF plumbing, and it seems valuable to have parity with other arrow CSV packages (C++ and Python, at least).

### Additional context

I was originally planning to report this against the `arrow-rs` repository, but since my use-case is with `datafusion` I decided to report it here. Let me know if this issue would be more appropriate there and I can move/copy it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
