Rafferty97 opened a new issue, #20473:
URL: https://github.com/apache/datafusion/issues/20473

   ### Is your feature request related to a problem or challenge?
   
   Currently, DataFusion doesn't appear to support reading CSV files that use a 
non-UTF-8 character encoding, such as the common ISO-8859-1.
   
   While CSV may be a terrible data format, it's also ubiquitous in the wild, 
and many CSV files use alternative character encodings. It would be useful if 
there were an option to read CSV files that use an encoding other than UTF-8.
   
   ### Describe the solution you'd like
   
   Add an option to `CsvOptions` or elsewhere to specify the encoding used by 
the input file, defaulting to `UTF-8`. DataFusion could then use `encoding_rs` 
internally to decode chunks of incoming data.
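
To illustrate what "decoding chunks of incoming data" could look like, here is a minimal std-only sketch for ISO-8859-1 specifically. It is not DataFusion code and does not use `encoding_rs`; the function name `decode_latin1_chunk` is hypothetical. ISO-8859-1 is the easy case because every byte maps to the Unicode code point with the same value, so chunks can be decoded independently.

```rust
/// Hypothetical helper (not part of DataFusion or `encoding_rs`):
/// decode one chunk of ISO-8859-1 bytes and append the result to `out` as UTF-8.
///
/// Each ISO-8859-1 byte corresponds to the Unicode code point of the same
/// value (U+0000..=U+00FF), so no state has to be carried between chunks.
fn decode_latin1_chunk(chunk: &[u8], out: &mut String) {
    // Reserve at least one byte per input byte; non-ASCII bytes need two.
    out.reserve(chunk.len());
    for &b in chunk {
        // `u8 as char` yields the code point with the same numeric value.
        out.push(b as char);
    }
}
```

Multi-byte encodings (for example Shift_JIS or UTF-16) are harder, since a character can be split across a chunk boundary. A real implementation would presumably rely on a stateful streaming decoder, such as the one `encoding_rs` provides, rather than a per-byte mapping like this.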
   
   ### Describe alternatives you've considered
   
   An alternative to depending on `encoding_rs` directly would be to expose an 
option that allows users to provide their own decoding logic, which they would 
then likely delegate to `encoding_rs`. This might be desirable if the added 
dependency is deemed too heavy (though it could easily be put behind a feature 
flag).
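
As a rough sketch of the user-provided-decoder alternative, the hook could be a small trait that the CSV reader calls for each chunk. Everything here is hypothetical: `ChunkDecoder`, `Latin1`, and `decode_chunks` are illustrative names, not existing DataFusion APIs, and the `Latin1` implementation is a std-only stand-in for what would likely be a thin wrapper around `encoding_rs`.

```rust
/// Hypothetical extension point: converts raw input bytes to UTF-8.
pub trait ChunkDecoder {
    /// Decode `input` and append the UTF-8 result to `out`.
    /// `last` is true for the final chunk, so a stateful decoder can
    /// report a truncated multi-byte sequence as an error.
    fn decode(&mut self, input: &[u8], last: bool, out: &mut String) -> Result<(), String>;
}

/// Example user-side implementation for ISO-8859-1, where every byte
/// maps to the Unicode code point with the same value.
pub struct Latin1;

impl ChunkDecoder for Latin1 {
    fn decode(&mut self, input: &[u8], _last: bool, out: &mut String) -> Result<(), String> {
        out.extend(input.iter().map(|&b| b as char));
        Ok(())
    }
}

/// Stand-in for the reader side: drive any decoder over a sequence of chunks.
pub fn decode_chunks<'a, I>(decoder: &mut dyn ChunkDecoder, chunks: I) -> Result<String, String>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut out = String::new();
    let mut iter = chunks.into_iter().peekable();
    while let Some(chunk) = iter.next() {
        // The chunk is the last one if nothing follows it.
        let last = iter.peek().is_none();
        decoder.decode(chunk, last, &mut out)?;
    }
    Ok(out)
}
```

With a shape like this, the core crate would need no encoding dependency at all; the trade-off is that every user who needs a non-UTF-8 encoding has to supply a decoder themselves.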
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

