[ https://issues.apache.org/jira/browse/ARROW-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569009#comment-17569009 ]
Joost Hoozemans commented on ARROW-16000:
-----------------------------------------

Thanks everyone for the advice. What makes CsvFragmentScanOptions the preferred place over csv.ReadOptions? Right now CsvFragmentScanOptions doesn't store any properties directly; it only carries a csv.ConvertOptions and a csv.ReadOptions. Compression and encoding also sound like properties of a whole file, not of a fragment (although I don't know if that is what Fragment means here).

Would it make sense, as a first attempt, for me to add a TransformInputStream to CsvFragmentScanOptions or ReadOptions? That way I can create one in Python the same way read_csv does (with MakeTransformInputStream and a callback into the encode/decode functions of a Python library). Then we can see whether there is a performance problem. Later we could add functionality that creates a TransformInputStream on the C++ side with a callback to some external library, as Antoine suggested.

> [C++][Dataset] Support Latin-1 encoding
> ---------------------------------------
>
>                 Key: ARROW-16000
>                 URL: https://issues.apache.org/jira/browse/ARROW-16000
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Nicola Crane
>            Assignee: Joost Hoozemans
>            Priority: Major
>
> In ARROW-15992 a user is reporting issues with trying to read in files with
> Latin-1 encoding. I had a look through the docs for the Dataset API and I
> don't think this is currently supported.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
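
For illustration only, the transform-stream idea from the comment above can be sketched in plain Python: a wrapper reads raw bytes in chunks and hands each chunk to a decode/encode callback before the CSV reader sees it. This is a standard-library stand-in, not Arrow's API; the helper name `transcode_chunks` and the sample data are hypothetical, and how MakeTransformInputStream is actually wired up is not shown here.

```python
import codecs
import io


def transcode_chunks(stream, src_encoding, dest_encoding="utf-8", chunk_size=4096):
    """Yield the bytes of `stream`, re-encoded from src_encoding to dest_encoding.

    Hypothetical helper: it mimics what a transform callback on an input
    stream would do, using only the standard library.
    """
    # Incremental codecs keep state between calls, so a character that is
    # split across two reads is still handled correctly.
    decoder = codecs.getincrementaldecoder(src_encoding)()
    encoder = codecs.getincrementalencoder(dest_encoding)()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        out = encoder.encode(decoder.decode(chunk))
        if out:
            yield out
    # Flush anything the codecs are still buffering at end of input.
    tail = encoder.encode(decoder.decode(b"", final=True), final=True)
    if tail:
        yield tail


# Sample Latin-1 CSV bytes (made up for this sketch).
latin1_csv = "name,city\nJosé,Málaga\n".encode("latin-1")

# A small chunk size forces several reads, exercising the chunk boundaries.
utf8_csv = b"".join(
    transcode_chunks(io.BytesIO(latin1_csv), "latin-1", chunk_size=5)
)
```

Because every chunk makes a round trip through Python-level codec calls, this kind of callback is where the performance question raised in the comment would show up.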