Hi Ying,

I'd suggest looking at how the other file readers and writers (CSV, Parquet, etc.) expose their options. I don't know pyarrow well enough myself to give you a definitive answer, but the answer is probably to follow whatever model those readers and writers already use for their options.
Neal

On Fri, Mar 12, 2021 at 12:43 AM Ying Zhou <yzhou7...@gmail.com> wrote:
> Hi,
>
> Currently I'm working on ARROW-11297
> (https://github.com/mathyingzhou/arrow/tree/ARROW-11297), which will be
> filed as soon as the current PR is merged.
>
> I managed to reimplement orc::WriterOptions in Arrow (with Arrow-ized
> naming conventions) as arrow::adapters::orc::WriterOptions, which is
> necessary since we do not allow third-party headers to be included in our
> public headers, and I have finished the C++ part of the work. Now I'm trying
> to expose WriterOptions in Python, and I do wonder how this is supposed to be
> done in general. After reading the code in array.pxi, I think this may be
> the way I want to do it:
>
> 1. The end user will see individual ORC writer options (e.g.
>    CompressionKind, that is, whether we use ZLIB, LZO, some other form of
>    compression, or none at all) as keyword arguments.
> 2. These keyword arguments will be processed in _orc.pyx, first as a
>    dictionary, and then an adapter will convert them into an
>    arrow::adapters::orc::WriterOptions.
>
> Is this the right way?
>
> Moreover, I do wonder how we should convert the enums. Shall I use a series
> of if/elif or a mapping dict, so that people must use one of the correct
> strings or get a ValueError?
>
> e.g.
>
>     compression_kind_mapping = {
>         'snappy': CompressionKind._CompressionKind_SNAPPY,
>         'lzo': CompressionKind._CompressionKind_LZO,
>     }  # There are other options; this is just an example
>     if compression_kind not in compression_kind_mapping:
>         raise ValueError("Unknown compression_kind")
>     c_compression_kind = compression_kind_mapping[compression_kind]
>
> Ying
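For illustration, here is a minimal sketch of the two-step approach from the quoted message (keyword arguments first, then conversion into an options object), using a mapping dict for the enum. It is plain Python so it runs on its own: CompressionKind, WriterOptions and make_writer_options are hypothetical stand-ins, not the real pyarrow or Cython API. In _orc.pyx the enum values would come from the C++ declarations, and the field names and defaults here are placeholders.

```python
import enum
from dataclasses import dataclass


class CompressionKind(enum.Enum):
    # Hypothetical stand-in for the C++ enum; in _orc.pyx these values
    # would come from the Cython declarations instead.
    NONE = 0
    ZLIB = 1
    SNAPPY = 2
    LZO = 3
    LZ4 = 4
    ZSTD = 5


# Single source of truth for the accepted strings: "none", "zlib", ...
_COMPRESSION_KIND_MAPPING = {kind.name.lower(): kind for kind in CompressionKind}


def _convert_compression_kind(compression):
    """Translate a user-supplied string (case-insensitive) into a CompressionKind."""
    try:
        return _COMPRESSION_KIND_MAPPING[compression.lower()]
    except (AttributeError, KeyError):
        # AttributeError covers non-string input such as an int.
        raise ValueError(
            f"Unknown compression {compression!r}; expected one of "
            f"{sorted(_COMPRESSION_KIND_MAPPING)}"
        ) from None


@dataclass
class WriterOptions:
    # Hypothetical stand-in for arrow::adapters::orc::WriterOptions;
    # field names and defaults are placeholders, not the real ones.
    compression: CompressionKind = CompressionKind.ZLIB
    stripe_size: int = 64 * 1024 * 1024


def make_writer_options(compression="zlib", stripe_size=64 * 1024 * 1024):
    """Accept keyword arguments, validate them, and build a WriterOptions."""
    if stripe_size <= 0:
        raise ValueError("stripe_size must be positive")
    return WriterOptions(
        compression=_convert_compression_kind(compression),
        stripe_size=stripe_size,
    )
```

With this shape, make_writer_options(compression="snappy") returns options carrying CompressionKind.SNAPPY, while a typo such as "zl0" fails immediately with a ValueError that lists the valid choices. Compared with an if/elif chain, the dict keeps the accepted strings in one place, so the lookup and the error message cannot drift apart.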