Re: [Python] Best practices when exposing options

Neal Richardson Fri, 12 Mar 2021 07:56:04 -0800

Hi Ying,
I'd suggest looking at how the other file readers and writers (CSV,
Parquet, etc.) expose their options. I don't know pyarrow well enough
myself to tell you what the answer is, but it is probably to follow
whatever model those readers and writers already use.


Neal

On Fri, Mar 12, 2021 at 12:43 AM Ying Zhou <yzhou7...@gmail.com> wrote:

> Hi,
>
> Currently I’m working on ARROW-11297
> (https://github.com/mathyingzhou/arrow/tree/ARROW-11297), which will be
> filed as soon as the current PR is merged.
>
> I managed to reimplement orc::WriterOptions in Arrow (with naming
> conventions Arrow-ized) as arrow::adapters::orc::WriterOptions (which is
> necessary since we do not allow third party headers to be included in our
> public headers) and finished the C++ part of the work. Now I’m trying to
> expose WriterOptions in Python. I do wonder how this is supposed to be done
> in general. After reading the code in array.pxi I think maybe this is the
> way I want to do it:
>
> 1. The end user will see individual ORC writer options (e.g.
> CompressionKind, that is, whether we use ZLIB, LZO or some other form of
> compression or none at all) as keyword arguments.
> 2. These keyword arguments will be processed in _orc.pyx first as a
> dictionary and then using an adapter they will be converted into an
> arrow::adapters::orc::WriterOptions.
>
> Is this the right way?
>
> Moreover I do wonder how we should convert the enums. Shall I use a series
> of if/elif or a mapping dict to force people to use one of the correct
> strings or get a ValueError?
>
> e.g.
>
> compression_kind_mapping = {
>     'snappy': CompressionKind._CompressionKind_SNAPPY,
>     'zl0': CompressionKind._CompressionKind_ZL0,
> }  # There are other options, this is just an example
> if compression_kind not in compression_kind_mapping:
>     raise ValueError("Unknown compression_kind")
> c_compression_kind = compression_kind_mapping[compression_kind]
>
> Ying
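
For illustration only, the approach described in the quoted message (keyword
arguments collected into a dict, then converted into an options object, with
enum strings validated through a mapping) might look roughly like the sketch
below. It is plain Python rather than Cython, and every name in it is an
assumption: CompressionKind and WriterOptions are hypothetical stand-ins for
the C++ types, make_writer_options is an invented helper, and stripe_size
with its default value is a placeholder second option. None of this is
pyarrow's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class CompressionKind(Enum):
    # Hypothetical stand-in for the C++ enum that Cython would expose.
    NONE = 0
    ZLIB = 1
    SNAPPY = 2


@dataclass
class WriterOptions:
    # Hypothetical stand-in for arrow::adapters::orc::WriterOptions.
    compression: CompressionKind = CompressionKind.NONE
    stripe_size: int = 64 * 1024 * 1024  # placeholder default


# User-facing strings mapped to enum members (illustrative subset).
_COMPRESSION_BY_NAME = {
    "none": CompressionKind.NONE,
    "zlib": CompressionKind.ZLIB,
    "snappy": CompressionKind.SNAPPY,
}


def make_writer_options(**kwargs):
    """Convert user keyword arguments into a WriterOptions instance."""
    options = WriterOptions()
    if "compression" in kwargs:
        name = kwargs.pop("compression")
        try:
            # Case-insensitive lookup; unknown names fall through to KeyError.
            options.compression = _COMPRESSION_BY_NAME[str(name).lower()]
        except KeyError:
            valid = ", ".join(sorted(_COMPRESSION_BY_NAME))
            raise ValueError(
                f"Unknown compression {name!r}; expected one of: {valid}"
            ) from None
    if "stripe_size" in kwargs:
        options.stripe_size = int(kwargs.pop("stripe_size"))
    if kwargs:
        # Reject misspelled or unsupported option names early.
        raise TypeError(f"Unexpected options: {sorted(kwargs)}")
    return options
```

Compared with an if/elif chain, the mapping dict keeps the list of accepted
strings in one place, so the same table can drive both the lookup and the
error message.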
