Hi Ying,

> I have a need to standardize an Arrow Array so that it is fit for cheaper
> conversion into ORC by making sure that all the children (and grandchildren
> etc) of null struct entries are null. Is there an established method to
> achieve that?

I'm not aware of one, although there may be something in the compute
kernels that I have overlooked.
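
To make the requirement concrete, here is a rough sketch of the propagation
itself. It uses a simplified stand-in for a struct-like array (a validity list
plus children), not the real Arrow API; `Node` and `propagate_nulls` are
hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a struct-like array: one validity flag per
    row plus child nodes of the same length. This is not the Arrow API."""
    valid: List[bool]
    children: List["Node"] = field(default_factory=list)


def propagate_nulls(node: Node, parent_valid: Optional[List[bool]] = None) -> Node:
    """Return a copy in which every row that is null in an ancestor is also
    null in all of its descendants."""
    if parent_valid is None:
        valid = list(node.valid)
    else:
        # A row stays valid only if it is valid here and in every ancestor.
        valid = [v and p for v, p in zip(node.valid, parent_valid)]
    return Node(valid, [propagate_nulls(c, valid) for c in node.children])


# Row 1 of the parent is null and row 2 of the child is null, so both rows
# end up null in the grandchild.
grandchild = Node([True, True, True])
child = Node([True, True, False], [grandchild])
parent = Node([True, False, True], [child])
fixed = propagate_nulls(parent)
```

In a real array the same idea would amount to AND-ing the parent's validity
bitmap into each child's bitmap, taking offsets into account.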

> It will also be very helpful if there is some fast and canonical method to
> standardize an Array and ensure that null List/LargeList/FixedSizeList/Map
> entries have zero lengths in their value/key/item arrays.

Agreed.  I don't think one exists, but I think this type of edge case is
rare enough that you can detect it and throw an error for the time being
(this is what is done for Parquet).
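
As a rough sketch of the "detect and throw" approach, assuming a list array
described by its offsets and a per-row validity list (plain Python lists here,
not Arrow buffers; both function names are hypothetical):

```python
from typing import List


def null_entries_with_values(offsets: List[int], valid: List[bool]) -> List[int]:
    """Return the rows that are null but still span a non-empty offset range."""
    return [
        i
        for i, ok in enumerate(valid)
        if not ok and offsets[i + 1] != offsets[i]
    ]


def check_list_array(offsets: List[int], valid: List[bool]) -> None:
    """Raise if any null list entry has a non-zero length."""
    bad = null_entries_with_values(offsets, valid)
    if bad:
        raise ValueError(f"null list entries with non-zero length at rows {bad}")


# Row 1 is null but covers offsets 2..5, so it is reported.
offsets = [0, 2, 5, 5]
valid = [True, False, True]
bad_rows = null_entries_with_values(offsets, valid)
```

For FixedSizeList every entry has the same length by construction, so a check
of this shape only applies to the offset-based types (List, LargeList, Map).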

> If so, shall I use visitors for non-nested types while using for loops for
> nested ones?

This sounds reasonable to me.

-Micah

On Thu, Feb 18, 2021 at 10:06 PM Ying Zhou <yzhou7...@gmail.com> wrote:

> Hi,
>
> I’m now working on addressing the last concerns about my ORC writer
> https://github.com/apache/arrow/pull/8648 and have two questions.
>
> I have a need to standardize an Arrow Array so that it is fit for cheaper
> conversion into ORC by making sure that all the children (and grandchildren
> etc) of null struct entries are null. Is there an established method to
> achieve that? It will also be very helpful if there is some fast and
> canonical method to standardize an Array and ensure that null
> List/LargeList/FixedSizeList/Map entries have zero lengths in their
> value/key/item arrays.
>
> I’m about to switch all my Write*Batch to use ArrayDataInlineVisitor (or
> maybe ArrayDataVisitor since it is used more often?). I have a concern about
> the feasibility of using visitors for nested types. It doesn’t seem like
> ArrayDataVisitor supports these types. Is that true? If so, shall I use
> visitors for non-nested types while using for loops for nested ones?
>
> Thanks,
> Ying
