Hi Ying,

On 19/02/2021 at 07:06, Ying Zhou wrote:
> 
> Now I’m working on fixing the last concerns on my ORC writer 
> https://github.com/apache/arrow/pull/8648 and have two questions.
> 
> I have a need to standardize an Arrow Array so that it is fit for cheaper 
> conversion into ORC by making sure that all the children (and grandchildren 
> etc) of null struct entries are null. Is there an established method to 
> achieve that?

Not a high-level one, but it should be relatively easy to massage the
null bitmaps yourself, using e.g.:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/bitmap_ops.h#L117
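
To make the idea concrete, here is a rough standalone sketch. It does not use
the Arrow API: the Node struct and PropagateNulls function are hypothetical
names, and it assumes zero bit offsets, equal lengths, and struct children
only. It ANDs each parent's validity bitmap into its children, recursively:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a struct-like array: one validity bitmap
// (bit i set = entry i is valid, LSB-first as in Arrow) plus child arrays.
// This is a simplified model, not arrow::ArrayData.
struct Node {
  std::vector<uint8_t> validity;
  std::vector<Node> children;
};

// Recursively AND each parent's validity bitmap into its children, so that a
// null parent entry makes the matching entry of every descendant null.
// Assumes zero offsets and equal lengths at every level.
inline void PropagateNulls(Node* node) {
  for (Node& child : node->children) {
    assert(child.validity.size() == node->validity.size());
    for (std::size_t i = 0; i < child.validity.size(); ++i) {
      child.validity[i] &= node->validity[i];
    }
    PropagateNulls(&child);  // the child now carries its ancestors' nulls
  }
}
```

With real Arrow buffers, the inner byte loop would presumably be replaced by
one of the bitmap helpers in the header linked above, which also cope with
non-zero bit offsets.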

> It will also be very helpful if there is some fast and canonical method to
> standardize an Array and ensure that null List/LargeList/FixedSizeList/Map 
> entries have zero lengths in their value/key/item arrays.

Hmm, feel free to open a JIRA about that.

> I’m about to switch all my Write*Batch to use ArrayDataInlineVisitor (or 
> maybe ArrayDataVisitor since it is used more often?) I have a concern on 
> feasibility of using visitors for nested types. It doesn’t seem like 
> ArrayDataVisitor supports these types. Is that true? If so, shall I use 
> visitors for non-nested types while using for loops for nested ones?

The "data" visitors indeed don't support nested types, as they give you
the entry values as simple "C" values (such as int32_t or
util::string_view).  They are used to iterate over values of simple data
types (primitive, string...).
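
As an illustration of that shape, here is a standalone model, not the Arrow
visitors themselves. VisitInt32Values is a hypothetical name, and treating an
empty bitmap as "no nulls" is an assumption of this sketch. The caller
receives plain values, plus a separate callback for nulls:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a "data" visitor over a flat int32 array: calls
// on_valid(value) for each valid entry and on_null() for each null entry.
// The validity bitmap is LSB-first; an empty bitmap means "no nulls".
template <typename ValidFunc, typename NullFunc>
void VisitInt32Values(const std::vector<int32_t>& values,
                      const std::vector<uint8_t>& validity,
                      ValidFunc&& on_valid, NullFunc&& on_null) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    const bool valid =
        validity.empty() || (((validity[i / 8] >> (i % 8)) & 1) != 0);
    if (valid) {
      on_valid(values[i]);
    } else {
      on_null();
    }
  }
}
```

A callback of this form has nowhere to hand over a child array, which is why
the approach stops at flat types.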

If you're visiting arbitrary types, I suggest using VisitTypeInline.
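
VisitTypeInline takes a DataType and a visitor, and calls the visitor's Visit
overload for the concrete type. The pattern can be modelled in a standalone
way as below. Everything in this sketch (TypeId, DataType, VisitType,
NameCollector) is a hypothetical stand-in rather than the Arrow classes. The
point is only that the handler for a nested type can dispatch again on each
child:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, much-simplified type description (not arrow::DataType).
enum class TypeId { Int32, String, Struct };

struct DataType {
  TypeId id;
  std::vector<DataType> children;  // only used by Struct
};

// Dispatch on the type id and call the matching visitor method, in the
// spirit of a type visitor.
template <typename Visitor>
void VisitType(const DataType& type, Visitor* visitor) {
  switch (type.id) {
    case TypeId::Int32:
      visitor->VisitInt32(type);
      break;
    case TypeId::String:
      visitor->VisitString(type);
      break;
    case TypeId::Struct:
      visitor->VisitStruct(type);
      break;
  }
}

// Example visitor: builds a textual description of the type. The nested
// case recurses by dispatching on each child type.
struct NameCollector {
  std::string out;

  void VisitInt32(const DataType&) { out += "int32"; }
  void VisitString(const DataType&) { out += "string"; }
  void VisitStruct(const DataType& type) {
    out += "struct<";
    for (std::size_t i = 0; i < type.children.size(); ++i) {
      if (i > 0) out += ",";
      VisitType(type.children[i], this);
    }
    out += ">";
  }
};
```

A writer would do the same with one method per type: leaf methods write
values, nested ones recurse into their children.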

Regards

Antoine.
