Hi,

For the C++ tests for the ORC writer there are two utilities that would
significantly shorten the tests, namely a generic table generator and a table
converter.

For the former, I know arrow/testing/random.h can generate random arrays.
Should I generate random struct arrays using ArrayOf and then expand them into
RecordBatches, or should I generate each array separately using ArrayOf and
then combine them? By the way, I haven't found any function that can directly
generate an Arrow Table from a schema, a size, and a null probability. Is there
any need for such functionality? If it would be useful beyond ORC/Parquet/CSV/etc.
IO, maybe we should write one.
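To make the second option concrete, here is a minimal stand-alone sketch of
"generate each column separately, then combine". It deliberately avoids the
Arrow API so it compiles on its own: columns are just vectors of optional
int64 values, with nulls drawn at null_probability. The real version would
presumably call arrow::random::RandomArrayGenerator::ArrayOf per field and
assemble the result with Table::Make; all names below are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <random>
#include <string>
#include <vector>

// Toy stand-in for an Arrow Table: one named column of nullable int64
// values per field. A null is represented by std::nullopt.
struct ToyTable {
  std::vector<std::string> names;
  std::vector<std::vector<std::optional<int64_t>>> columns;
};

// Generate each column independently, then combine them into one table,
// mirroring the "generate arrays separately, then combine" approach.
ToyTable MakeRandomTable(const std::vector<std::string>& field_names,
                         int64_t size, double null_probability,
                         uint64_t seed = 42) {
  std::mt19937_64 rng(seed);
  std::bernoulli_distribution is_null(null_probability);
  std::uniform_int_distribution<int64_t> value(0, 1000);
  ToyTable table;
  table.names = field_names;
  for (size_t i = 0; i < field_names.size(); ++i) {
    std::vector<std::optional<int64_t>> col;
    for (int64_t row = 0; row < size; ++row) {
      col.push_back(is_null(rng) ? std::nullopt
                                 : std::optional<int64_t>(value(rng)));
    }
    table.columns.push_back(std::move(col));
  }
  return table;
}
```

A schema/size/null_probability helper like the one asked about above would
essentially be this loop, with ArrayOf dispatching on each field's type.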

For the latter, what I need is a table converter that recursively converts
every instance of LargeBinary and FixedSizeBinary into Binary, every LargeString
into String, every Date64 into Timestamp (unit = MILLI), every LargeList and
FixedSizeList into List, and maybe every Map into a List of Structs. This lets
me independently produce the expected result of ORCReader(ORCWriter(Table)) so
that I can verify that the ORCWriter works as intended. For this problem I see
at least two possible approaches: perform the conversion mainly at the array
level, or mainly at the scalar level. Which one is better?
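For reference, the recursive part of the mapping can be sketched independently
of either approach, since it is really a transformation on the type tree. The
snippet below is a self-contained toy model (types as named nodes rather than
arrow::DataType); in real code the same recursion would rebuild the schema and
the actual arrays could then be converted, e.g. via casts, column by column.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model of a nested Arrow-like type: a name plus child types.
struct Type {
  std::string name;                             // e.g. "large_list", "date64"
  std::vector<std::shared_ptr<Type>> children;  // children of nested types
};

std::shared_ptr<Type> Make(std::string name,
                           std::vector<std::shared_ptr<Type>> children = {}) {
  return std::make_shared<Type>(Type{std::move(name), std::move(children)});
}

// Recursively rewrite a type the way the expected round-trip table needs:
// large_binary / fixed_size_binary -> binary, large_string -> string,
// date64 -> timestamp[ms], large_list / fixed_size_list -> list,
// map -> list<struct<key, value>>; everything else is kept, with its
// children converted in place.
std::shared_ptr<Type> ToOrcReadType(const std::shared_ptr<Type>& t) {
  std::vector<std::shared_ptr<Type>> kids;
  for (const auto& c : t->children) kids.push_back(ToOrcReadType(c));
  const std::string& n = t->name;
  if (n == "large_binary" || n == "fixed_size_binary") return Make("binary");
  if (n == "large_string") return Make("string");
  if (n == "date64") return Make("timestamp[ms]");
  if (n == "large_list" || n == "fixed_size_list") return Make("list", kids);
  if (n == "map") return Make("list", {Make("struct", kids)});
  return Make(n, kids);
}
```

The recursion is the same either way; the array-level vs. scalar-level choice
only decides what gets walked alongside the type tree.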

Thanks,
Ying

P.S. Thanks Antoine and Uwe for the very helpful reviews! The current codebase 
is already very different from the one when it was last reviewed. :)
P.P.S. The table converter is unavoidable because Arrow has many more types
than ORC.
