Re: Arrow File with Multiple Record Batches

2016-09-08 Thread Brian Hulette
Ah, got it, thanks Julien. I was thinking that each RecordBatch could have a different schema, which in retrospect doesn't seem very logical. In essence, I guess I was thinking of each record batch as a partition of the schema's fields, instead of a partition of the entire dataset. Thanks for clea

Re: Arrow File with Multiple Record Batches

2016-09-08 Thread Julien Le Dem
Hi Brian, It's not one record batch per field. Each field describes a column in the schema. Record batches are partitions of the dataset. As such, all record batches have the same schema, which is defined in the footer. There can be any number of record batches for a given schema. Then in each recor
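
To make the distinction concrete, here is a minimal sketch in plain Python of the layout described above: one schema stored once, and any number of record batches that each hold a slice of the rows for every column. This is a toy model, not the Arrow library or its file format; the names `ArrowFileSketch`, `RecordBatch`, and `partition_rows` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

# Hypothetical simplification: a schema is an ordered tuple of (field name, type name).
Schema = Tuple[Tuple[str, str], ...]


@dataclass
class RecordBatch:
    # One list of values per field, keyed by field name.
    columns: Dict[str, list]

    def num_rows(self) -> int:
        if not self.columns:
            return 0
        return len(next(iter(self.columns.values())))


@dataclass
class ArrowFileSketch:
    # The schema is stored once for the whole file (in the footer, per the thread).
    schema: Schema
    batches: List[RecordBatch] = field(default_factory=list)

    def append(self, batch: RecordBatch) -> None:
        # Every batch must carry exactly the fields of the shared schema.
        expected = [name for name, _ in self.schema]
        if list(batch.columns) != expected:
            raise ValueError("record batch does not match the file schema")
        # All columns in a batch must describe the same number of rows.
        if len({len(values) for values in batch.columns.values()}) > 1:
            raise ValueError("columns in a record batch differ in length")
        self.batches.append(batch)


def partition_rows(schema: Schema, rows: Sequence[tuple], batch_size: int) -> ArrowFileSketch:
    """Split the dataset row-wise into batches that all share one schema."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    names = [name for name, _ in schema]
    out = ArrowFileSketch(schema=schema)
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Each batch still contains every column, just fewer rows.
        columns = {name: [row[i] for row in chunk] for i, name in enumerate(names)}
        out.append(RecordBatch(columns))
    return out


if __name__ == "__main__":
    schema = (("id", "int32"), ("name", "utf8"))
    rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
    sketch = partition_rows(schema, rows, batch_size=2)
    for batch in sketch.batches:
        print(list(batch.columns), batch.num_rows())
```

A batch containing only a subset of the fields (the "one record batch per field" reading) is rejected by `append`, which is the point of the clarification: batches split the rows, not the columns.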