Re: Arrow File with Multiple Record Batches

2016-09-08 Thread Brian Hulette
Ah, got it, thanks Julien. I was thinking that each RecordBatch could have a different schema, which in retrospect doesn't seem very logical. In essence, I guess I was thinking each record batch was a partition of the schema's fields, instead of a partition of the entire dataset. Thanks for clea

Re: Arrow File with Multiple Record Batches

2016-09-08 Thread Julien Le Dem
Hi Brian, it's not one record batch per field. Each field describes a column in the schema. Record batches are partitions of the dataset. As such, all record batches have the same schema, which is defined in the footer. There can be any number of record batches for a given schema. Then in each recor
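
The relationship described in this reply can be sketched in a few lines of plain Python. This is only a conceptual model under stated assumptions, not the Arrow API or its on-disk layout: the names `Field`, `RecordBatch`, and `FileModel` are hypothetical and exist purely for illustration. It shows a single schema stored once for the whole file, with any number of record batches that each hold a slice of the rows for every column.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field: describes one column."""
    name: str
    type_name: str


@dataclass
class RecordBatch:
    """Hypothetical batch: one list of values per column, same rows in each."""
    columns: Dict[str, List[Any]]

    def num_rows(self) -> int:
        lengths = {len(values) for values in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns in a batch must have the same length")
        return lengths.pop() if lengths else 0


@dataclass
class FileModel:
    """Hypothetical file: one schema shared by every record batch."""
    schema: List[Field]
    batches: List[RecordBatch] = field(default_factory=list)

    def append(self, batch: RecordBatch) -> None:
        expected = {f.name for f in self.schema}
        # Every batch must carry every column of the shared schema.
        if set(batch.columns) != expected:
            raise ValueError("batch columns do not match the file schema")
        batch.num_rows()  # also rejects ragged columns
        self.batches.append(batch)

    def total_rows(self) -> int:
        return sum(b.num_rows() for b in self.batches)


# Two fields means two columns, not two record batches.
model = FileModel(schema=[Field("id", "int32"), Field("name", "utf8")])

# Each batch is a partition of the rows and contains both columns.
model.append(RecordBatch({"id": [1, 2, 3], "name": ["a", "b", "c"]}))
model.append(RecordBatch({"id": [4, 5], "name": ["d", "e"]}))
```

Here the dataset has five rows split across two batches, and a batch holding only the `id` column would be rejected, which mirrors the point that batches partition the dataset rather than the schema's fields.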

Arrow File with Multiple Record Batches

2016-09-08 Thread Brian Hulette
Hi all, I'm very interested in the Arrow file format - I would eventually like to use it to export data in a columnar format that can be read directly in a browser through a JavaScript library. I've been reviewing the specification and Julien's Java implementation, and I'm a little bit confused a