Re: Arrow File with Multiple Record Batches

Julien Le Dem Thu, 08 Sep 2016 14:10:29 -0700

Hi Brian,
It's not one record batch per field. Each field describes a column in the
schema.
Record batches are partitions of the dataset's rows. As such, all record
batches have the same schema, which is defined in the footer.
There can be any number of record batches for a given schema.
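
To illustrate the partitioning idea, here is a minimal sketch in plain Python. It does not use any Arrow library; the function name split_into_batches, the dict-of-lists table, and the column names are all made up for the example. It only shows that each batch is a row range of the same columns, so every batch shares one schema.

```python
def split_into_batches(columns, batch_size):
    """Partition a column-oriented table (hypothetical dict of lists) by rows."""
    num_rows = len(next(iter(columns.values())))
    batches = []
    for start in range(0, num_rows, batch_size):
        # Every batch carries the same column names: the schema is shared.
        batches.append({name: values[start:start + batch_size]
                        for name, values in columns.items()})
    return batches


# Assumed example data: two columns, five rows.
table = {"id": [1, 2, 3, 4, 5], "score": [0.5, 0.25, 1.0, 0.75, 0.0]}
batches = split_into_batches(table, batch_size=2)
```

Any batch size gives a valid result; only the number of batches changes, not the set of columns in each one.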


Then in each record batch:
 - There are as many FieldNodes as there are Fields in total in the schema
tree.
 - For each field, the buffer count is defined by the layout attribute in
Field.
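
The two rules above can be sketched as follows. This is a plain-Python model, not the Arrow API: the Field class here, the layout strings, and the example schema are hypothetical, and the depth-first (parent before children) traversal is an assumption about how the schema tree is flattened. It computes how many FieldNodes and buffers each record batch would be expected to carry for a given schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a schema Field (a column or a nested child)."""
    name: str
    layout: List[str]  # one entry per buffer; the names are illustrative only
    children: List["Field"] = field(default_factory=list)


def flatten(fields: List[Field]) -> List[Field]:
    """Walk the schema tree depth-first, parents before children (assumed order)."""
    out = []
    for f in fields:
        out.append(f)
        out.extend(flatten(f.children))
    return out


def expected_counts(schema: List[Field]):
    """Return (FieldNode count, buffer count) expected in each record batch."""
    flat = flatten(schema)
    return len(flat), sum(len(f.layout) for f in flat)


# Assumed example schema: a flat column and a nested column with one child.
schema = [
    Field("id", ["validity", "data"]),
    Field("tags", ["validity", "offsets"], children=[
        Field("item", ["validity", "data"]),
    ]),
]

node_count, buffer_count = expected_counts(schema)
```

Because the counts depend only on the schema, they are the same for every record batch in the file, however many batches there are.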

IHTH, Julien



On Thu, Sep 8, 2016 at 9:15 AM, Brian Hulette <[email protected]> wrote:

> Hi all,
>
> I'm very interested in the Arrow file format - I would eventually like
> to use it to export data in a columnar format that can be read directly
> in a browser through a Javascript library. I've been reviewing the
> specification and Julien's Java implementation, and I'm a little bit
> confused about the relationship between the Schema in the footer and the
> record batch(es).
>
> If a schema is referring to multiple record batches, is it assumed that
> the first fields in the schema refer to the first record batch, until
> all of its Buffers and FieldNodes are accounted for, then the next set
> of fields refer to the next record batch, and so on?
>
> If so, it doesn't seem like the current implementation supports this
> behavior. Which is fine, I just want to make sure I understand.
>
> Thanks,
>
> Brian Hulette
>
>


-- 
Julien
