[ https://issues.apache.org/jira/browse/ARROW-264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15429008#comment-15429008 ]
Wes McKinney commented on ARROW-264:
------------------------------------

Looks like a good start to me. We should add some minor internal details, like padding all byte buffers to start and end on 8-byte boundaries (according to the Arrow spec, memory will already be aligned and padded, but the serialized metadata may require padding bytes).

This is similar to, but much more general than, the file layout we used in Feather (which has a schema and record batch headers in a single metadata chunk, but only a single record batch and no dictionaries -- https://github.com/wesm/feather/blob/master/doc/FORMAT.md).

> Create an Arrow File format
> ---------------------------
>
>                 Key: ARROW-264
>                 URL: https://issues.apache.org/jira/browse/ARROW-264
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>
> File layout:
> (DictionaryBatch, RecordBatch, Schema as defined in Message.fbs)
> {noformat}
> MAGIC: ARROW1
> (
>   DictionaryBatch: DictionaryBatch Header (FlatBuffer)
>   DictionaryBatch: DictionaryBatch Body (buffers concatenated)
> )*
> (
>   RecordBatch: RecordBatch Header (FlatBuffer)
>   RecordBatch: RecordBatch Body (buffers concatenated)
> )+
> Footer: FlatBuffer
> Footer length: int (4 bytes unsigned LE)
> MAGIC: ARROW1
> {noformat}
> Footer definition:
> {noformat}
> table Footer {
>   schema: org.apache.arrow.flatbuf.Schema;
>   dictionaries: [ Block ];
>   recordBatches: [ Block ];
> }
> struct Block {
>   offset: long;
>   metaDataLength: int;
>   bodyLength: long;
> }
> {noformat}
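
To make the proposed layout concrete, here is a minimal Python sketch of two details discussed above: rounding a buffer length up to an 8-byte boundary, and locating the footer from the end of the file using the 4-byte unsigned little-endian footer length and the trailing ARROW1 magic. The function names `padded_length` and `read_footer_bytes` are hypothetical and not part of any Arrow library, and the sketch assumes the footer length counts only the footer bytes themselves. It returns the raw footer bytes and does not decode the FlatBuffer.

```python
import io
import struct

MAGIC = b"ARROW1"  # magic string from the proposed layout (6 bytes)


def padded_length(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment (hypothetical helper)."""
    return (nbytes + alignment - 1) // alignment * alignment


def read_footer_bytes(f) -> bytes:
    """Return the raw footer bytes from a seekable binary file object.

    Assumes the layout: MAGIC ... Footer, footer length (uint32 LE), MAGIC.
    """
    trailer_size = 4 + len(MAGIC)  # footer length field + trailing magic

    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    if file_size < len(MAGIC) + trailer_size:
        raise ValueError("file is too small to hold both magic strings")

    # Check the leading magic.
    f.seek(0)
    if f.read(len(MAGIC)) != MAGIC:
        raise ValueError("missing leading magic")

    # Read the footer length and the trailing magic from the end of the file.
    f.seek(file_size - trailer_size)
    trailer = f.read(trailer_size)
    (footer_length,) = struct.unpack("<I", trailer[:4])
    if trailer[4:] != MAGIC:
        raise ValueError("missing trailing magic")

    # The footer sits immediately before the length field.
    footer_start = file_size - trailer_size - footer_length
    if footer_start < len(MAGIC):
        raise ValueError("footer length exceeds file size")
    f.seek(footer_start)
    return f.read(footer_length)
```

A reader following this approach would then decode the returned bytes as the `Footer` table and use each `Block` entry's `offset`, `metaDataLength`, and `bodyLength` to seek directly to a dictionary or record batch.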