> AFAIK IPC is just bytes. The alignment is done when they are copied over to
> allocated memory regions.

Agreed, that if implementations are copying then this isn't a concern. The IPC and File Formats were designed for memory mapping/zero copy. So there is an assumption that kernel pages meet the alignment requirements but otherwise a copy should not be strictly necessary.

> Just wasn't sure if it would be breaking implicit
> assumptions by consumers somewhere if they happened to get an IPC stream w/
> record batches that mixed, for example, 8-byte and 64-byte alignments.

I'm not aware of any assumptions here. Simply given the fact that there isn't a mandate (and due to things like slicing, sharing buffers via the C-ABI etc), I think all code handling Arrow arrays that wants to optimize for an alignment still needs to verify alignment requirements on a buffer by buffer basis.

On Wed, Apr 7, 2021 at 3:23 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi Jacob,
>
> AFAIK IPC is just bytes. The alignment is done when they are copied over to
> allocated memory regions. It is the implementations' responsibility to
> allocate memory regions that are aligned depending on how those bytes
> should be interpreted (e.g. u64 vs u8). This interpretation is induced by
> the relationship between the logical types (e.g. Time32) and its
> corresponding physical types (e.g. 0th buffer is u8, 1st is i32). In this
> sense, afaik IPC does not need to declare byte alignment as they are
> inferred by the corresponding logical type.
>
> Best,
> Jorge
>
> On Wed, Apr 7, 2021 at 7:40 AM Jacob Quinn <quinn.jac...@gmail.com> wrote:
>
> > As far as I can tell, the alignment padding used in an IPC stream/file
> > isn't stored explicitly, and not really "inferrable", though maybe
> > technically possible if you calculated what bytes are *necessary* given a
> > buffer's data vs. what's actually stored.
> >
> > Just wondering if this has been brought up at all to store explicitly; it
> > came up in the Julia implementation when considering "appending" record
> > batches to an IPC stream that has already been written to disk; we
> > originally thought we would need to match alignment used in previously
> > written record batches, but upon further reflection, it seems like
> > technically it wouldn't matter since all buffers have the exact byte counts
> > written anyway. Just wasn't sure if it would be breaking implicit
> > assumptions by consumers somewhere if they happened to get an IPC stream w/
> > record batches that mixed, for example, 8-byte and 64-byte alignments.
> >
> > -Jacob
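
For anyone reading this thread later, here is a rough sketch of the two points discussed above: batches padded to different alignments can still be read because each buffer carries an explicit offset and length, and a consumer that wants a particular alignment has to check each buffer itself. This is plain Python arithmetic, not any Arrow library's API. The helper names (`padded_length`, `layout_body`, `buffer_is_aligned`) and the page-aligned base address of 4096 are assumptions made up for illustration.

```python
def padded_length(nbytes: int, alignment: int) -> int:
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    return (nbytes + alignment - 1) & ~(alignment - 1)


def layout_body(buffer_lengths, alignment):
    """Lay buffers out back to back, padding each one to `alignment`.

    Returns the (offset, length) pair recorded for each buffer and the
    total body size. The pairs hold the exact byte counts, so a reader
    never needs to know which padding the writer chose.
    """
    offset = 0
    entries = []
    for nbytes in buffer_lengths:
        entries.append((offset, nbytes))
        offset += padded_length(nbytes, alignment)
    return entries, offset


def buffer_is_aligned(base_address: int, offset: int, alignment: int) -> bool:
    """Per-buffer check a consumer would run before taking an aligned fast path."""
    return (base_address + offset) % alignment == 0


# Hypothetical body start address, assumed page-aligned (e.g. a memory map).
BASE = 4096

# The same two buffers (10 and 3 bytes) written with 8-byte and 64-byte padding.
batch_8, size_8 = layout_body([10, 3], 8)
batch_64, size_64 = layout_body([10, 3], 64)

# Only buffers whose absolute address is a multiple of 64 qualify for a
# 64-byte fast path; the others need a fallback or a copy.
aligned_8 = [buffer_is_aligned(BASE, off, 64) for off, _ in batch_8]
aligned_64 = [buffer_is_aligned(BASE, off, 64) for off, _ in batch_64]
```

With these inputs the second buffer of the 8-byte-padded batch starts at offset 16, so it fails the 64-byte check, while both buffers of the 64-byte-padded batch pass. Both layouts remain readable, since the lengths are recorded either way.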