When we originally drafted the metadata for record batches, we included a "page id" in the Buffer struct:
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L295

The idea at the time was that record batches might not be colocated in a particular shared memory page. This might still happen in the future, but so far we have not used this feature in any implementation.

The cost of these extra 4 bytes is that the size of the Buffer struct with padding is 24 bytes instead of 16 bytes. In large record batches, this makes the record batch data header about 50% larger than it needs to be.

I would argue that the ability to spread a record batch across multiple memory regions is a useful feature, but we should be solving that particular problem a different way, such as having a separate "non-colocated buffer" type and a record batch message type that has the extra page id. Then, when we want to use this feature, we are OK with paying the extra cost. But for most self-contained message use cases, those 8 bytes in each buffer (the 4-byte page id plus 4 bytes of padding) go unused.

I am loath to break the Arrow metadata at this stage, but if we agree about removing this field, we should do it sooner rather than later. It may be possible to make the change in a forward-compatible way if we are worried about breaking existing applications, but on the other hand I do not think we have yet made any contract with our end users about forward/backward compatibility of metadata.

Thanks,
Wes
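
For reference, the size arithmetic above can be checked with a small sketch. It assumes the Buffer struct is a 32-bit page id followed by 64-bit offset and length fields, with the 64-bit fields aligned to 8 bytes; the names WITH_PAGE, WITHOUT_PAGE, and buffer_metadata_bytes are made up for illustration and are not part of Arrow.

```python
import struct

# Assumed layouts, modeled on the Buffer struct discussed above.
# "<" turns off native alignment, so the padding is spelled out by hand
# ("4x") to mimic 8-byte alignment of the 64-bit fields.
WITH_PAGE = struct.Struct("<i4xqq")  # int32 page, 4 pad bytes, int64 offset, int64 length
WITHOUT_PAGE = struct.Struct("<qq")  # int64 offset, int64 length


def buffer_metadata_bytes(num_buffers, layout):
    """Total bytes taken by the buffer entries for a given struct layout."""
    return num_buffers * layout.size


if __name__ == "__main__":
    n = 1000  # hypothetical number of buffers in a wide record batch
    with_page = buffer_metadata_bytes(n, WITH_PAGE)
    without_page = buffer_metadata_bytes(n, WITHOUT_PAGE)
    print("per buffer:", WITH_PAGE.size, "vs", WITHOUT_PAGE.size, "bytes")
    print("for", n, "buffers:", with_page, "vs", without_page, "bytes")
    print("overhead ratio:", with_page / without_page)
```

The ratio covers only the buffer entries themselves, which is why it matches the "about 50%" figure mainly for batches with many buffers, where those entries dominate the data header.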