Re: [DISCUSS] Removing the "page" field from the Buffer record batch Arrow metadata

Wes McKinney Thu, 19 Oct 2017 18:03:07 -0700

The JIRA for this is https://issues.apache.org/jira/browse/ARROW-1409.
I will wait a little while for others to weigh in, but after that I
can write a patch to remove the attribute and bump the metadata format
version number.


On Thu, Oct 19, 2017 at 4:37 PM, Bryan Cutler wrote:
> +1, sounds ok to me to try to solve this problem a different way in the
> future once needed.
>
> On Thu, Oct 19, 2017 at 12:30 PM, Jacques Nadeau wrote:
>
>> Seems reasonable. I was among those that originally argued for this field
>> but given that we haven't used it yet, I think your proposal makes sense.
>>
>> +1
>>
>> On Wed, Oct 18, 2017 at 5:40 PM, Wes McKinney wrote:
>>
>> > When we originally drafted the metadata for record batches, we
>> > included a "page id" in the Buffer struct:
>> >
>> > https://github.com/apache/arrow/blob/master/format/Schema.fbs#L295
>> >
>> > The idea at the time was that record batches might not be colocated in
>> > a particular shared memory page. This might still happen in the
>> > future, but to this point we have not used this feature in any
>> > implementation.
>> >
>> > The cost of these extra 4 bytes is that the size of the Buffer struct
>> > with padding is 24 bytes instead of 16 bytes. In large record batches,
>> > this makes the record batch data header about 50% larger than it needs
>> > to be.
>> >
>> > I would argue that the ability to spread a record batch across
>> > multiple memory regions is a useful feature, but we should be solving
>> > that particular problem a different way, like having a separate
>> > "non-colocated buffer" type and record batch message type that has the
>> > extra page id. That way, we pay the extra cost only when we actually
>> > use this feature, whereas for most self-contained message use cases
>> > those 8 bytes (the page id plus its padding) in each buffer go unused.
>> >
>> > I am loath to break the Arrow metadata at this stage, but if we agree
>> > on removing this field we should do it sooner rather than later. It
>> > may be possible to make the change in a forward-compatible way if we
>> > were worried about breaking existing applications, but on the other
>> > hand I do not think we have yet made any contract about
>> > forward/backward compatibility of metadata with our end users.
>> >
>> > Thanks,
>> > Wes
>> >
>>
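The 24-versus-16-byte figure in the original message comes from alignment padding. The following minimal Python sketch reproduces that arithmetic under two stated assumptions. First, the Buffer struct of the time held a 4-byte `page` followed by 8-byte `offset` and `length` fields. Second, FlatBuffers aligns each struct field to its own size and rounds the struct up to its largest member. The helpers `struct_size` and `buffer_vector_bytes` are hypothetical and exist only for illustration. The batch shape used in the example is likewise illustrative.

```python
# Sketch of FlatBuffers-style struct layout arithmetic. Assumed rule: each
# scalar field is aligned to its own size, and the struct's total size is
# rounded up to its largest member's alignment.

def struct_size(field_sizes):
    """Return the padded size in bytes of a struct with the given field sizes."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        max_align = max(max_align, size)
        offset += (-offset) % size  # padding so this field starts aligned
        offset += size
    offset += (-offset) % max_align  # trailing padding for arrays of structs
    return offset

# Assumed Buffer fields: page (int32), offset (int64), length (int64)
WITH_PAGE = struct_size([4, 8, 8])
WITHOUT_PAGE = struct_size([8, 8])

def buffer_vector_bytes(num_buffers, struct_bytes):
    """Bytes taken by the buffer descriptors alone (ignores other header fields)."""
    return num_buffers * struct_bytes

if __name__ == "__main__":
    print(WITH_PAGE, WITHOUT_PAGE)  # 24 16
    # Illustrative batch: 100 columns with 2 buffers each (validity + values)
    n = 200
    print(buffer_vector_bytes(n, WITH_PAGE), buffer_vector_bytes(n, WITHOUT_PAGE))  # 4800 3200
```

The descriptor vector grows linearly with the number of buffers. In wide batches, the 8 extra bytes per buffer therefore account for most of the header growth, which is consistent with the roughly 50% figure quoted in the thread.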
