+1 for the change. I'm all for making the metadata small and solving this in a different way if the field is really needed. Users who do not need the feature shouldn't have to pay for it.
On Thu, Oct 19, 2017 at 6:02 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> The JIRA for this is https://issues.apache.org/jira/browse/ARROW-1409.
> I will wait a little while for others to weigh in, but after that I
> can write a patch to remove the attribute and bump the metadata format
> version number.
>
> On Thu, Oct 19, 2017 at 4:37 PM, Bryan Cutler <cutl...@gmail.com> wrote:
> > +1, sounds ok to me to try to solve this problem a different way in the
> > future once needed.
> >
> > On Thu, Oct 19, 2017 at 12:30 PM, Jacques Nadeau <jacq...@apache.org> wrote:
> >
> >> Seems reasonable. I was among those that originally argued for this field
> >> but given that we haven't used it yet, I think your proposal makes sense.
> >>
> >> +1
> >>
> >> On Wed, Oct 18, 2017 at 5:40 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> > When we originally drafted the metadata for record batches, we
> >> > included a "page id" in the Buffer struct:
> >> >
> >> > https://github.com/apache/arrow/blob/master/format/Schema.fbs#L295
> >> >
> >> > The idea at the time was that record batches might not be colocated in
> >> > a particular shared memory page. This might still happen in the
> >> > future, but to this point we have not used this feature in any
> >> > implementation.
> >> >
> >> > The cost of this extra 4 bytes is that the size of the Buffer struct
> >> > with padding is 24 bytes instead of 16 bytes. In large record batches,
> >> > this makes the record batch data header about 50% larger than it needs
> >> > to be.
> >> >
> >> > I would argue that the ability to spread a record batch across
> >> > multiple memory regions is a useful feature, but we should be solving
> >> > that particular problem a different way, like having a separate
> >> > "non-colocated buffer" type and record batch message type that has the
> >> > extra page id. So when we want to use this feature, we are OK with
> >> > paying the extra cost. But for most self-contained message use cases
> >> > those 8 bytes in each buffer go unused.
> >> >
> >> > I am loath to break the Arrow metadata at this stage, but if we agree
> >> > about removing this field we should do it sooner rather than later. It
> >> > may be possible to do the change in a forward compatible way if we
> >> > were worried about breaking existing applications, but on the other
> >> > hand I do not think we have yet made any contract about
> >> > forward/backwards compatibility of metadata with our end users.
> >> >
> >> > Thanks,
> >> > Wes
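
For anyone who wants to check the 24-versus-16-byte figures quoted above, here is a minimal sketch of the size arithmetic. It assumes the Buffer struct holds a 32-bit page id followed by two 64-bit fields (offset and length), and that each scalar is aligned to its own size with the struct padded to its largest alignment. The helper name `struct_size` is made up for this illustration and is not part of Arrow or FlatBuffers.

```python
def struct_size(field_sizes):
    """Size in bytes of a struct whose scalar fields are each aligned to
    their own size, with the total padded to the largest field alignment.
    (Assumed layout rule for this sketch.)"""
    offset = 0
    for size in field_sizes:
        # Round the running offset up to this field's alignment, then place it.
        offset = (offset + size - 1) // size * size
        offset += size
    max_align = max(field_sizes)
    # Pad the tail so consecutive structs in an array stay aligned.
    return (offset + max_align - 1) // max_align * max_align


# Assumed field layout: page (int32), offset (int64), length (int64).
with_page = struct_size([4, 8, 8])
# Same struct with the page id removed.
without_page = struct_size([8, 8])
# Relative growth of the per-buffer metadata caused by the page id.
overhead = with_page / without_page - 1

print(with_page, without_page, overhead)
```

Under those assumptions the 4-byte page id costs 8 bytes per buffer once padding is counted, which matches the "about 50% larger" estimate for headers dominated by buffer entries.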