Thanks Bryan. I merged the Java patch with the EOS change and submitted a C++ patch which also updates the specification
https://github.com/apache/arrow/pull/5361 Let me know when the JS or C# patches are ready to go and I can merge those. I updated https://issues.apache.org/jira/browse/ARROW-6545 to track the Go change corresponding to this On Wed, Sep 11, 2019 at 12:33 AM Bryan Cutler <cutl...@gmail.com> wrote: > > I have the patch for the EOS with Java writers up here > https://github.com/apache/arrow/pull/5345. Just to clarify, the EOS of > {0xFFFFFFFF, 0x00000000} is used for both stream and file formats, in > non-legacy writing mode. > > On Mon, Sep 9, 2019 at 8:01 PM Bryan Cutler <cutl...@gmail.com> wrote: > > > Sounds good to me also and I don't think we need a vote either. > > > > On Sat, Sep 7, 2019 at 7:36 PM Micah Kornfield <emkornfi...@gmail.com> > > wrote: > > > >> +1 on this, I also don't think a vote is necessary as long as we make the > >> change before 0.15.0 > >> > >> On Saturday, September 7, 2019, Wes McKinney <wesmck...@gmail.com> wrote: > >> > >> > I see, thank you for catching this nuance. > >> > > >> > I agree that using {0xFFFFFFFF, 0x00000000} for EOS will resolve the > >> > issue while allowing implementations to be backwards compatible (i.e. > >> > handling the 4-byte EOS from older payloads). > >> > > >> > I'm not sure that we need to have a vote about this, what do others > >> think? > >> > > >> > On Sat, Sep 7, 2019 at 12:47 AM Ji Liu <niki...@aliyun.com.invalid> > >> wrote: > >> > > > >> > > Hi all, > >> > > > >> > > During the java code review[1], seems there is a problem with the > >> > current implementations(C++/Java etc) when reaching EOS, since the new > >> > format EOS is 8 bytes and the reader only reads 4 bytes when reach the > >> end > >> > of stream, and the additional 4 bytes will not be read which cause > >> problems > >> > for following up readings. > >> > > > >> > > There are some optional suggestions[2] as below, we should reach > >> > consistent and fix this problem before 0.15 release. > >> > > i. For the new format, an 8-byte EOS token should look like > >> {0xFFFFFFFF, > >> > 0x00000000}, so we read the continuation token first, and then know to > >> read > >> > the next 4 bytes, which are then 0 to signal EOS.ii. Reader just > >> remember > >> > the state, so if it reads the continuation token from the beginning, > >> then > >> > read all 8 bytes at the end. > >> > > > >> > > Thanks, > >> > > Ji Liu > >> > > > >> > > [1] https://github.com/apache/arrow/pull/5229 > >> > > [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682 > >> > > > >> > > > >> > > > >> > > > >> > > ------------------------------------------------------------------ > >> > > From:Eric Erhardt <eric.erha...@microsoft.com> > >> > > Send Time:2019年9月5日(星期四) 07:16 > >> > > To:dev@arrow.apache.org <dev@arrow.apache.org>; Ji Liu < > >> > niki...@aliyun.com> > >> > > Cc:emkornfield <emkornfi...@gmail.com>; Paul Taylor < > >> ptay...@apache.org> > >> > > Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address > >> > 8-byte Flatbuffer alignment requirements (2nd vote) > >> > > > >> > > The C# PR is up. > >> > > > >> > > https://github.com/apache/arrow/pull/5280 > >> > > > >> > > Eric > >> > > > >> > > -----Original Message----- > >> > > From: Eric Erhardt <eric.erha...@microsoft.com.INVALID> > >> > > Sent: Wednesday, September 4, 2019 10:12 AM > >> > > To: dev@arrow.apache.org; Ji Liu <niki...@aliyun.com> > >> > > Cc: emkornfield <emkornfi...@gmail.com>; Paul Taylor < > >> ptay...@apache.org > >> > > > >> > > Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address > >> > 8-byte Flatbuffer alignment requirements (2nd vote) > >> > > > >> > > I'm working on a PR for the C# bindings. I hope to have it up in the > >> > next day or two. Integration tests for C# would be a great addition at > >> some > >> > point - it's been on my backlog. For now I plan on manually testing it. > >> > > > >> > > -----Original Message----- > >> > > From: Wes McKinney <wesmck...@gmail.com> > >> > > Sent: Tuesday, September 3, 2019 10:17 PM > >> > > To: Ji Liu <niki...@aliyun.com> > >> > > Cc: emkornfield <emkornfi...@gmail.com>; dev <dev@arrow.apache.org>; > >> > Paul Taylor <ptay...@apache.org> > >> > > Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address > >> > 8-byte Flatbuffer alignment requirements (2nd vote) > >> > > > >> > > hi folks, > >> > > > >> > > We now have patches up for Java, JS, and Go. How are we doing on the > >> > code reviews for getting these in? > >> > > > >> > > Since C# implements the binary protocol, the C# developers might want > >> to > >> > look at this before the 0.15.0 release also. Absent integration tests > >> it's > >> > difficult to verify the C# library, though > >> > > > >> > > Thanks > >> > > > >> > > On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <niki...@aliyun.com> wrote: > >> > > > > >> > > > Here is the Java implementation > >> > > > > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith > >> > > > ub.com > >> %2Fapache%2Farrow%2Fpull%2F5229&data=02%7C01%7CEric.Erhardt% > >> > > > 40microsoft.com > >> %7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9 > >> > > > > >> 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&sdata=b87u5x8lLvfdnU5 > >> > > > 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&reserved=0 > >> > > > > >> > > > cc @Wes McKinney @emkornfield > >> > > > > >> > > > Thanks, > >> > > > Ji Liu > >> > > > > >> > > > ------------------------------------------------------------------ > >> > > > From:Ji Liu <niki...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三) > >> > > > 17:34 To:emkornfield <emkornfi...@gmail.com>; dev > >> > > > <dev@arrow.apache.org> Cc:Paul Taylor <ptay...@apache.org> > >> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address > >> > > > 8-byte Flatbuffer alignment requirements (2nd vote) > >> > > > > >> > > > I could take the Java implementation and will take a close watch on > >> > this issue in the next few days. > >> > > > > >> > > > Thanks, > >> > > > Ji Liu > >> > > > > >> > > > > >> > > > ------------------------------------------------------------------ > >> > > > From:Micah Kornfield <emkornfi...@gmail.com> Send > >> Time:2019年8月28日(星期三) > >> > > > 17:14 To:dev <dev@arrow.apache.org> Cc:Paul Taylor > >> > > > <ptay...@apache.org> > >> > > > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address > >> > > > 8-byte Flatbuffer alignment requirements (2nd vote) > >> > > > > >> > > > I should have integration tests with 0.14.1 generated binaries in > >> the > >> > > > next few days. I think the one remaining unassigned piece of work > >> in > >> > > > the Java implementation, i can take that up next if no one else gets > >> > to it. > >> > > > > >> > > > On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <wesmck...@gmail.com> > >> > wrote: > >> > > > > >> > > > > Here's the C++ changes > >> > > > > > >> > > > > > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi > >> > > > > thub.com > >> %2Fapache%2Farrow%2Fpull%2F5211&data=02%7C01%7CEric.Erha > >> > > > > rdt%40microsoft.com > >> %7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f > >> > > > > > >> 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&sdata=zWaHS8X > >> > > > > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&reserved=0 > >> > > > > > >> > > > > I'm going to create a integration branch where we can merge each > >> > > > > patch before merging to master > >> > > > > > >> > > > > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <wesmck...@gmail.com > >> > > >> > wrote: > >> > > > > > > >> > > > > > It isn't implemented in C++ yet but I will try to get a patch up > >> > > > > > for that soon (today maybe). I think we should create a branch > >> > > > > > where we can stack the patches that implement this for each > >> > language. > >> > > > > > > >> > > > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor > >> > > > > > <ptaylor.apa...@gmail.com> > >> > > > > wrote: > >> > > > > > > > >> > > > > > > I'll do the JS updates. Is it safe to validate against the > >> Arrow > >> > > > > > > C++ integration tests? > >> > > > > > > > >> > > > > > > > >> > > > > > > On 8/22/19 7:28 PM, Micah Kornfield wrote: > >> > > > > > > > I created > >> > > > > > > > > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2 > >> > > > > > > > F%2Fissues.apache.org > >> %2Fjira%2Fbrowse%2FARROW-6313&data=02 > >> > > > > > > > %7C01%7CEric.Erhardt%40microsoft.com > >> %7C90f02600c4ce40ff5c9008d > >> > > > > > > > > >> 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316 > >> > > > > > > > > >> 38512163816&sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj > >> > > > > > > > NiLxI%3D&reserved=0 as a > >> > > > > tracking > >> > > > > > > > issue with sub-issues on the development work. So far > >> no-one > >> > > > > > > > has > >> > > > > claimed > >> > > > > > > > Java and Javascript tasks. > >> > > > > > > > > >> > > > > > > > Would it make sense to have a separate dev branch for this > >> > work? > >> > > > > > > > > >> > > > > > > > Thanks, > >> > > > > > > > Micah > >> > > > > > > > > >> > > > > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney > >> > > > > > > > <wesmck...@gmail.com> > >> > > > > wrote: > >> > > > > > > > > >> > > > > > > >> The vote carries with 4 binding +1 votes and 1 non-binding > >> +1 > >> > > > > > > >> > >> > > > > > > >> I'll merge the specification patch later today and we can > >> > > > > > > >> begin working on implementations so we can get this done > >> for > >> > > > > > > >> 0.15.0 > >> > > > > > > >> > >> > > > > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler > >> > > > > > > >> <cutl...@gmail.com> > >> > > > > wrote: > >> > > > > > > >>> +1 (non-binding) > >> > > > > > > >>> > >> > > > > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou > >> > > > > > > >>> <solip...@pitrou.net> > >> > > > > > > >> wrote: > >> > > > > > > >>>> Sorry, had forgotten to send my vote on this. > >> > > > > > > >>>> > >> > > > > > > >>>> +1 from me. > >> > > > > > > >>>> > >> > > > > > > >>>> Regards > >> > > > > > > >>>> > >> > > > > > > >>>> Antoine. > >> > > > > > > >>>> > >> > > > > > > >>>> > >> > > > > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney > >> > > > > > > >>>> <wesmck...@gmail.com> wrote: > >> > > > > > > >>>>> hi all, > >> > > > > > > >>>>> > >> > > > > > > >>>>> As we've been discussing [1], there is a need to > >> introduce > >> > > > > > > >>>>> 4 > >> > > > > bytes of > >> > > > > > > >>>>> padding into the preamble of the "encapsulated IPC > >> message" > >> > > > > format to > >> > > > > > > >>>>> ensure that the Flatbuffers metadata payload begins on > >> an > >> > > > > > > >>>>> 8-byte aligned memory offset. The alternative to this > >> > > > > > > >>>>> would be for Arrow implementations where alignment is > >> > > > > > > >>>>> important (e.g. C or C++) to > >> > > > > copy > >> > > > > > > >>>>> the metadata (which is not always small) into memory > >> when > >> > > > > > > >>>>> it is unaligned. > >> > > > > > > >>>>> > >> > > > > > > >>>>> Micah has proposed to address this by adding a 4-byte > >> > > > > > > >>>>> "continuation" value at the beginning of the payload > >> > > > > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this > >> way > >> > > > > > > >>>>> is that old clients will see an invalid length (what is > >> > > > > > > >>>>> currently the first 4 bytes of the message -- a 32-bit > >> > > > > > > >>>>> little endian signed integer indicating the metadata > >> > > > > > > >>>>> length) rather than potentially crashing on a valid > >> > > > > > > >>>>> length. We also propose to expand the "end of stream" > >> > > > > > > >>>>> marker used in the stream and file format from 4 to 8 > >> > > > > > > >>>>> bytes. This has the additional effect of aligning the > >> file > >> > footer defined in File.fbs. > >> > > > > > > >>>>> > >> > > > > > > >>>>> This would be a backwards incompatible protocol change, > >> so > >> > > > > > > >>>>> older > >> > > > > > > >> Arrow > >> > > > > > > >>>>> libraries would not be able to read these new messages. > >> > > > > Maintaining > >> > > > > > > >>>>> forward compatibility (reading data produced by older > >> > > > > > > >>>>> libraries) > >> > > > > > > >> would > >> > > > > > > >>>>> be possible as we can reason that a value other than the > >> > > > > continuation > >> > > > > > > >>>>> value was produced by an older library (and then > >> validate > >> > > > > > > >>>>> the Flatbuffer message of course). Arrow implementations > >> > > > > > > >>>>> could offer > >> > > > > a > >> > > > > > > >>>>> backward compatibility mode for the sake of old readers > >> if > >> > > > > > > >>>>> they > >> > > > > > > >> desire > >> > > > > > > >>>>> (this may also assist with testing). > >> > > > > > > >>>>> > >> > > > > > > >>>>> Additionally with this vote, we want to formally approve > >> > > > > > > >>>>> the > >> > > > > change > >> > > > > > > >> to > >> > > > > > > >>>>> the Arrow "file" format to always write the (new 8-byte) > >> > > > > > > >> end-of-stream > >> > > > > > > >>>>> marker, which enables code that processes Arrow streams > >> to > >> > > > > > > >>>>> safely > >> > > > > > > >> read > >> > > > > > > >>>>> the file's internal messages as though they were a > >> normal > >> > stream. > >> > > > > > > >>>>> > >> > > > > > > >>>>> The PR making these changes to the IPC documentation is > >> > > > > > > >>>>> here > >> > > > > > > >>>>> > >> > > > > > > >>>>> > >> https://nam06.safelinks.protection.outlook.com/?url=https% > >> > > > > > > >>>>> > >> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&data > >> > > > > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com > >> %7C90f02600c4ce40ff > >> > > > > > > >>>>> > >> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C > >> > > > > > > >>>>> > >> 0%7C637031638512163816&sdata=WF9uQ1d7GzHohv31%2BW3tl3I > >> > > > > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&reserved=0 > >> > > > > > > >>>>> > >> > > > > > > >>>>> Please vote to accept these changes. This vote will be > >> > > > > > > >>>>> open for > >> > > > > at > >> > > > > > > >>>>> least 72 hours > >> > > > > > > >>>>> > >> > > > > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1 > >> I > >> > > > > > > >>>>> disagree because... > >> > > > > > > >>>>> > >> > > > > > > >>>>> Here is my vote: +1 > >> > > > > > > >>>>> > >> > > > > > > >>>>> Thanks, > >> > > > > > > >>>>> Wes > >> > > > > > > >>>>> > >> > > > > > > >>>>> [1]: > >> > > > > > > >> > >> > > > > > >> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli > >> > > > > sts.apache.org > >> %2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd > >> > > > > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org > >> %253E&data=02%7C0 > >> > > > > 1%7CEric.Erhardt%40microsoft.com > >> %7C90f02600c4ce40ff5c9008d730e66b68% > >> > > > > > >> 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773& > >> > > > > > >> sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&reserved=0 > >> > > > > > > >>>> > >> > > > > > > >>>> > >> > > > > > > >>>> > >> > > > > > >> > > > >> > > >> > >