I see, thank you for catching this nuance.

I agree that using {0xFFFFFFFF, 0x00000000} for EOS will resolve the
issue while allowing implementations to be backwards compatible (i.e.
handling the 4-byte EOS from older payloads).

I'm not sure that we need to have a vote about this, what do others think?

On Sat, Sep 7, 2019 at 12:47 AM Ji Liu <niki...@aliyun.com.invalid> wrote:
>
> Hi all,
>
> During the java code review[1], seems there is a problem with the current 
> implementations(C++/Java etc) when reaching EOS, since the new format EOS is 
> 8 bytes and the reader only reads 4 bytes when reach the end of stream, and 
> the additional 4 bytes will not be read which cause problems for following up 
> readings.
>
> There are some optional suggestions[2] as below, we should reach consistent 
> and fix this problem before 0.15 release.
> i. For the new format, an 8-byte EOS token should look like {0xFFFFFFFF, 
> 0x00000000}, so we read the continuation token first, and then know to read 
> the next 4 bytes, which are then 0 to signal EOS.ii. Reader just remember the 
> state, so if it reads the continuation token from the beginning, then read 
> all 8 bytes at the end.
>
> Thanks,
> Ji Liu
>
> [1] https://github.com/apache/arrow/pull/5229
> [2] https://github.com/apache/arrow/pull/5229#discussion_r321715682
>
>
>
>
> ------------------------------------------------------------------
> From:Eric Erhardt <eric.erha...@microsoft.com>
> Send Time:2019年9月5日(星期四) 07:16
> To:dev@arrow.apache.org <dev@arrow.apache.org>; Ji Liu <niki...@aliyun.com>
> Cc:emkornfield <emkornfi...@gmail.com>; Paul Taylor <ptay...@apache.org>
> Subject:RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte 
> Flatbuffer alignment requirements (2nd vote)
>
> The C# PR is up.
>
> https://github.com/apache/arrow/pull/5280
>
> Eric
>
> -----Original Message-----
> From: Eric Erhardt <eric.erha...@microsoft.com.INVALID>
> Sent: Wednesday, September 4, 2019 10:12 AM
> To: dev@arrow.apache.org; Ji Liu <niki...@aliyun.com>
> Cc: emkornfield <emkornfi...@gmail.com>; Paul Taylor <ptay...@apache.org>
> Subject: RE: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte 
> Flatbuffer alignment requirements (2nd vote)
>
> I'm working on a PR for the C# bindings. I hope to have it up in the next day 
> or two. Integration tests for C# would be a great addition at some point - 
> it's been on my backlog. For now I plan on manually testing it.
>
> -----Original Message-----
> From: Wes McKinney <wesmck...@gmail.com>
> Sent: Tuesday, September 3, 2019 10:17 PM
> To: Ji Liu <niki...@aliyun.com>
> Cc: emkornfield <emkornfi...@gmail.com>; dev <dev@arrow.apache.org>; Paul 
> Taylor <ptay...@apache.org>
> Subject: Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte 
> Flatbuffer alignment requirements (2nd vote)
>
> hi folks,
>
> We now have patches up for Java, JS, and Go. How are we doing on the code 
> reviews for getting these in?
>
> Since C# implements the binary protocol, the C# developers might want to look 
> at this before the 0.15.0 release also. Absent integration tests it's 
> difficult to verify the C# library, though
>
> Thanks
>
> On Thu, Aug 29, 2019 at 8:13 AM Ji Liu <niki...@aliyun.com> wrote:
> >
> > Here is the Java implementation
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> > ub.com%2Fapache%2Farrow%2Fpull%2F5229&amp;data=02%7C01%7CEric.Erhardt%
> > 40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f141af9
> > 1ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=b87u5x8lLvfdnU5
> > 6LrGzYR8H0Jh8FfwY2cVjbOsY9hY%3D&amp;reserved=0
> >
> > cc @Wes McKinney @emkornfield
> >
> > Thanks,
> > Ji Liu
> >
> > ------------------------------------------------------------------
> > From:Ji Liu <niki...@aliyun.com.INVALID> Send Time:2019年8月28日(星期三)
> > 17:34 To:emkornfield <emkornfi...@gmail.com>; dev
> > <dev@arrow.apache.org> Cc:Paul Taylor <ptay...@apache.org>
> > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >
> > I could take the Java implementation and will take a close watch on this 
> > issue in the next few days.
> >
> > Thanks,
> > Ji Liu
> >
> >
> > ------------------------------------------------------------------
> > From:Micah Kornfield <emkornfi...@gmail.com> Send Time:2019年8月28日(星期三)
> > 17:14 To:dev <dev@arrow.apache.org> Cc:Paul Taylor
> > <ptay...@apache.org>
> > Subject:Re: [RESULT] [VOTE] Alter Arrow binary protocol to address
> > 8-byte Flatbuffer alignment requirements (2nd vote)
> >
> > I should have integration tests with 0.14.1 generated binaries in the
> > next few days.  I think the one remaining unassigned piece of work in
> > the Java implementation, i can take that up next if no one else gets to it.
> >
> > On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > Here's the C++ changes
> > >
> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgi
> > > thub.com%2Fapache%2Farrow%2Fpull%2F5211&amp;data=02%7C01%7CEric.Erha
> > > rdt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%7C72f988bf86f
> > > 141af91ab2d7cd011db47%7C1%7C0%7C637031638512163816&amp;sdata=zWaHS8X
> > > YIQA85xcFG%2FMrOcSfrI8xZtyuHRoaDH%2FIP2g%3D&amp;reserved=0
> > >
> > > I'm going to create a integration branch where we can merge each
> > > patch before merging to master
> > >
> > > On Fri, Aug 23, 2019 at 9:03 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > It isn't implemented in C++ yet but I will try to get a patch up
> > > > for that soon (today maybe). I think we should create a branch
> > > > where we can stack the patches that implement this for each language.
> > > >
> > > > On Fri, Aug 23, 2019 at 4:04 AM Paul Taylor
> > > > <ptaylor.apa...@gmail.com>
> > > wrote:
> > > > >
> > > > > I'll do the JS updates. Is it safe to validate against the Arrow
> > > > > C++ integration tests?
> > > > >
> > > > >
> > > > > On 8/22/19 7:28 PM, Micah Kornfield wrote:
> > > > > > I created
> > > > > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2
> > > > > > F%2Fissues.apache.org%2Fjira%2Fbrowse%2FARROW-6313&amp;data=02
> > > > > > %7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d
> > > > > > 730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6370316
> > > > > > 38512163816&amp;sdata=L57rZWFPdeuRtxFTkL%2F4g9RNI8lXFkRDXQadmj
> > > > > > NiLxI%3D&amp;reserved=0 as a
> > > tracking
> > > > > > issue with sub-issues on the development work.  So far no-one
> > > > > > has
> > > claimed
> > > > > > Java and Javascript tasks.
> > > > > >
> > > > > > Would it make sense to have a separate dev branch for this work?
> > > > > >
> > > > > > Thanks,
> > > > > > Micah
> > > > > >
> > > > > > On Thu, Aug 22, 2019 at 3:24 PM Wes McKinney
> > > > > > <wesmck...@gmail.com>
> > > wrote:
> > > > > >
> > > > > >> The vote carries with 4 binding +1 votes and 1 non-binding +1
> > > > > >>
> > > > > >> I'll merge the specification patch later today and we can
> > > > > >> begin working on implementations so we can get this done for
> > > > > >> 0.15.0
> > > > > >>
> > > > > >> On Tue, Aug 20, 2019 at 12:30 PM Bryan Cutler
> > > > > >> <cutl...@gmail.com>
> > > wrote:
> > > > > >>> +1 (non-binding)
> > > > > >>>
> > > > > >>> On Tue, Aug 20, 2019, 7:43 AM Antoine Pitrou
> > > > > >>> <solip...@pitrou.net>
> > > > > >> wrote:
> > > > > >>>> Sorry, had forgotten to send my vote on this.
> > > > > >>>>
> > > > > >>>> +1 from me.
> > > > > >>>>
> > > > > >>>> Regards
> > > > > >>>>
> > > > > >>>> Antoine.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> On Wed, 14 Aug 2019 17:42:33 -0500 Wes McKinney
> > > > > >>>> <wesmck...@gmail.com> wrote:
> > > > > >>>>> hi all,
> > > > > >>>>>
> > > > > >>>>> As we've been discussing [1], there is a need to introduce
> > > > > >>>>> 4
> > > bytes of
> > > > > >>>>> padding into the preamble of the "encapsulated IPC message"
> > > format to
> > > > > >>>>> ensure that the Flatbuffers metadata payload begins on an
> > > > > >>>>> 8-byte aligned memory offset. The alternative to this
> > > > > >>>>> would be for Arrow implementations where alignment is
> > > > > >>>>> important (e.g. C or C++) to
> > > copy
> > > > > >>>>> the metadata (which is not always small) into memory when
> > > > > >>>>> it is unaligned.
> > > > > >>>>>
> > > > > >>>>> Micah has proposed to address this by adding a 4-byte
> > > > > >>>>> "continuation" value at the beginning of the payload
> > > > > >>>>> having the value 0xFFFFFFFF. The reason to do it this way
> > > > > >>>>> is that old clients will see an invalid length (what is
> > > > > >>>>> currently the first 4 bytes of the message -- a 32-bit
> > > > > >>>>> little endian signed integer indicating the metadata
> > > > > >>>>> length) rather than potentially crashing on a valid
> > > > > >>>>> length. We also propose to expand the "end of stream"
> > > > > >>>>> marker used in the stream and file format from 4 to 8
> > > > > >>>>> bytes. This has the additional effect of aligning the file 
> > > > > >>>>> footer defined in File.fbs.
> > > > > >>>>>
> > > > > >>>>> This would be a backwards incompatible protocol change, so
> > > > > >>>>> older
> > > > > >> Arrow
> > > > > >>>>> libraries would not be able to read these new messages.
> > > Maintaining
> > > > > >>>>> forward compatibility (reading data produced by older
> > > > > >>>>> libraries)
> > > > > >> would
> > > > > >>>>> be possible as we can reason that a value other than the
> > > continuation
> > > > > >>>>> value was produced by an older library (and then validate
> > > > > >>>>> the Flatbuffer message of course). Arrow implementations
> > > > > >>>>> could offer
> > > a
> > > > > >>>>> backward compatibility mode for the sake of old readers if
> > > > > >>>>> they
> > > > > >> desire
> > > > > >>>>> (this may also assist with testing).
> > > > > >>>>>
> > > > > >>>>> Additionally with this vote, we want to formally approve
> > > > > >>>>> the
> > > change
> > > > > >> to
> > > > > >>>>> the Arrow "file" format to always write the (new 8-byte)
> > > > > >> end-of-stream
> > > > > >>>>> marker, which enables code that processes Arrow streams to
> > > > > >>>>> safely
> > > > > >> read
> > > > > >>>>> the file's internal messages as though they were a normal 
> > > > > >>>>> stream.
> > > > > >>>>>
> > > > > >>>>> The PR making these changes to the IPC documentation is
> > > > > >>>>> here
> > > > > >>>>>
> > > > > >>>>> https://nam06.safelinks.protection.outlook.com/?url=https%
> > > > > >>>>> 3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F4951&amp;data
> > > > > >>>>> =02%7C01%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff
> > > > > >>>>> 5c9008d730e66b68%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C
> > > > > >>>>> 0%7C637031638512163816&amp;sdata=WF9uQ1d7GzHohv31%2BW3tl3I
> > > > > >>>>> vp9Uo9h6VYVoXu52umTE%3D&amp;reserved=0
> > > > > >>>>>
> > > > > >>>>> Please vote to accept these changes. This vote will be
> > > > > >>>>> open for
> > > at
> > > > > >>>>> least 72 hours
> > > > > >>>>>
> > > > > >>>>> [ ] +1 Adopt these Arrow protocol changes [ ] +0 [ ] -1 I
> > > > > >>>>> disagree because...
> > > > > >>>>>
> > > > > >>>>> Here is my vote: +1
> > > > > >>>>>
> > > > > >>>>> Thanks,
> > > > > >>>>> Wes
> > > > > >>>>>
> > > > > >>>>> [1]:
> > > > > >>
> > > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli
> > > sts.apache.org%2Fthread.html%2F8440be572c49b7b2ffb76b63e6d935ada9efd
> > > 9c1c2021369b6d27786%40%253Cdev.arrow.apache.org%253E&amp;data=02%7C0
> > > 1%7CEric.Erhardt%40microsoft.com%7C90f02600c4ce40ff5c9008d730e66b68%
> > > 7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637031638512173773&amp;
> > > sdata=4y7ProY0ZDIqXAWYah6NS7TRZHGoYfZ6zMipdLV5ntk%3D&amp;reserved=0
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > >
>

Reply via email to