I started PR https://github.com/apache/arrow/pull/4951 to capture the proposed change.
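For readers skimming the thread, the framing change under discussion (an 8-byte prefix consisting of a 0xFFFFFFFF continuation marker plus the 4-byte metadata length, in place of the bare 4-byte length) can be sketched in Python. This is an illustration of the encoding discussed in the messages below, not the PR's actual code; in particular, whether the written length includes the alignment padding is an assumption of this sketch:

```python
import struct

# Read as -1 by pre-1.0.0 readers that expect a length here.
CONTINUATION = 0xFFFFFFFF

def write_message_prefix_old(metadata: bytes) -> bytes:
    # Pre-change encoding: a 4-byte little-endian length immediately
    # followed by the flatbuffer, so on an 8-byte message boundary the
    # flatbuffer starts at offset 4 (mod 8) -- never 8-byte aligned.
    return struct.pack("<i", len(metadata)) + metadata

def write_message_prefix_new(metadata: bytes) -> bytes:
    # Sketched new encoding: a 4-byte continuation marker, then the
    # 4-byte length, then the flatbuffer padded out to an 8-byte
    # boundary, so the flatbuffer starts at offset 8 (mod 8).
    padding = (8 - len(metadata) % 8) % 8
    padded = metadata + b"\x00" * padding
    return (struct.pack("<I", CONTINUATION)
            + struct.pack("<i", len(padded))
            + padded)
```

Note that an old reader interprets the first four bytes of the new prefix as a length of -1, which is the hook for the fail-fast error discussed later in the thread.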
I would like to verify that with this change all alignment issues are solved.

It seems like there is mostly consensus on having another minor release to
get this resolved. I think the open questions I have:

1. Do we want to provide backwards compatibility for writers?
2. Similar to the file format, should there be some leading "magic"
   indicator in the stream format to indicate a version, so we can do
   something a little less ugly if we encounter more issues?

On Thu, Jul 25, 2019 at 11:05 AM Wes McKinney <wesmck...@gmail.com> wrote:

> We probably need to have a vote here. Either Micah or I can propose
> changes to
>
> https://github.com/apache/arrow/blob/master/docs/source/format/IPC.rst#encapsulated-message-format
>
> to indicate the new binary protocol and the backwards/forward
> compatibility strategy.
>
> Any other thoughts about this before a vote is proposed (along with a
> corresponding PR to change the format document)?
>
> AFAIK there are a few implementations of the protocol that will need
> development work around this:
>
> - C++
> - C#
> - Java
> - JavaScript
> - Go
>
> (has Rust implemented this yet?)
>
> The scope of changes in each case should be relatively isolated. In
> the case of C++, if we want to have the ability to emit
> backwards-compatible messages (for old readers, e.g. <= 0.14.0, this
> is a good idea anyway for unit testing), then a bit more refactoring
> will be required to introduce the appropriate option. We could also
> expand the integration tests around this issue as needed.
>
> On Fri, Jul 19, 2019 at 8:53 AM David Li <li.david...@gmail.com> wrote:
>
> > I agree with Bryan and Micah - a gradual transition as part of 1.0 (or
> > 0.15.0) would be much less painful for us than staying on pre-1.0
> > until we can upgrade everything using Arrow at once. It is kind of a
> > "have your cake and eat it too" situation, and it would be a
> > maintenance burden, but something like what Micah proposes would be
> > ideal.
> >
> > Thanks,
> > David
> >
> > On 7/19/19, Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > I'm trying to work out the exact steps in my mind for a migration. It
> > > seems like one approach is:
> > >
> > > 1. Add a code change which throws a clear exception when it encounters
> > > -1 for size. In Java the reasonable place seems to be at [1] (there
> > > might be more?). The exception should state that the current stream
> > > reader isn't compatible with version 1.0.0 streams (we should have
> > > similar ones in each language). We can add a note about the
> > > environment variable in 2 if we decide to do it. Release this change
> > > as 0.15.0 or 0.14.2 and ensure at least Spark upgrades to this
> > > version.
> > >
> > > 2. Change the reader implementation to support reading both 1.0.0
> > > streams and be backwards compatible with pre-1.0.0 streams. Change
> > > the writer implementation to default to writing 1.0.0 streams but
> > > have an environment variable that makes it write backwards-compatible
> > > streams (writer compatibility seems like it should be optional).
> > > Release this as 1.0.0.
> > >
> > > 3. If provided, remove the environment variable switch in a later
> > > release.
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1]
> > > https://github.com/apache/arrow/blob/9fe728c86caaf9ceb1827159eb172ff81fb98550/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageChannelReader.java#L67
> > >
> > > On Thu, Jul 18, 2019 at 8:58 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > >> To be clear, we could make a patch 0.14.x release that includes the
> > >> necessary compatibility changes. I presume Spark will be able to
> > >> upgrade to a new patch release (I'd be surprised if not, otherwise
> > >> how can you get security fixes)?
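Micah's step 1 above (failing fast when an old reader hits the new marker) could look roughly like this sketch; the function name and error text are hypothetical, not the actual Java or pyarrow implementation:

```python
import struct

def read_message_length(stream) -> int:
    # A pre-1.0.0-style reader expects a 4-byte message length here.
    # If it instead sees -1 (the bytes 0xFFFFFFFF that would lead a
    # 1.0.0 stream), raise a clear error rather than a cryptic one.
    raw = stream.read(4)
    if len(raw) < 4:
        return 0  # end of stream
    length = struct.unpack("<i", raw)[0]
    if length == -1:
        raise ValueError(
            "This stream reader is not compatible with version 1.0.0 "
            "streams; please upgrade the reader, or write the stream "
            "in a backwards-compatible mode")
    return length
```

The point of shipping this check in a 0.14.x/0.15.0 patch first is exactly what the thread discusses: Spark users who mix versions get an actionable message instead of undefined behavior.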
> > >>
> > >> On Thu, Jul 18, 2019, 10:52 PM Bryan Cutler <cutl...@gmail.com> wrote:
> > >>
> > >> > Hey Wes,
> > >> > I understand we don't want to burden 1.0 by maintaining
> > >> > compatibility and that is fine with me. I'm just trying to figure
> > >> > out how to best handle this situation so Spark users won't get a
> > >> > cryptic error message. It sounds like it will need to be handled
> > >> > on the Spark side to not allow mixing 1.0 and pre-1.0 versions.
> > >> > I'm not too sure how much a 0.15.0 release with compatibility
> > >> > would help; it might depend on when things get released, but we
> > >> > can discuss that in another thread.
> > >> >
> > >> > On Thu, Jul 18, 2019 at 12:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >> >
> > >> > > hi Bryan -- well, the reason for the current 0.x version is
> > >> > > precisely to avoid a situation where we are making decisions on
> > >> > > the basis of maintaining forward / backward compatibility.
> > >> > >
> > >> > > One possible way forward on this is to make a 0.15.0 (or 0.14.2,
> > >> > > so there is less trouble for Spark to upgrade) release that
> > >> > > supports reading _both_ old and new variants of the protocol.
> > >> > >
> > >> > > On Thu, Jul 18, 2019 at 1:20 PM Bryan Cutler <cutl...@gmail.com> wrote:
> > >> > > >
> > >> > > > Are we going to say that Arrow 1.0 is not compatible with any
> > >> > > > version before? My concern is that Spark 2.4.x might get stuck
> > >> > > > on Arrow Java 0.14.1 and a lot of users will install PyArrow
> > >> > > > 1.0.0, which will not work. In Spark 3.0.0, though, it will be
> > >> > > > no problem to update both Java and Python to 1.0.
> > >> > > > Having a compatibility mode so that new readers/writers can
> > >> > > > work with old readers using a 4-byte prefix would solve the
> > >> > > > problem, but if we don't want to do this, will pyarrow be able
> > >> > > > to raise an error stating clearly that the new version does
> > >> > > > not support the old protocol? For example, would a pyarrow
> > >> > > > reader see the 0xFFFFFFFF and raise something like "PyArrow
> > >> > > > detected an old protocol and cannot continue, please use a
> > >> > > > version < 1.0.0"?
> > >> > > >
> > >> > > > On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >> > > >
> > >> > > > > Hi Francois -- copying the metadata into memory isn't the
> > >> > > > > end of the world but it's a pretty ugly wart. This affects
> > >> > > > > every IPC protocol message everywhere.
> > >> > > > >
> > >> > > > > We have an opportunity to address the wart now, but such a
> > >> > > > > fix post-1.0.0 will be much more difficult.
> > >> > > > >
> > >> > > > > On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > If the data buffers are still aligned, then I don't think
> > >> > > > > > we should add a breaking change just for avoiding the copy
> > >> > > > > > on the metadata? I'd expect said metadata to be small
> > >> > > > > > enough that zero-copy doesn't really affect performance.
> > >> > > > > >
> > >> > > > > > François
> > >> > > > > >
> > >> > > > > > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >> > > > > > >
> > >> > > > > > > While working on trying to fix undefined behavior for
> > >> > > > > > > unaligned memory accesses [1], I ran into an issue with
> > >> > > > > > > the IPC specification [2] which prevents us from ever
> > >> > > > > > > achieving zero-copy memory mapping and having aligned
> > >> > > > > > > accesses (i.e. clean UBSan runs).
> > >> > > > > > >
> > >> > > > > > > Flatbuffer metadata needs 8-byte alignment to guarantee
> > >> > > > > > > aligned accesses.
> > >> > > > > > >
> > >> > > > > > > In the IPC format we align each message to 8-byte
> > >> > > > > > > boundaries. We then write an int32_t integer to denote
> > >> > > > > > > the size of the flatbuffer metadata, followed
> > >> > > > > > > immediately by the flatbuffer metadata. This means the
> > >> > > > > > > flatbuffer metadata will never be 8-byte aligned.
> > >> > > > > > >
> > >> > > > > > > Do people care? A simple fix would be to use int64_t
> > >> > > > > > > instead of int32_t for the length. However, any fix
> > >> > > > > > > essentially breaks all previous client library versions
> > >> > > > > > > or incurs a memory copy.
> > >> > > > > > >
> > >> > > > > > > [1] https://github.com/apache/arrow/pull/4757
> > >> > > > > > > [2] https://arrow.apache.org/docs/ipc.html
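The arithmetic behind the misalignment Micah describes at the root of the thread is simple to make concrete (a minimal sketch, not Arrow code):

```python
# Each IPC message starts on an 8-byte boundary. With a 4-byte int32_t
# length prefix, the flatbuffer metadata begins at offset 4 (mod 8),
# so it is never 8-byte aligned.
for message_start in (0, 8, 64):
    flatbuffer_start = message_start + 4  # after the int32 length
    assert flatbuffer_start % 8 == 4      # misaligned for 8-byte access

# With an 8-byte prefix -- the int64_t length Micah mentions as a simple
# fix, or equivalently a 4-byte marker plus a 4-byte length -- the
# flatbuffer lands back on an 8-byte boundary.
for message_start in (0, 8, 64):
    flatbuffer_start = message_start + 8
    assert flatbuffer_start % 8 == 0
```

Either variant restores alignment; the difference the thread wrestles with is that a marker byte pattern (0xFFFFFFFF) also gives old readers something detectable to fail on.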