I started PR https://github.com/apache/arrow/pull/4951 to capture the proposed change.
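For readers skimming the thread, the framing change under discussion (an 8-byte prefix consisting of a 0xFFFFFFFF continuation marker plus the 4-byte metadata length, in place of the bare 4-byte length) can be sketched in Python. This is an illustration of the encoding discussed in the messages below, not the PR's actual code; in particular, whether the written length includes the alignment padding is an assumption of this sketch:

```python
import struct

# Read as -1 by pre-1.0.0 readers that expect a length here.
CONTINUATION = 0xFFFFFFFF

def write_message_prefix_old(metadata: bytes) -> bytes:
    # Pre-change encoding: a 4-byte little-endian length immediately
    # followed by the flatbuffer, so on an 8-byte message boundary the
    # flatbuffer starts at offset 4 (mod 8) -- never 8-byte aligned.
    return struct.pack("<i", len(metadata)) + metadata

def write_message_prefix_new(metadata: bytes) -> bytes:
    # Sketched new encoding: a 4-byte continuation marker, then the
    # 4-byte length, then the flatbuffer padded out to an 8-byte
    # boundary, so the flatbuffer starts at offset 8 (mod 8).
    padding = (8 - len(metadata) % 8) % 8
    padded = metadata + b"\x00" * padding
    return (struct.pack("<I", CONTINUATION)
            + struct.pack("<i", len(padded))
            + padded)
```

Note that an old reader interprets the first four bytes of the new prefix as a length of -1, which is the hook for the fail-fast error discussed later in the thread.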
I would like to verify that with this change all alignment issues are solved.

It seems like there is mostly consensus on having another minor release to
get this resolved. I think the open questions I have:

1. Do we want to provide backwards compatibility for writers?
2. Similar to the file format, should there be some leading "magic"
   indicator in the stream format to indicate a version, so we can do
   something a little less ugly if we encounter more issues?

On Thu, Jul 25, 2019 at 11:05 AM Wes McKinney <wesmck...@gmail.com> wrote:

> We probably need to have a vote here. Either Micah or I can propose
> changes to
>
> https://github.com/apache/arrow/blob/master/docs/source/format/IPC.rst#encapsulated-message-format
>
> to indicate the new binary protocol and the backwards/forward
> compatibility strategy.
>
> Any other thoughts about this before a vote is proposed (along with a
> corresponding PR to change the format document)?
>
> AFAIK there are a few implementations of the protocol that will need
> development work around this:
>
> - C++
> - C#
> - Java
> - JavaScript
> - Go
>
> (has Rust implemented this yet?)
>
> The scope of changes in each case should be relatively isolated. In
> the case of C++, if we want to have the ability to emit
> backwards-compatible messages (for old readers, e.g. <= 0.14.0, this
> is a good idea anyway for unit testing), then a bit more refactoring
> will be required to introduce the appropriate option. We could also
> expand the integration tests around this issue as needed.
>
> On Fri, Jul 19, 2019 at 8:53 AM David Li <li.david...@gmail.com> wrote:
>
> > I agree with Bryan and Micah - a gradual transition as part of 1.0 (or
> > 0.15.0) would be much less painful for us than staying on pre-1.0
> > until we can upgrade everything using Arrow at once. It is kind of a
> > "have your cake and eat it too" situation, and it would be a
> > maintenance burden, but something like what Micah proposes would be
> > ideal.
> >
> > Thanks,
> > David
> >
> > On 7/19/19, Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > I'm trying to work out the exact steps in my mind for a migration. It
> > > seems like one approach is:
> > >
> > > 1. Add a code change which throws a clear exception when it encounters
> > > -1 for size. In Java the reasonable place seems to be at [1] (there
> > > might be more?). The exception should state that the current stream
> > > reader isn't compatible with version 1.0.0 streams (we should have
> > > similar ones in each language). We can add a note about the
> > > environment variable in 2 if we decide to do it. Release this change
> > > as 0.15.0 or 0.14.2 and ensure at least Spark upgrades to this
> > > version.
> > >
> > > 2. Change the reader implementation to support reading both 1.0.0
> > > streams and be backwards compatible with pre-1.0.0 streams. Change
> > > the writer implementation to default to writing 1.0.0 streams but
> > > have an environment variable that makes it write backwards-compatible
> > > streams (writer compatibility seems like it should be optional).
> > > Release this as 1.0.0.
> > >
> > > 3. If provided, remove the environment variable switch in a later
> > > release.
> > >
> > > Thanks,
> > > Micah
> > >
> > > [1]
> > > https://github.com/apache/arrow/blob/9fe728c86caaf9ceb1827159eb172ff81fb98550/java/vector/src/main/java/org/apache/arrow/vector/ipc/message/MessageChannelReader.java#L67
> > >
> > > On Thu, Jul 18, 2019 at 8:58 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > >> To be clear, we could make a patch 0.14.x release that includes the
> > >> necessary compatibility changes. I presume Spark will be able to
> > >> upgrade to a new patch release (I'd be surprised if not, otherwise
> > >> how can you get security fixes)?
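Micah's step 1 above (failing fast when an old reader hits the new marker) could look roughly like this sketch; the function name and error text are hypothetical, not the actual Java or pyarrow implementation:

```python
import struct

def read_message_length(stream) -> int:
    # A pre-1.0.0-style reader expects a 4-byte message length here.
    # If it instead sees -1 (the bytes 0xFFFFFFFF that would lead a
    # 1.0.0 stream), raise a clear error rather than a cryptic one.
    raw = stream.read(4)
    if len(raw) < 4:
        return 0  # end of stream
    length = struct.unpack("<i", raw)[0]
    if length == -1:
        raise ValueError(
            "This stream reader is not compatible with version 1.0.0 "
            "streams; please upgrade the reader, or write the stream "
            "in a backwards-compatible mode")
    return length
```

The point of shipping this check in a 0.14.x/0.15.0 patch first is exactly what the thread discusses: Spark users who mix versions get an actionable message instead of undefined behavior.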
> > >>
> > >> On Thu, Jul 18, 2019, 10:52 PM Bryan Cutler <cutl...@gmail.com> wrote:
> > >>
> > >> > Hey Wes,
> > >> > I understand we don't want to burden 1.0 by maintaining
> > >> > compatibility and that is fine with me. I'm just trying to figure
> > >> > out how to best handle this situation so Spark users won't get a
> > >> > cryptic error message. It sounds like it will need to be handled
> > >> > on the Spark side to not allow mixing 1.0 and pre-1.0 versions.
> > >> > I'm not too sure how much a 0.15.0 release with compatibility
> > >> > would help; it might depend on when things get released, but we
> > >> > can discuss that in another thread.
> > >> >
> > >> > On Thu, Jul 18, 2019 at 12:03 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >> >
> > >> > > hi Bryan -- well, the reason for the current 0.x version is
> > >> > > precisely to avoid a situation where we are making decisions on
> > >> > > the basis of maintaining forward / backward compatibility.
> > >> > >
> > >> > > One possible way forward on this is to make a 0.15.0 (or 0.14.2,
> > >> > > so there is less trouble for Spark to upgrade) release that
> > >> > > supports reading _both_ old and new variants of the protocol.
> > >> > >
> > >> > > On Thu, Jul 18, 2019 at 1:20 PM Bryan Cutler <cutl...@gmail.com> wrote:
> > >> > > >
> > >> > > > Are we going to say that Arrow 1.0 is not compatible with any
> > >> > > > version before? My concern is that Spark 2.4.x might get stuck
> > >> > > > on Arrow Java 0.14.1 and a lot of users will install PyArrow
> > >> > > > 1.0.0, which will not work. In Spark 3.0.0, though, it will be
> > >> > > > no problem to update both Java and Python to 1.0.
> > >> > > > Having a compatibility mode so that new readers/writers can
> > >> > > > work with old readers using a 4-byte prefix would solve the
> > >> > > > problem, but if we don't want to do this, will pyarrow be able
> > >> > > > to raise an error stating clearly that the new version does
> > >> > > > not support the old protocol? For example, would a pyarrow
> > >> > > > reader see the 0xFFFFFFFF and raise something like "PyArrow
> > >> > > > detected an old protocol and cannot continue, please use a
> > >> > > > version < 1.0.0"?
> > >> > > >
> > >> > > > On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >> > > >
> > >> > > > > Hi Francois -- copying the metadata into memory isn't the
> > >> > > > > end of the world but it's a pretty ugly wart. This affects
> > >> > > > > every IPC protocol message everywhere.
> > >> > > > >
> > >> > > > > We have an opportunity to address the wart now, but such a
> > >> > > > > fix post-1.0.0 will be much more difficult.
> > >> > > > >
> > >> > > > > On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques <fsaintjacq...@gmail.com> wrote:
> > >> > > > >
> > >> > > > > > If the data buffers are still aligned, then I don't think
> > >> > > > > > we should add a breaking change just for avoiding the copy
> > >> > > > > > on the metadata? I'd expect said metadata to be small
> > >> > > > > > enough that zero-copy doesn't really affect performance.
> > >> > > > > >
> > >> > > > > > François
> > >> > > > > >
> > >> > > > > > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > >> > > > > > >
> > >> > > > > > > While working on trying to fix undefined behavior for
> > >> > > > > > > unaligned memory accesses [1], I ran into an issue with
> > >> > > > > > > the IPC specification [2] which prevents us from ever
> > >> > > > > > > achieving zero-copy memory mapping and having aligned
> > >> > > > > > > accesses (i.e. clean UBSan runs).
> > >> > > > > > >
> > >> > > > > > > Flatbuffer metadata needs 8-byte alignment to guarantee
> > >> > > > > > > aligned accesses.
> > >> > > > > > >
> > >> > > > > > > In the IPC format we align each message to 8-byte
> > >> > > > > > > boundaries. We then write an int32_t integer to denote
> > >> > > > > > > the size of the flatbuffer metadata, followed
> > >> > > > > > > immediately by the flatbuffer metadata. This means the
> > >> > > > > > > flatbuffer metadata will never be 8-byte aligned.
> > >> > > > > > >
> > >> > > > > > > Do people care? A simple fix would be to use int64_t
> > >> > > > > > > instead of int32_t for the length. However, any fix
> > >> > > > > > > essentially breaks all previous client library versions
> > >> > > > > > > or incurs a memory copy.
> > >> > > > > > >
> > >> > > > > > > [1] https://github.com/apache/arrow/pull/4757
> > >> > > > > > > [2] https://arrow.apache.org/docs/ipc.html
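The arithmetic behind the misalignment Micah describes at the root of the thread is simple to make concrete (a minimal sketch, not Arrow code):

```python
# Each IPC message starts on an 8-byte boundary. With a 4-byte int32_t
# length prefix, the flatbuffer metadata begins at offset 4 (mod 8),
# so it is never 8-byte aligned.
for message_start in (0, 8, 64):
    flatbuffer_start = message_start + 4  # after the int32 length
    assert flatbuffer_start % 8 == 4      # misaligned for 8-byte access

# With an 8-byte prefix -- the int64_t length Micah mentions as a simple
# fix, or equivalently a 4-byte marker plus a 4-byte length -- the
# flatbuffer lands back on an 8-byte boundary.
for message_start in (0, 8, 64):
    flatbuffer_start = message_start + 8
    assert flatbuffer_start % 8 == 0
```

Either variant restores alignment; the difference the thread wrestles with is that a marker byte pattern (0xFFFFFFFF) also gives old readers something detectable to fail on.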