Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

Wes McKinney Thu, 18 Jul 2019 12:04:09 -0700

hi Bryan -- well, the reason for the current 0.x version is precisely
to avoid a situation where we are making decisions on the basis of
maintaining forward / backward compatibility.


One possible way forward on this is to make a 0.15.0 (0.14.2, so there
is less trouble for Spark to upgrade) release that supports reading
_both_ old and new variants of the protocol.

On Thu, Jul 18, 2019 at 1:20 PM Bryan Cutler <cutl...@gmail.com> wrote:
>
> Are we going to say that Arrow 1.0 is not compatible with any version
> before?  My concern is that Spark 2.4.x might get stuck on Arrow Java
> 0.14.1 and a lot of users will install PyArrow 1.0.0, which will not work.
> In Spark 3.0.0, though it will be no problem to update both Java and Python
> to 1.0. Having a compatibility mode so that new readers/writers can work
> with old readers using a 4-byte prefix would solve the problem, but if we
> don't want to do this will pyarrow be able to raise an error that clearly
> the new version does not support the old protocol?  For example, would a
> pyarrow reader see the 0xFFFFFFFF and raise something like "PyArrow
> detected an old protocol and cannot continue, please use a version < 1.0.0"?
>
> On Thu, Jul 11, 2019 at 12:39 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > Hi Francois -- copying the metadata into memory isn't the end of the world
> > but it's a pretty ugly wart. This affects every IPC protocol message
> > everywhere.
> >
> > We have an opportunity to address the wart now but such a fix post-1.0.0
> > will be much more difficult.
> >
> > On Thu, Jul 11, 2019, 2:05 PM Francois Saint-Jacques <
> > fsaintjacq...@gmail.com> wrote:
> >
> > > If the data buffers are still aligned, then I don't think we should
> > > add a breaking change just for avoiding the copy on the metadata? I'd
> > > expect said metadata to be small enough that zero-copy doesn't really
> > > affect performance.
> > >
> > > François
> > >
> > > On Sun, Jun 30, 2019 at 4:01 AM Micah Kornfield <emkornfi...@gmail.com>
> > > wrote:
> > > >
> > > > While working on trying to fix undefined behavior for unaligned memory
> > > > accesses [1], I ran into an issue with the IPC specification [2] which
> > > > prevents us from ever achieving zero-copy memory mapping and having
> > > aligned
> > > > accesses (i.e. clean UBSan runs).
> > > >
> > > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned
> > accesses.
> > > >
> > > > In the IPC format we align each message to 8-byte boundaries.  We then
> > > > write a int32_t integer to to denote the size of flat buffer metadata,
> > > > followed immediately  by the flatbuffer metadata.  This means the
> > > > flatbuffer metadata will never be 8 byte aligned.
> > > >
> > > > Do people care?  A simple fix  would be to use int64_t instead of
> > int32_t
> > > > for length.  However, any fix essentially breaks all previous client
> > > > library versions or incurs a memory copy.
> > > >
> > > > [1] https://github.com/apache/arrow/pull/4757
> > > > [2] https://arrow.apache.org/docs/ipc.html
> > >
> >

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

Reply via email to