Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

Wes McKinney Mon, 01 Jul 2019 15:21:23 -0700

Thanks for the references.

If we decided to make a change around this, we could call the first 4
bytes a stream continuation marker to make it slightly less ugly


* 0xFFFFFFFF: continue
* 0x00000000: stop

On Mon, Jul 1, 2019 at 4:35 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> Hi Wes,
> I'm not an expert on this either, my inclination mostly comes from some 
> research I've done.  I think it is important to distinguish two cases:
> 1.  unaligned access at the processor instruction level
> 2.  undefined behavior
>
> From my reading unaligned access is fine on most modern architectures and it 
> seems the performance penalty has mostly been eliminated.
>
> Undefined behavior is a compiler/language concept.  The problem is the 
> compiler can choose to do anything in UB scenarios, not just the "obvious" 
> translation.  Specifically, the compiler is under no obligation to generate 
> the unaligned access instructions, and if it doesn't SEGVs ensue.  Two 
> examples, both of which relate to SIMD optimizations are linked below.
>
> I tend to be on the conservative side with this type of thing but if we have 
> experts on the the ML that can offer a more informed opinion, I would love to 
> hear it.
>
> [1] http://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html
> [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709
>
> On Mon, Jul 1, 2019 at 1:41 PM Wes McKinney <wesmck...@gmail.com> wrote:
>>
>> The <0xffffffff><int32_t size> solution is downright ugly but I think
>> it's one of the only ways that achieves
>>
>> * backward compatibility (new clients can read old data)
>> * opt-in forward compatibility (if we want to go to the labor of doing
>> so, sort of dangerous)
>> * old clients receiving new data do not blow up (they will see a
>> metadata length of -1)
>>
>> NB 0xFFFFFFFF <length> would look like:
>>
>> In [13]: np.array([(2 << 32) - 1, 128], dtype=np.uint32)
>> Out[13]: array([4294967295,        128], dtype=uint32)
>>
>> In [14]: np.array([(2 << 32) - 1, 128],
>> dtype=np.uint32).view(np.int32)
>> Out[14]: array([ -1, 128], dtype=int32)
>>
>> In [15]: np.array([(2 << 32) - 1, 128], dtype=np.uint32).view(np.uint8)
>> Out[15]: array([255, 255, 255, 255, 128,   0,   0,   0], dtype=uint8)
>>
>> Flatbuffers are 32-bit limited so we don't need all 64 bits.
>>
>> Do you know in what circumstances unaligned reads from Flatbuffers
>> might cause an issue? I do not know enough about UB but my
>> understanding is that it causes issues on some specialized platforms
>> where for most modern x86-64 processors and compilers it is not really
>> an issue (though perhaps a performance issue)
>>
>> On Sun, Jun 30, 2019 at 6:36 PM Micah Kornfield <emkornfi...@gmail.com> 
>> wrote:
>> >
>> > At least on the read-side we can make this detectable by using something 
>> > like <0xffffffff><int32_t size> instead of int64_t.  On the write side we 
>> > would need some sort of default mode that we could flip on/off if we 
>> > wanted to maintain compatibility.
>> >
>> > I should say I think we should fix it.  Undefined behavior is unpaid debt 
>> > that might never be collected or might cause things to fail in difficult 
>> > to diagnose ways. And pre-1.0.0 is definitely the time.
>> >
>> > -Micah
>> >
>> > On Sun, Jun 30, 2019 at 3:17 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> >>
>> >> On Sun, Jun 30, 2019 at 5:14 PM Wes McKinney <wesmck...@gmail.com> wrote:
>> >> >
>> >> > hi Micah,
>> >> >
>> >> > This is definitely unfortunate, I wish we had realized the potential
>> >> > implications of having the Flatbuffer message start on a 4-byte
>> >> > (rather than 8-byte) boundary. The cost of making such a change now
>> >> > would be pretty high since all readers and writers in all languages
>> >> > would have to be changed. That being said, the 0.14.0 -> 1.0.0 version
>> >> > bump is the last opportunity we have to make a change like this, so we
>> >> > might as well discuss it now. Note that particular implementations
>> >> > could implement compatibility functions to handle the 4 to 8 byte
>> >> > change so that old clients can still be understood. We'd probably want
>> >> > to do this in C++, for example, since users would pretty quickly
>> >> > acquire a new pyarrow version in Spark applications while they are
>> >> > stuck on an old version of the Java libraries.
>> >>
>> >> NB such a backwards compatibility fix would not be forward-compatible,
>> >> so the PySpark users would need to use a pinned version of pyarrow
>> >> until Spark upgraded to Arrow 1.0.0. Maybe that's OK
>> >>
>> >> >
>> >> > - Wes
>> >> >
>> >> > On Sun, Jun 30, 2019 at 3:01 AM Micah Kornfield <emkornfi...@gmail.com> 
>> >> > wrote:
>> >> > >
>> >> > > While working on trying to fix undefined behavior for unaligned memory
>> >> > > accesses [1], I ran into an issue with the IPC specification [2] which
>> >> > > prevents us from ever achieving zero-copy memory mapping and having 
>> >> > > aligned
>> >> > > accesses (i.e. clean UBSan runs).
>> >> > >
>> >> > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned 
>> >> > > accesses.
>> >> > >
>> >> > > In the IPC format we align each message to 8-byte boundaries.  We then
>> >> > > write a int32_t integer to to denote the size of flat buffer metadata,
>> >> > > followed immediately  by the flatbuffer metadata.  This means the
>> >> > > flatbuffer metadata will never be 8 byte aligned.
>> >> > >
>> >> > > Do people care?  A simple fix  would be to use int64_t instead of 
>> >> > > int32_t
>> >> > > for length.  However, any fix essentially breaks all previous client
>> >> > > library versions or incurs a memory copy.
>> >> > >
>> >> > > [1] https://github.com/apache/arrow/pull/4757
>> >> > > [2] https://arrow.apache.org/docs/ipc.html

Re: [Discuss] IPC Specification, flatbuffers and unaligned memory accesses

Reply via email to