Thanks for the references. If we decided to make a change around this, we could call the first 4 bytes a stream continuation marker to make it slightly less ugly
* 0xFFFFFFFF: continue * 0x00000000: stop On Mon, Jul 1, 2019 at 4:35 PM Micah Kornfield <emkornfi...@gmail.com> wrote: > > Hi Wes, > I'm not an expert on this either, my inclination mostly comes from some > research I've done. I think it is important to distinguish two cases: > 1. unaligned access at the processor instruction level > 2. undefined behavior > > From my reading unaligned access is fine on most modern architectures and it > seems the performance penalty has mostly been eliminated. > > Undefined behavior is a compiler/language concept. The problem is the > compiler can choose to do anything in UB scenarios, not just the "obvious" > translation. Specifically, the compiler is under no obligation to generate > the unaligned access instructions, and if it doesn't SEGVs ensue. Two > examples, both of which relate to SIMD optimizations are linked below. > > I tend to be on the conservative side with this type of thing but if we have > experts on the the ML that can offer a more informed opinion, I would love to > hear it. > > [1] http://pzemtsov.github.io/2016/11/06/bug-story-alignment-on-x86.html > [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709 > > On Mon, Jul 1, 2019 at 1:41 PM Wes McKinney <wesmck...@gmail.com> wrote: >> >> The <0xffffffff><int32_t size> solution is downright ugly but I think >> it's one of the only ways that achieves >> >> * backward compatibility (new clients can read old data) >> * opt-in forward compatibility (if we want to go to the labor of doing >> so, sort of dangerous) >> * old clients receiving new data do not blow up (they will see a >> metadata length of -1) >> >> NB 0xFFFFFFFF <length> would look like: >> >> In [13]: np.array([(2 << 32) - 1, 128], dtype=np.uint32) >> Out[13]: array([4294967295, 128], dtype=uint32) >> >> In [14]: np.array([(2 << 32) - 1, 128], >> dtype=np.uint32).view(np.int32) >> Out[14]: array([ -1, 128], dtype=int32) >> >> In [15]: np.array([(2 << 32) - 1, 128], dtype=np.uint32).view(np.uint8) >> Out[15]: array([255, 255, 255, 255, 128, 0, 0, 0], dtype=uint8) >> >> Flatbuffers are 32-bit limited so we don't need all 64 bits. >> >> Do you know in what circumstances unaligned reads from Flatbuffers >> might cause an issue? I do not know enough about UB but my >> understanding is that it causes issues on some specialized platforms >> where for most modern x86-64 processors and compilers it is not really >> an issue (though perhaps a performance issue) >> >> On Sun, Jun 30, 2019 at 6:36 PM Micah Kornfield <emkornfi...@gmail.com> >> wrote: >> > >> > At least on the read-side we can make this detectable by using something >> > like <0xffffffff><int32_t size> instead of int64_t. On the write side we >> > would need some sort of default mode that we could flip on/off if we >> > wanted to maintain compatibility. >> > >> > I should say I think we should fix it. Undefined behavior is unpaid debt >> > that might never be collected or might cause things to fail in difficult >> > to diagnose ways. And pre-1.0.0 is definitely the time. >> > >> > -Micah >> > >> > On Sun, Jun 30, 2019 at 3:17 PM Wes McKinney <wesmck...@gmail.com> wrote: >> >> >> >> On Sun, Jun 30, 2019 at 5:14 PM Wes McKinney <wesmck...@gmail.com> wrote: >> >> > >> >> > hi Micah, >> >> > >> >> > This is definitely unfortunate, I wish we had realized the potential >> >> > implications of having the Flatbuffer message start on a 4-byte >> >> > (rather than 8-byte) boundary. The cost of making such a change now >> >> > would be pretty high since all readers and writers in all languages >> >> > would have to be changed. That being said, the 0.14.0 -> 1.0.0 version >> >> > bump is the last opportunity we have to make a change like this, so we >> >> > might as well discuss it now. Note that particular implementations >> >> > could implement compatibility functions to handle the 4 to 8 byte >> >> > change so that old clients can still be understood. We'd probably want >> >> > to do this in C++, for example, since users would pretty quickly >> >> > acquire a new pyarrow version in Spark applications while they are >> >> > stuck on an old version of the Java libraries. >> >> >> >> NB such a backwards compatibility fix would not be forward-compatible, >> >> so the PySpark users would need to use a pinned version of pyarrow >> >> until Spark upgraded to Arrow 1.0.0. Maybe that's OK >> >> >> >> > >> >> > - Wes >> >> > >> >> > On Sun, Jun 30, 2019 at 3:01 AM Micah Kornfield <emkornfi...@gmail.com> >> >> > wrote: >> >> > > >> >> > > While working on trying to fix undefined behavior for unaligned memory >> >> > > accesses [1], I ran into an issue with the IPC specification [2] which >> >> > > prevents us from ever achieving zero-copy memory mapping and having >> >> > > aligned >> >> > > accesses (i.e. clean UBSan runs). >> >> > > >> >> > > Flatbuffer metadata needs 8-byte alignment to guarantee aligned >> >> > > accesses. >> >> > > >> >> > > In the IPC format we align each message to 8-byte boundaries. We then >> >> > > write a int32_t integer to to denote the size of flat buffer metadata, >> >> > > followed immediately by the flatbuffer metadata. This means the >> >> > > flatbuffer metadata will never be 8 byte aligned. >> >> > > >> >> > > Do people care? A simple fix would be to use int64_t instead of >> >> > > int32_t >> >> > > for length. However, any fix essentially breaks all previous client >> >> > > library versions or incurs a memory copy. >> >> > > >> >> > > [1] https://github.com/apache/arrow/pull/4757 >> >> > > [2] https://arrow.apache.org/docs/ipc.html