hi all, Micah put up a patch for ARROW-104 based on these discussions
https://github.com/apache/arrow/pull/67 Once we review and agree on these changes (the main thing is memory alignment and padding for reliable SIMD-friendliness) we can push forward on conforming the implementations. Big picture, as we've been discussing, it's going to be important to get integration tests working soon to round-trip data from Java to C++ and vice versa. thanks Wes On Fri, Apr 22, 2016 at 4:04 PM, Micah Kornfield <emkornfi...@gmail.com> wrote: > https://issues.apache.org/jira/browse/ARROW-104 covers the change to > the spec. I should hopefully have a pull request out sometime this > weekend. Once it was merged I was going to open up JIRAs for C++ and > Java code. > > On Fri, Apr 22, 2016 at 12:18 PM, Wes McKinney <w...@cloudera.com> wrote: >> Shall we create a JIRA to codify the alignment settings someplace in >> the metadata? We may need an umbrella JIRA to include downstream >> issues tracking code changes to implement and test the default >> alignment. >> >> On Sat, Apr 9, 2016 at 11:34 PM, Micah Kornfield <emkornfi...@gmail.com> >> wrote: >>> Hi Kai, >>> I think we are in agreement. For cross machine transfers, alignment >>> doesn't make a difference but width does. We could optimize >>> transfers, by only sending non-padded buffers and then have clients >>> re-pad the data (but before we optimize we should likely have a >>> working implementation :) >>> >>> The use-case I was thinking of for transferring alignment as part of >>> the metadata was for the shared-memory IPC. I just double checked the >>> current IPC metadata >>> (https://github.com/apache/arrow/blob/master/format/Message.fbs), and >>> all the necessary information is already in there to check for >>> alignment and width. So passing along the alignment/width >>> requirements shouldn't be necessary. We should be able to check >>> offset and lengths when reading the metadata. >>> >>> All of this might be getting into a little bit of premature >>> optimization though. I will wait a little bit longer for comments >>> from others and then update the spec. >>> >>> Thanks, >>> Micah >>> >>> On Sat, Apr 9, 2016 at 4:55 PM, Zheng, Kai <kai.zh...@intel.com> wrote: >>>> Hi Micah, >>>> >>>> Thanks for your thorough thoughts. The general consideration makes sense >>>> to me building Arrow with SIMD support in mind meanwhile not complicating >>>> codes too much, to use a fixed value 64 byte by default. We can always >>>> improve and optimize this accordingly when have concrete solid algorithms >>>> and workloads to benchmark with and collect real performance data, as you >>>> said. >>>> >>>> One thing I'm not sure about is, whether alignment requirement should be >>>> included in IPC metadata, because in my understanding, no buffer address >>>> is needed to be passed across machines, so it's up to destination machine >>>> to decide how to reallocate the data buffer for receiving the transferred >>>> data with whatever alignment address. An alignment address that's good to >>>> the source machine but may be not good to the destination. >>>> >>>> So in summary, it's good to mention this alignment consideration in the >>>> spec, also saying the fixed 64 byte alignment address is used by default; >>>> and hard-code the fixed value in source codes (for example, when >>>> allocating buffers for primitive arrays, chunk array buffers, and null >>>> bitmap buffers). >>>> >>>> Please help clarify if I'm not getting you right. Thanks. >>>> >>>> Regards, >>>> Kai >>>> >>>> -----Original Message----- >>>> From: Micah Kornfield [mailto:emkornfi...@gmail.com] >>>> Sent: Saturday, April 09, 2016 12:56 PM >>>> To: dev@arrow.apache.org >>>> Subject: Re: Some questions/proposals for the spec (Layout.md) >>>> >>>> Hi Kai, >>>> Are you proposing making alignment and width part of the RPC metadata? >>>> I think this is a good longer term idea, but for simplicity's sake, I >>>> think starting with one fixed value is a good idea. >>>> >>>> I agree that in the general case guaranteeing alignment is difficult when >>>> we have variable width data (e.g. strings) or sliced data >>>> (https://issues.apache.org/jira/browse/ARROW-33). However, I think a >>>> fairly common use-case for Arrow will be dealing with fixed width >>>> non-nested types (e.g. float, doubles, int32_t) where alignment can be >>>> guaranteed. In these cases being able to make use of the optimal CPU >>>> instruction set is important. >>>> >>>> In this regard, one concern with 8 bytes as the default width is that it >>>> will cause suboptimal use of current CPUs. For instance, the Intel >>>> Optimization Guide >>>> (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf) >>>> states "An access to data unaligned on 64-byte boundary leads to two >>>> memory accesses and requires several μops to be executed (instead of >>>> one)." and "A 64-byte or greater data structure or array should be aligned >>>> so that its base address is a multiple of 64." >>>> >>>> It would be interesting to know the exact performance difference for >>>> compiler generated code knowing about different degrees of alignment/width >>>> as well as the performance difference using assembly/intrinsics. In the >>>> absence of the performance data, I think defaulting to 64 byte alignment >>>> (when the programming language allows for it) based on recommendation from >>>> the guide makes sense. In addition given the existence of 512-bit SIMD, >>>> using 64 byte padding for width also makes sense. >>>> >>>> Do you have concerns if we make 64 bytes the default instead of 8? >>>> >>>> Thanks, >>>> Micah >>>> >>>> On Fri, Apr 8, 2016 at 1:44 PM, Zheng, Kai <kai.zh...@intel.com> wrote: >>>>> I'm from Intel but not any hardware folks, just would provide my >>>>> thoughts. Yes the width and alignment requirement can be very different >>>>> according to what version of SIMD is used. And also, sometimes it's hard >>>>> to keep the alignment to access specific fields or parts in the even >>>>> aligned memory region. It's complex, I thought it's good to mention this >>>>> aspect of consideration in the spec but come to the data structures or >>>>> format, it can leave to platform specific optimizations regarding to >>>>> concrete computing operators and algorithms to use alignment awareness >>>>> buffer allocators considering this potential performance impact. A >>>>> default value of 8 as mentioned may be used but other values can also be >>>>> passed. >>>>> >>>>> Regards, >>>>> Kai >>>>> >>>>> -----Original Message----- >>>>> From: Wes McKinney [mailto:w...@cloudera.com] >>>>> Sent: Friday, April 08, 2016 11:40 PM >>>>> To: dev@arrow.apache.org >>>>> Subject: Re: Some questions/proposals for the spec (Layout.md) >>>>> >>>>> On the SIMD question, it seems AVX is going to 512 bits, so one could >>>>> even argue for 64-byte alignment as a matter of future-proofing. AVX2 / >>>>> 256-bit seems fairly widely available nowadays, but it would be great if >>>>> Todd or any of the hardware folks (e.g. from Intel) on the list could >>>>> weigh in with guidance. >>>>> >>>>> https://en.wikipedia.org/wiki/Advanced_Vector_Extensions >>>>> >>>>> On Fri, Apr 8, 2016 at 8:33 AM, Wes McKinney <w...@cloudera.com> wrote: >>>>>> On Fri, Apr 8, 2016 at 8:07 AM, Jacques Nadeau <jacq...@apache.org> >>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> > I believe this choice was primarily about simplifying the code >>>>>>>> > (similar >>>>>>>> to why we have a n+1 >>>>>>>> > offsets instead of just n in the list/varchar representations >>>>>>>> > (even >>>>>>>> though n=0 is always 0)). In both >>>>>>>> > situations, you don't have to worry about writing special code >>>>>>>> > (and a >>>>>>>> condition) for the boundary >>>>>>>> > condition inside tight loops (e.g. the last few bytes need to be >>>>>>>> > handled >>>>>>>> differently since they >>>>>>>> > aren't word width). >>>>>>>> >>>>>>>> Sounds reasonable. It might be worth illustrating this with a >>>>>>>> concrete example. One scenario that this scheme seems useful for >>>>>>>> is a creating a new bitmap based on evaluating a predicate (i.e. >>>>>>>> all elements >X). In this case would it make sense to make it a >>>>>>>> multiple of 16, so we can consistently use SIMD instructions for >>>>>>>> the logical "and" operation? >>>>>>>> >>>>>>> >>>>>>> Hmm... interesting thought. I'd have to look but I also recall some >>>>>>> of the newer stuff supporting even wider widths. What do others think? >>>>>>> >>>>>>> >>>>>>>> I think the spec is slightly inconsistent. It says there is 6 >>>>>>>> bytes of overhead per entry but then follows: "with the smallest >>>>>>>> byte width capable of representing the number of types in the union." >>>>>>>> I'm perfectly happy to say it is always 1, always 2, or always >>>>>>>> capped at 2. I agree 32K/64K+ types is a very unlikely scenario. >>>>>>>> We just need to clear up the ambiguity. >>>>>>>> >>>>>>> >>>>>>> Agreed. Do you want to propose an approach & patch to clarify? >>>>>> >>>>>> I can also take responsibility for the ambiguity here. My preference >>>>>> is to use int16_t for the types array (memory suitably aligned), but >>>>>> as 1 byte will be sufficient nearly all of the time, it's a slight >>>>>> trade-off in memory use vs. code complexity, e.g. >>>>>> >>>>>> if (children_.size() < 128) { >>>>>> // types is only 1 byte >>>>>> } else { >>>>>> // types is 2 bytes >>>>>> } >>>>>> >>>>>> Realistically there won't be that many affected code paths, so I'm >>>>>> comfortable with either choice (2-bytes always, or 1 or 2 bytes >>>>>> depending on the size of the union).