hi all,

Micah put up a patch for ARROW-104 based on these discussions

https://github.com/apache/arrow/pull/67

Once we review and agree on these changes (the main thing is memory
alignment and padding for reliable SIMD-friendliness) we can push
forward on conforming the implementations.

Big picture, as we've been discussing, it's going to be important to
get integration tests working soon to round-trip data from Java to C++
and vice versa.

thanks
Wes

On Fri, Apr 22, 2016 at 4:04 PM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> https://issues.apache.org/jira/browse/ARROW-104 covers the change to
> the spec.  I should hopefully have a pull request out sometime this
> weekend.  Once it was merged I was going to open up JIRAs for C++ and
> Java code.
>
> On Fri, Apr 22, 2016 at 12:18 PM, Wes McKinney <w...@cloudera.com> wrote:
>> Shall we create a JIRA to codify the alignment settings someplace in
>> the metadata? We may need an umbrella JIRA to include downstream
>> issues tracking code changes to implement and test the default
>> alignment.
>>
>> On Sat, Apr 9, 2016 at 11:34 PM, Micah Kornfield <emkornfi...@gmail.com> 
>> wrote:
>>> Hi Kai,
>>> I think we are in agreement.  For cross machine transfers, alignment
>>> doesn't make a difference but width does.  We could optimize
>>> transfers, by only sending non-padded buffers and then have clients
>>> re-pad the data (but before we optimize we should likely have a
>>> working implementation :)
>>>
>>> The use-case I was thinking of for transferring alignment as part of
>>> the metadata was for the shared-memory IPC.  I just double checked the
>>> current IPC metadata
>>> (https://github.com/apache/arrow/blob/master/format/Message.fbs), and
>>> all the necessary information is already in there to check for
>>> alignment and width.  So passing along the alignment/width
>>> requirements shouldn't be necessary.  We should be able to check
>>> offset and lengths when reading the metadata.
>>>
>>> All of this might be getting into a little bit of premature
>>> optimization though.  I will wait a little bit longer for comments
>>> from others and then update the spec.
>>>
>>> Thanks,
>>> Micah
>>>
>>> On Sat, Apr 9, 2016 at 4:55 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>>>> Hi Micah,
>>>>
>>>> Thanks for your thorough thoughts. The general consideration makes sense 
>>>> to me building Arrow with SIMD support in mind meanwhile not complicating 
>>>> codes too much, to use a fixed value 64 byte by default. We can always 
>>>> improve and optimize this accordingly when have concrete solid algorithms 
>>>> and workloads to benchmark with and collect real performance data, as you 
>>>> said.
>>>>
>>>> One thing I'm not sure about is, whether alignment requirement should be 
>>>> included in IPC metadata, because in my understanding, no buffer address 
>>>> is needed to be passed across machines, so it's up to destination machine 
>>>> to decide how to reallocate the data buffer for receiving the transferred 
>>>> data with whatever alignment address. An alignment address that's good to 
>>>> the source machine but may be not good to the destination.
>>>>
>>>> So in summary, it's good to mention this alignment consideration in the 
>>>> spec, also saying the fixed 64 byte alignment address is used by default; 
>>>> and hard-code the fixed value in source codes (for example, when 
>>>> allocating buffers for primitive arrays, chunk array buffers, and null 
>>>> bitmap buffers).
>>>>
>>>> Please help clarify if I'm not getting you right. Thanks.
>>>>
>>>> Regards,
>>>> Kai
>>>>
>>>> -----Original Message-----
>>>> From: Micah Kornfield [mailto:emkornfi...@gmail.com]
>>>> Sent: Saturday, April 09, 2016 12:56 PM
>>>> To: dev@arrow.apache.org
>>>> Subject: Re: Some questions/proposals for the spec (Layout.md)
>>>>
>>>> Hi Kai,
>>>> Are you proposing making alignment and width part of the RPC metadata?
>>>>   I think this is a good longer term idea, but for simplicity's sake, I 
>>>> think starting with one fixed value is a good idea.
>>>>
>>>> I agree that in the general case guaranteeing alignment is difficult when 
>>>> we have variable width data (e.g. strings) or sliced data 
>>>> (https://issues.apache.org/jira/browse/ARROW-33).  However, I think a 
>>>> fairly common use-case for Arrow will be dealing with fixed width 
>>>> non-nested types (e.g. float, doubles, int32_t) where alignment can be
>>>> guaranteed.   In these cases being able to make use of the optimal CPU
>>>> instruction set is important.
>>>>
>>>> In this regard, one concern with 8 bytes as the default width is that it 
>>>> will cause suboptimal use of current CPUs.  For instance, the Intel 
>>>> Optimization Guide
>>>> (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf)
>>>> states "An access to data unaligned on 64-byte boundary leads to two 
>>>> memory accesses and requires several μops to be executed (instead of 
>>>> one)." and "A 64-byte or greater data structure or array should be aligned 
>>>> so that its base address is a multiple of 64."
>>>>
>>>> It would be interesting to know the exact performance difference for 
>>>> compiler generated code knowing about different degrees of alignment/width 
>>>> as well as the performance difference using assembly/intrinsics.  In the 
>>>> absence of the performance data, I think defaulting to 64 byte alignment 
>>>> (when the programming language allows for it) based on recommendation from 
>>>> the guide makes sense.  In addition given the existence of 512-bit SIMD, 
>>>> using 64 byte padding for width also makes sense.
>>>>
>>>> Do you have concerns if we make 64 bytes the default instead of 8?
>>>>
>>>> Thanks,
>>>> Micah
>>>>
>>>> On Fri, Apr 8, 2016 at 1:44 PM, Zheng, Kai <kai.zh...@intel.com> wrote:
>>>>> I'm from Intel but not any hardware folks, just would provide my 
>>>>> thoughts. Yes the width and alignment requirement can be very different 
>>>>> according to what version of SIMD is used. And also, sometimes it's hard 
>>>>> to keep the alignment to access specific fields or parts in the even 
>>>>> aligned memory region. It's complex, I thought it's good to mention this 
>>>>> aspect of consideration in the spec but come to the data structures or 
>>>>> format, it can leave to platform specific optimizations regarding to 
>>>>> concrete computing operators and algorithms to use alignment awareness 
>>>>> buffer allocators considering this potential performance impact. A 
>>>>> default value of 8 as mentioned may be used but other values can also be 
>>>>> passed.
>>>>>
>>>>> Regards,
>>>>> Kai
>>>>>
>>>>> -----Original Message-----
>>>>> From: Wes McKinney [mailto:w...@cloudera.com]
>>>>> Sent: Friday, April 08, 2016 11:40 PM
>>>>> To: dev@arrow.apache.org
>>>>> Subject: Re: Some questions/proposals for the spec (Layout.md)
>>>>>
>>>>> On the SIMD question, it seems AVX is going to 512 bits, so one could 
>>>>> even argue for 64-byte alignment as a matter of future-proofing.  AVX2 / 
>>>>> 256-bit seems fairly widely available nowadays, but it would be great if 
>>>>> Todd or any of the hardware folks (e.g. from Intel) on the list could 
>>>>> weigh in with guidance.
>>>>>
>>>>> https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
>>>>>
>>>>> On Fri, Apr 8, 2016 at 8:33 AM, Wes McKinney <w...@cloudera.com> wrote:
>>>>>> On Fri, Apr 8, 2016 at 8:07 AM, Jacques Nadeau <jacq...@apache.org> 
>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> > I believe this choice was primarily about simplifying the code
>>>>>>>> > (similar
>>>>>>>> to why we have a n+1
>>>>>>>> > offsets instead of just n in the list/varchar representations
>>>>>>>> > (even
>>>>>>>> though n=0 is always 0)). In both
>>>>>>>> > situations, you don't have to worry about writing special code
>>>>>>>> > (and a
>>>>>>>> condition) for the boundary
>>>>>>>> > condition inside tight loops (e.g. the last few bytes need to be
>>>>>>>> > handled
>>>>>>>> differently since they
>>>>>>>> > aren't word width).
>>>>>>>>
>>>>>>>> Sounds reasonable.  It might be worth illustrating this with a
>>>>>>>> concrete example.  One scenario that this scheme seems useful for
>>>>>>>> is a creating a new bitmap based on evaluating a predicate (i.e.
>>>>>>>> all elements >X).  In this case would it make sense to make it a
>>>>>>>> multiple of 16, so we can consistently use SIMD instructions for
>>>>>>>> the logical "and" operation?
>>>>>>>>
>>>>>>>
>>>>>>> Hmm... interesting thought. I'd have to look but I also recall some
>>>>>>> of the newer stuff supporting even wider widths. What do others think?
>>>>>>>
>>>>>>>
>>>>>>>> I think the spec is slightly inconsistent.  It says there is 6
>>>>>>>> bytes of overhead per entry but then follows: "with the smallest
>>>>>>>> byte width capable of representing the number of types in the union."
>>>>>>>> I'm perfectly happy to say it is always 1, always 2, or always
>>>>>>>> capped at 2.  I agree 32K/64K+ types is a very unlikely scenario.
>>>>>>>> We just need to clear up the ambiguity.
>>>>>>>>
>>>>>>>
>>>>>>> Agreed. Do you want to propose an approach & patch to clarify?
>>>>>>
>>>>>> I can also take responsibility for the ambiguity here. My preference
>>>>>> is to use int16_t for the types array (memory suitably aligned), but
>>>>>> as 1 byte will be sufficient nearly all of the time, it's a slight
>>>>>> trade-off in memory use vs. code complexity, e.g.
>>>>>>
>>>>>> if (children_.size() < 128) {
>>>>>>   // types is only 1 byte
>>>>>> } else {
>>>>>>   // types is 2 bytes
>>>>>> }
>>>>>>
>>>>>> Realistically there won't be that many affected code paths, so I'm
>>>>>> comfortable with either choice (2-bytes always, or 1 or 2 bytes
>>>>>> depending on the size of the union).

Reply via email to