RE: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

Kazuaki Ishizaki Wed, 26 Aug 2020 06:34:14 -0700

Kazuaki Ishizaki, Ph.D., Senior Technical Staff Member (STSM), IBM 
Research - Tokyo
ACM Distinguished Member - Apache Spark committer - IBM Academy of 
Technology Member


Wes McKinney <[email protected]> wrote on 2020/08/26 21:27:49:

> From: Wes McKinney <[email protected]>
> To: dev <[email protected]>, Micah Kornfield <[email protected]>
> Cc: Fan Liya <[email protected]>
> Date: 2020/08/26 21:28
> Subject: [EXTERNAL] Re: [DISCUSS] Big Endian support in Arrow (was: 
> Re: [Java] Supporting Big Endian)
> 
> hi Micah,
> 
> I agree with your reasoning. If supporting BE in some languages (e.g.
> Java) is impractical due to performance regressions on LE platforms,
> then I don't think it's worth it. But if it can be handled at compile
> time or without runtime overhead, and tested / maintained properly on
> an ongoing basis, then it seems reasonable to me. It seems that the
> number of Arrow stakeholders will only increase from here so I would
> hope that there will be more people invested in helping maintain BE in
> the future.
> 
> - Wes
> 
> On Tue, Aug 25, 2020 at 11:33 PM Micah Kornfield 
> <[email protected]> wrote:
> >
> > I'm expanding the scope of this thread since it looks like work has 
also
> > started for making golang support BigEndian architectures.
> >
> > I think as a community we should come to a consensus on whether we 
want to
> > support Big Endian architectures in general.  I don't think it is a 
good
> > outcome if some implementations accept PRs for Big Endian fixes and 
some
> > don't.
> >
> > But maybe this is OK with others?
> >
> > My current opinion on the matter is that we should support it under 
the
> > following conditions:
> >
> > 1.  As long as there is CI in place to catch regressions (right now I 
think
> > the CI is fairly unreliable?)
> > 2.  No degradation in performance for little-endian architectures 
(verified
> > by additional micro benchmarks)
> > 3.  Not a large amount of invasive code to distinguish between 
platforms.
> >
> > Kazuaki Ishizaki I asked question previously, but could you give some 
data
> > points around:
> > 1.  The current state of C++ support (how much code needed to change)?
> > 2.  How many more PRs you expect to need for Java (and approximate 
size)?
> >
> > I think this would help myself and others in the decision making 
process.
> >
> > Thanks,
> > Micah
> >
> > On Tue, Aug 18, 2020 at 9:15 AM Micah Kornfield 
<[email protected]>
> > wrote:
> >
> > > My thoughts on the points raised so far:
> > >
> > > * Does supporting Big Endian increase the reach of Arrow by a lot?
> > >
> > > Probably not a significant amount, but it does provide one more 
avenue of
> > > adoption.
> > >
> > > * Does it increase code complexity?
> > >
> > > Yes.  I agree this is a concern.  The PR in question did not seem 
too bad
> > > to me but this is subjective.  I think the remaining question is how 
many
> > > more places need to be fixed up in the code base and how invasive 
are the
> > > changes.  In C++ IIUC it turned out to be a relatively small number 
of
> > > places.
> > >
> > > Kazuaki Ishizaki have you been able to get the Java implementation 
working
> > > fully locally?  How many additional PRs will be needed and what do
> > > they look like (I think there already a few more in the queue)?
> > >
> > > * Will it introduce performance regressions?
> > >
> > > If done properly I suspect no, but I think if we continue with 
BigEndian
> > > support the places that need to be touched should have benchmarks 
added to
> > > confirm this (including for PRs already merged).
> > >
> > > Thanks,
> > > Micah
> > >
> > > On Sun, Aug 16, 2020 at 7:37 PM Fan Liya <[email protected]> 
wrote:
> > >
> > >> Thank Kazuaki Ishizaki for working on this.
> > >> IMO, supporting the big-endian should be a large change, as in many
> > >> places of the code base, we have implicitly assumed the 
little-endian
> > >> platform (e.g.
> > >> INVALID URI REMOVED
> 
u=https-3A__github.com_apache_arrow_blob_master_java_memory_memory-2Dcore_src_main_java_org_apache_arrow_memory_util_ByteFunctionHelpers.java&d=DwIBaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f-
> ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B-
> n7ezRb8qyWtog&s=poFSWqjJv99prou53ciinHyBmh5IZlXLlhYvftb9fu4&e= 
> > >> ).
> > >> Supporting the big-endian platform may introduce branches in such 
places
> > >> (or virtual calls) which will affect the performance.
> > >> So it would be helpful to evaluate the performance impact.
> > >>
> > >> Best,
> > >> Liya Fan
> > >>
> > >>
> > >> On Sat, Aug 15, 2020 at 7:54 AM Jacques Nadeau <[email protected]>
> > >> wrote:
> > >>
> > >>> Hey Micah, thanks for starting the discussion.
> > >>>
> > >>> I just skimmed that thread and it isn't entirely clear that there 
was a
> > >>> conclusion that the overhead was worth it. I think everybody 
agrees that
> > >>> it
> > >>> would be nice to have the code work on both platforms. On the 
flipside,
> > >>> the
> > >>> code noise for a rare case makes the cost-benefit questionable.
> > >>>
> > >>> In the Java code, we wrote the code to explicitly disallow big 
endian
> > >>> platforms and put preconditions checks in. I definitely think if 
we want
> > >>> to
> > >>> support this, it should be done holistically across the code with
> > >>> appropriate test plan (both functional and perf).
> > >>>
> > >>> To me, the question is really about how many use cases are blocked 
by
> > >>> this.
> > >>> I'm not sure I've heard anyone say that the limiting factor 
toleveraging
> > >>> Java Arrow was the block on endianess. Keep in mind that until 
very
> > >>> recently, using any Arrow Java code would throw a preconditions 
check
> > >>> before you could even get started on big-endian and I don't think 
we've
> > >>> seen a bunch of messages on that exception. Adding if conditions
> > >>> throughout
> > >>> the codebase like this patch: [1] isn't exactly awesome and it can 
also
> > >>> risk performance impacts depending on how carefully it is done.
> > >>>
> > >>> If there isn't a preponderance of evidence of many users 
beingblocked by
> > >>> this capability, I don't think we should accept the code. We 
already
> > >>> have a
> > >>> backlog of items that we need to address just ensure existing use 
cases
> > >>> work well. Expanding to new use cases that there is no clear 
demand for
> > >>> will likely just increase code development cost at little benefit.
> > >>>
> > >>> What do others think?
> > >>>
> > >>> [1] INVALID URI REMOVED
> 
u=https-3A__github.com_apache_arrow_pull_7923-23issuecomment-2D674311119&d=DwIBaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f-
> ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B-
> n7ezRb8qyWtog&s=vmvc0b4yHFfWLjLheCRysSiyaeRFO_6p0wdH-sLa7M8&e= 
> > >>>
> > >>> On Fri, Aug 14, 2020 at 4:36 PM Micah Kornfield 
<[email protected]>
> > >>> wrote:
> > >>>
> > >>> > Kazuaki Ishizak has started working on Big Endian support in 
Java
> > >>> > (including setting up CI for it).  Thank you!
> > >>> >
> > >>> > We previously discussed support for Big Endian architectures in 
C++
> > >>> [1] and
> > >>> > generally agreed that it was a reasonable thing to do.
> > >>> >
> > >>> > Similar to C++ I think as long as we have a working CI setup it 
is
> > >>> > reasonable for Java to support Big Endian machines.
> > >>> >
> > >>> > But I think there might be differing opinions so it is worth a
> > >>> discussion
> > >>> > to see if there are technical blockers or other reasons for not
> > >>> supporting
> > >>> > Big Endian architectures in the existing java implementation.
> > >>> >
> > >>> > Thanks,
> > >>> > Micah
> > >>> >
> > >>> >
> > >>> > [1]
> > >>> >
> > >>> >
> > >>> INVALID URI REMOVED
> 
u=https-3A__lists.apache.org_thread.html_rcae745f1d848981bb5e8dddacfc4554641aba62e3c949b96bfd8b019-2540-253Cdev.arrow.apache.org-253E&d=DwIBaQ&c=jf_iaSHvJObTbx-
> siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f-
> ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B-
> n7ezRb8qyWtog&s=oDBWI9pmI39bTsEieQNDxZit0My21hLIW0fJRPJI0AM&e= 
> > >>> >
> > >>>
> > >>
>

RE: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

Reply via email to