Kazuaki Ishizaki, Ph.D., Senior Technical Staff Member (STSM), IBM Research - Tokyo ACM Distinguished Member - Apache Spark committer - IBM Academy of Technology Member
Wes McKinney <[email protected]> wrote on 2020/08/26 21:27:49: > From: Wes McKinney <[email protected]> > To: dev <[email protected]>, Micah Kornfield <[email protected]> > Cc: Fan Liya <[email protected]> > Date: 2020/08/26 21:28 > Subject: [EXTERNAL] Re: [DISCUSS] Big Endian support in Arrow (was: > Re: [Java] Supporting Big Endian) > > hi Micah, > > I agree with your reasoning. If supporting BE in some languages (e.g. > Java) is impractical due to performance regressions on LE platforms, > then I don't think it's worth it. But if it can be handled at compile > time or without runtime overhead, and tested / maintained properly on > an ongoing basis, then it seems reasonable to me. It seems that the > number of Arrow stakeholders will only increase from here so I would > hope that there will be more people invested in helping maintain BE in > the future. > > - Wes > > On Tue, Aug 25, 2020 at 11:33 PM Micah Kornfield > <[email protected]> wrote: > > > > I'm expanding the scope of this thread since it looks like work has also > > started for making golang support BigEndian architectures. > > > > I think as a community we should come to a consensus on whether we want to > > support Big Endian architectures in general. I don't think it is a good > > outcome if some implementations accept PRs for Big Endian fixes and some > > don't. > > > > But maybe this is OK with others? > > > > My current opinion on the matter is that we should support it under the > > following conditions: > > > > 1. As long as there is CI in place to catch regressions (right now I think > > the CI is fairly unreliable?) > > 2. No degradation in performance for little-endian architectures (verified > > by additional micro benchmarks) > > 3. Not a large amount of invasive code to distinguish between platforms. > > > > Kazuaki Ishizaki I asked question previously, but could you give some data > > points around: > > 1. The current state of C++ support (how much code needed to change)? > > 2. How many more PRs you expect to need for Java (and approximate size)? > > > > I think this would help myself and others in the decision making process. > > > > Thanks, > > Micah > > > > On Tue, Aug 18, 2020 at 9:15 AM Micah Kornfield <[email protected]> > > wrote: > > > > > My thoughts on the points raised so far: > > > > > > * Does supporting Big Endian increase the reach of Arrow by a lot? > > > > > > Probably not a significant amount, but it does provide one more avenue of > > > adoption. > > > > > > * Does it increase code complexity? > > > > > > Yes. I agree this is a concern. The PR in question did not seem too bad > > > to me but this is subjective. I think the remaining question is how many > > > more places need to be fixed up in the code base and how invasive are the > > > changes. In C++ IIUC it turned out to be a relatively small number of > > > places. > > > > > > Kazuaki Ishizaki have you been able to get the Java implementation working > > > fully locally? How many additional PRs will be needed and what do > > > they look like (I think there already a few more in the queue)? > > > > > > * Will it introduce performance regressions? > > > > > > If done properly I suspect no, but I think if we continue with BigEndian > > > support the places that need to be touched should have benchmarks added to > > > confirm this (including for PRs already merged). > > > > > > Thanks, > > > Micah > > > > > > On Sun, Aug 16, 2020 at 7:37 PM Fan Liya <[email protected]> wrote: > > > > > >> Thank Kazuaki Ishizaki for working on this. > > >> IMO, supporting the big-endian should be a large change, as in many > > >> places of the code base, we have implicitly assumed the little-endian > > >> platform (e.g. > > >> INVALID URI REMOVED > u=https-3A__github.com_apache_arrow_blob_master_java_memory_memory-2Dcore_src_main_java_org_apache_arrow_memory_util_ByteFunctionHelpers.java&d=DwIBaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f- > ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B- > n7ezRb8qyWtog&s=poFSWqjJv99prou53ciinHyBmh5IZlXLlhYvftb9fu4&e= > > >> ). > > >> Supporting the big-endian platform may introduce branches in such places > > >> (or virtual calls) which will affect the performance. > > >> So it would be helpful to evaluate the performance impact. > > >> > > >> Best, > > >> Liya Fan > > >> > > >> > > >> On Sat, Aug 15, 2020 at 7:54 AM Jacques Nadeau <[email protected]> > > >> wrote: > > >> > > >>> Hey Micah, thanks for starting the discussion. > > >>> > > >>> I just skimmed that thread and it isn't entirely clear that there was a > > >>> conclusion that the overhead was worth it. I think everybody agrees that > > >>> it > > >>> would be nice to have the code work on both platforms. On the flipside, > > >>> the > > >>> code noise for a rare case makes the cost-benefit questionable. > > >>> > > >>> In the Java code, we wrote the code to explicitly disallow big endian > > >>> platforms and put preconditions checks in. I definitely think if we want > > >>> to > > >>> support this, it should be done holistically across the code with > > >>> appropriate test plan (both functional and perf). > > >>> > > >>> To me, the question is really about how many use cases are blocked by > > >>> this. > > >>> I'm not sure I've heard anyone say that the limiting factor toleveraging > > >>> Java Arrow was the block on endianess. Keep in mind that until very > > >>> recently, using any Arrow Java code would throw a preconditions check > > >>> before you could even get started on big-endian and I don't think we've > > >>> seen a bunch of messages on that exception. Adding if conditions > > >>> throughout > > >>> the codebase like this patch: [1] isn't exactly awesome and it can also > > >>> risk performance impacts depending on how carefully it is done. > > >>> > > >>> If there isn't a preponderance of evidence of many users beingblocked by > > >>> this capability, I don't think we should accept the code. We already > > >>> have a > > >>> backlog of items that we need to address just ensure existing use cases > > >>> work well. Expanding to new use cases that there is no clear demand for > > >>> will likely just increase code development cost at little benefit. > > >>> > > >>> What do others think? > > >>> > > >>> [1] INVALID URI REMOVED > u=https-3A__github.com_apache_arrow_pull_7923-23issuecomment-2D674311119&d=DwIBaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f- > ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B- > n7ezRb8qyWtog&s=vmvc0b4yHFfWLjLheCRysSiyaeRFO_6p0wdH-sLa7M8&e= > > >>> > > >>> On Fri, Aug 14, 2020 at 4:36 PM Micah Kornfield <[email protected]> > > >>> wrote: > > >>> > > >>> > Kazuaki Ishizak has started working on Big Endian support in Java > > >>> > (including setting up CI for it). Thank you! > > >>> > > > >>> > We previously discussed support for Big Endian architectures in C++ > > >>> [1] and > > >>> > generally agreed that it was a reasonable thing to do. > > >>> > > > >>> > Similar to C++ I think as long as we have a working CI setup it is > > >>> > reasonable for Java to support Big Endian machines. > > >>> > > > >>> > But I think there might be differing opinions so it is worth a > > >>> discussion > > >>> > to see if there are technical blockers or other reasons for not > > >>> supporting > > >>> > Big Endian architectures in the existing java implementation. > > >>> > > > >>> > Thanks, > > >>> > Micah > > >>> > > > >>> > > > >>> > [1] > > >>> > > > >>> > > > >>> INVALID URI REMOVED > u=https-3A__lists.apache.org_thread.html_rcae745f1d848981bb5e8dddacfc4554641aba62e3c949b96bfd8b019-2540-253Cdev.arrow.apache.org-253E&d=DwIBaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f- > ZCGj9Pg&m=3rVsa9EYwGOrvQw8rg0L9EtFs7I7B- > n7ezRb8qyWtog&s=oDBWI9pmI39bTsEieQNDxZit0My21hLIW0fJRPJI0AM&e= > > >>> > > > >>> > > >> >
