> I wonder how arrow deals with gaps among different implementations? Say,
> C++ lib implements some features go lib doesn't support. Is there a
> consistent API document, or documents for each language implementation?

It is important to distinguish between two types of functionality:

1. Supporting all the features of the interchange format(s). In this case the canonical document is the format specification [1].
2. Additional functionality for processing Arrow data (e.g. query engines, slicing, etc.).

For 1, we have integration tests [2] and known gaps for some implementations (search for skip.add in datagen.py), which should all have JIRAs associated with them. Some of the implementations (e.g. C#) have not been added to the integration tests at all.

For 2, the community has not been concerned with keeping feature parity. For instance, the Java library has a substantially different class naming/hierarchy than C++. Also, at least at the moment, no one has expressed interest in implementing a query engine/dataframe library as part of the Arrow project in Java (work has mostly been focused on performance improvements and some algorithms that contributors have found useful).

Hope this helps.

-Micah

[1] https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst
[2] https://github.com/apache/arrow/blob/5ca85922ae90bacb96d939503e53e83e6ec47f8c/dev/archery/archery/integration/datagen.py

On Thu, Nov 7, 2019 at 11:25 PM Yibo Cai <yibo....@arm.com> wrote:

> Hi Wes,
>
> On 10/30/19 10:24 PM, Wes McKinney wrote:
> > hi Yibo
> >
> > On Wed, Oct 30, 2019 at 2:16 AM Yibo Cai <yibo....@arm.com> wrote:
> >>
> >> Hi,
> >>
> >> I'm new to Arrow. Would like to seek for help about some questions. Any comment is welcome.
> >>
> >> - About source code tree, my understanding is that "cpp" is the core arrow libraries, "c_glib, go, python, ..." are language bindings to ease integrating arrow into apps developed by that language. Is that correct?
> >
> > No. We have 6 core implementations: C++, C#, Go, Java, JavaScript, and Rust
> >
> > * C/GLib, MATLAB, Python, R bind to C++
> > * Ruby binds to GLib
>
> I wonder how arrow deals with gaps among different implementations?
> Say, C++ lib implements some features go lib doesn't support. Is there a consistent API document, or documents for each language implementation?
>
> >> - Arrow implements many data types and aggregation functions(sum, mean, ...). [1]
> >> IMO, more functions and types should be supported, like min/max, vector/tensor operations, big number, etc. I'm not sure if this is in arrow's scope, or the apps using arrow should deal with it themselves.
> >
> > Our objective at least in the C++ library is to have a generally
> > useful "standard library" that handles common application concerns.
> > Whether or not something is thought to be in scope may vary on a case
> > by case basis -- if you can't find a JIRA issue for something in
> > particular, please go ahead and open one.
> >
> >> - I see some SIMD optimizations in arrow go binding, such as vectored sum. [2]
> >> But arrow cpp lib doesn't leverage SIMD. [3]
> >> Why not optimize it in cpp lib so all languages can benefit?
> >
> > You're welcome to contribute such optimizations to the C++ library
> >
> > - Wes
> >
> >> [1] https://github.com/apache/arrow/tree/master/cpp/src/arrow/compute/kernels
> >> [2] https://github.com/apache/arrow/blob/master/go/arrow/math/float64_avx2_amd64.s
> >> [3] https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/sum_internal.h#L99-L111
> >>
> >> Yibo