[RESULT][VOTE] Adopt ADBC database client connectivity specification
My vote: +1

The vote passes with 4 binding +1 votes, 8 non-binding +1 votes, and no -1 votes.

Thanks all! I will update the RFC PR and merge it next, and will continue setting up the CI and release process.

-David

On Wed, Oct 5, 2022, at 15:38, David Li wrote:
> Kirill (CC'd) mentioned the organization could be improved [1] - I've
> put up a PR to move the definitions around to make it easier for
> implementors [2].
>
> I'll leave this thread open for a little while longer for any
> interested parties/see if anyone has PR comments before I'll merge the
> PR/vote/close/etc. Thanks all!
>
> [1]: https://github.com/apache/arrow-adbc/issues/145
> [2]: https://github.com/apache/arrow-adbc/pull/148
>
> -David
>
> On Wed, Oct 5, 2022, at 15:27, Neal Richardson wrote:
>> +1
>>
>> (I think this makes 4 binding +1s, if I count correctly)
>>
>> On Wed, Oct 5, 2022 at 11:30 AM Antoine Pitrou wrote:
>>
>>> +1 (binding), with the caveat that I looked mostly at the C API.
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>> On 21/09/2022 at 17:40, David Li wrote:
>>> > Hello,
>>> >
>>> > We have been discussing [1] standard interfaces for Arrow-based database
>>> > access and have been working on implementations of the proposed interfaces
>>> > [2], all under the name "ADBC". This proposal aims to provide a unified
>>> > client abstraction across Arrow-native database protocols (like Flight SQL)
>>> > and non-Arrow database protocols, which can then be used by Arrow projects
>>> > like Dataset/Acero and ecosystem projects like Ibis.
>>> >
>>> > For details, see the RFC here: https://github.com/apache/arrow/pull/14079
>>> >
>>> > I would like to propose that the Arrow project adopt this RFC, along with
>>> > apache/arrow-adbc commit 7866a56 [3], as version 1.0.0 of the ADBC API
>>> > standard.
>>> >
>>> > Please vote to adopt the specification as described above. (This is not a
>>> > vote to release any components.)
>>> >
>>> > This vote will be open for at least 72 hours.
>>> >
>>> > [ ] +1 Adopt the ADBC specification
>>> > [ ] 0
>>> > [ ] -1 Do not adopt the specification because...
>>> >
>>> > Thanks to the DuckDB and R DBI projects for providing feedback on and
>>> > implementations of the proposal.
>>> >
>>> > [1]: https://lists.apache.org/thread/cq7t9s5p7dw4vschylhwsfgqwkr5fmf2
>>> > [2]: https://github.com/apache/arrow-adbc
>>> > [3]: https://github.com/apache/arrow-adbc/commit/7866a566f5b7b635267bfb7a87ea49b01dfe89fa
>>> >
>>> > Thank you,
>>> > David
Re: [DISCUSS] Python Wheel Size
We've discussed this in the past, I think. In addition to having many optional components enabled, the pyarrow wheel also includes the unit tests directory, which keeps growing in size. I think if we made a pyarrow-slim wheel with support only for core Arrow (IPC, etc.) and Parquet file reading, it might be possible to trim the size by a significant percentage.

Rusty -- if you would like to push this forward, I would suggest creating an alternative wheel build script to the one that we use, modifying flags and adding other customizations (e.g. trimming unit tests) to produce a wheel that we could build and possibly upload as "pyarrow-slim" on PyPI.

On Mon, Oct 3, 2022 at 8:55 AM Antoine Pitrou wrote:
>
> Hi Rusty,
>
> On 02/10/2022 at 22:51, Rusty Conover wrote:
> > Hi Arrow Team,
> >
> > I'm using Apache Arrow with AWS Lambda Functions.
> >
> > The primary motivation is AWS Athena's user-defined functions[1]. Those
> > functions process and return Arrow IPC segments.
> >
> > * The published Python wheels for Apache Arrow include almost every feature
> > of Arrow. (Gandiva, Plasma, Flight)
>
> Gandiva isn't compiled in the Python wheels. Plasma is reasonably small
> (but is also being deprecated soon). Flight is more sizable. However,
> most of the size seems to be in Arrow itself and Parquet. A large part
> of the size is probably attributable to the Arrow compute engine and
> functions, and also perhaps to filesystem implementations such as S3 and
> GCS (due to the large third-party dependencies that they bundle).
>
> > Would it be possible to create a new Python package (i.e., "pyarrow-slim")
> > that would disable some of the functionality but result in smaller python
> > wheels?
>
> Perhaps. The first step would be to allow disabling more components in
> PyArrow, though. Otherwise I'm afraid the size reduction wouldn't be
> terrific.
>
> Regards
>
> Antoine.
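Before deciding what a slimmer wheel should drop, it can help to measure where the bytes in an existing wheel actually go. The following is a minimal sketch, not part of the Arrow build tooling: a wheel is a zip archive, so the standard `zipfile` module can total uncompressed sizes per directory. The helper name `size_by_directory` and the stand-in archive contents are assumptions made up for illustration; the sizes are not measurements of a real pyarrow wheel.

```python
import io
import zipfile
from collections import Counter


def size_by_directory(wheel_file, depth=2):
    """Sum uncompressed member sizes, grouped by the first `depth` directory levels."""
    totals = Counter()
    with zipfile.ZipFile(wheel_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Drop the file name, keep at most `depth` leading directories.
            directories = info.filename.split("/")[:-1]
            key = "/".join(directories[:depth]) or "."
            totals[key] += info.file_size
    return totals


# Stand-in archive with made-up members and sizes, NOT a real pyarrow wheel.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("pyarrow/lib.so", b"x" * 100)
    zf.writestr("pyarrow/tests/test_a.py", b"y" * 30)
    zf.writestr("pyarrow/tests/data/d.bin", b"z" * 20)

totals = size_by_directory(demo)
for directory, size in totals.most_common():
    print(f"{size:>8} bytes  {directory}")
```

For a real wheel, the path to the `.whl` file can be passed instead of the in-memory buffer; the per-directory totals would then show how much the tests directory and the bundled shared libraries each contribute.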
[RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 13.0.0 RC1
With 9 +1 votes (4 binding), the vote passes. I have published the artifacts to crates.io.

Thank you for helping with verifying the release.

On Sat, Oct 8, 2022 at 8:51 PM Kun Liu wrote:
> +1 (non-binding)
>
> Did the validation on an Intel Mac
>
> On Sun, Oct 9, 2022 at 00:22, Ashish wrote:
>
> > +1 (non-binding)
> >
> > validated on M1 Mac
> >
> > On Fri, Oct 7, 2022 at 4:26 AM Andy Grove wrote:
> >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow DataFusion
> > > Implementation, version 13.0.0.
> > >
> > > This release candidate is based on commit:
> > > 807a0c1d2963f6ca327d316badb4ed0fa77e9f21 [1]
> > > The proposed release tarball and signatures are hosted at [2].
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests, and
> > > vote on the release. The vote will be open for at least 72 hours.
> > >
> > > Only votes from PMC members are binding, but all members of the community
> > > are encouraged to test the release and vote with "(non-binding)".
> > >
> > > The standard verification procedure is documented at
> > > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > > .
> > >
> > > [ ] +1 Release this as Apache Arrow DataFusion 13.0.0
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow DataFusion 13.0.0 because...
> > >
> > > Here is my vote:
> > >
> > > +1
> > >
> > > [1]: https://github.com/apache/arrow-datafusion/tree/807a0c1d2963f6ca327d316badb4ed0fa77e9f21
> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-13.0.0-rc1
> > > [3]: https://github.com/apache/arrow-datafusion/blob/807a0c1d2963f6ca327d316badb4ed0fa77e9f21/CHANGELOG.md
> >
> > --
> > thanks
> > ashish
Build Frustrations
If this isn't the right place to ask, please let me know where to direct this question:

I'm trying to build a slightly modified version of the 9.0.0 tag for arrow (particularly Pyarrow). Walking through the build steps (below), I have run into a problem where the arrow c++ library has symbols that include the tag B5cxx11, but the cython libraries aren't generating that tag. The symbol resolution fails at runtime once the library loads.

Particularly, these two symbols in python/build/lib.linux-x86_64-3.8/pyarrow/lib.cpython-38-x86_64-linux-gnu.so don't resolve:

    U _ZNK5arrow8DataType18ComputeFingerprintEv
    U _ZNK5arrow8DataType26ComputeMetadataFingerprintEv

All of the corresponding symbols in libarrow.so.900 look like this (note the added B5cxx11 before the trailing Ev):

    006addc0 T _ZNK5arrow8DataType18ComputeFingerprintB5cxx11Ev
    006ae050 T _ZNK5arrow8DataType26ComputeMetadataFingerprintB5cxx11Ev

I've tried building with explicit flags to encourage the libraries to include the cxx11 symbol (in python/CMakeLists.txt). That doesn't seem to impact this issue:

    set (CMAKE_CXX_STANDARD 11)
    set (CMAKE_CXX_STANDARD_REQUIRED ON)
    set (CMAKE_CXX_EXTENSIONS OFF)

I also added

    export PYARROW_CXXFLAGS="-std=c++11"

for the wheel build of pyarrow (no effect).

I would appreciate any guidance on how to build a clean wheel file of pyarrow that includes my small code changes.

Best,
-Joe

--
Here's what I'm using:

    gcc version 9.4.0
    g++ version 9.4.0
    cmake version 3.16.3

CMake command for cpp/build:

    cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME -DCMAKE_INSTALL_LIBDIR=lib \
      -DCMAKE_BUILD_TYPE=Release -DARROW_DATASET=ON -DARROW_WITH_BZ2=ON \
      -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DARROW_WITH_LZ4=ON \
      -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON -DARROW_PARQUET=ON \
      -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_PYTHON=ON -DARROW_BUILD_TESTS=ON \
      -DARROW_CUDA=ON -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PLASMA=ON \
      -DARROW_S3=ON -DARROW_TENSORFLOW=ON -DARROW_CSV=ON -DARROW_JSON=ON \
      -DARROW_WITH_RE2=ON -DARROW_IPC=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED ..

Setup and build for python:

    export PYARROW_WITH_DATASET=1
    export PYARROW_WITH_BZ2=1
    export PYARROW_WITH_ZLIB=1
    export PYARROW_WITH_ZSTD=1
    export PYARROW_WITH_LZ4=1
    export PYARROW_WITH_SNAPPY=1
    export PYARROW_WITH_BROTLI=1
    export PYARROW_WITH_PARQUET=1
    export PYARROW_WITH_PARQUET_ENCRYPTION=1
    export PYARROW_WITH_CUDA=1
    export PYARROW_WITH_FLIGHT=1
    export PYARROW_WITH_GANDIVA=1
    export PYARROW_WITH_PLASMA=1
    export PYARROW_WITH_S3=1
    export PYARROW_WITH_TENSORFLOW=1
    export PYARROW_WITH_CSV=1
    export PYARROW_WITH_JSON=1
    export PYARROW_WITH_RE2=1
    export PYARROW_WITH_IPC=1
    export PYARROW_PARALLEL=4
    export PYARROW_CXXFLAGS="-std=c++11"
    python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel
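The symptom in this message is typical of a libstdc++ dual-ABI mismatch: GCC appends the ABI tag `B5cxx11` to the mangled name of functions that return the C++11 `std::string` when `_GLIBCXX_USE_CXX11_ABI=1` is in effect, and omits it otherwise. As a rough aid for spotting such cases, here is a minimal sketch that pairs undefined symbols with defined ones differing only by that tag. The helper names are hypothetical, and the symbol lists are assumed to have been extracted beforehand (for example from `nm` output); only the two symbol names quoted in the message are used.

```python
# "B5cxx11" is how GCC mangles the abi_tag "cxx11" (B = ABI tag, 5 = name length).
CXX11_TAG = "B5cxx11"


def strip_cxx11_tag(mangled):
    """Return the mangled name with every cxx11 ABI tag removed."""
    return mangled.replace(CXX11_TAG, "")


def find_abi_mismatches(undefined, defined):
    """Pair each unresolved symbol with a defined symbol that differs only by the tag."""
    defined = set(defined)
    by_base = {strip_cxx11_tag(name): name for name in defined}
    pairs = []
    for wanted in undefined:
        candidate = by_base.get(strip_cxx11_tag(wanted))
        if wanted not in defined and candidate is not None:
            pairs.append((wanted, candidate))
    return pairs


# Symbol names copied from the message above.
undefined = ["_ZNK5arrow8DataType18ComputeFingerprintEv"]
defined = ["_ZNK5arrow8DataType18ComputeFingerprintB5cxx11Ev"]

for wanted, available in find_abi_mismatches(undefined, defined):
    print(f"wanted {wanted}\n   got {available}")
```

A non-empty result suggests that the two binaries were compiled with different `_GLIBCXX_USE_CXX11_ABI` settings, rather than with different `-std=` language levels.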
Re: Build Frustrations
On 10/10/2022 at 19:27, Joseph Porter wrote:
> I've tried building with explicit flags to encourage the libraries to
> include the cxx11 symbol (in python/CMakeLists.txt). That doesn't seem to
> impact this issue:
>
> set (CMAKE_CXX_STANDARD 11)
> set (CMAKE_CXX_STANDARD_REQUIRED ON)
> set (CMAKE_CXX_EXTENSIONS OFF)
>
> I also added
> export PYARROW_CXXFLAGS="-std=c++11"
> for the wheel build of pyarrow (no effect).

Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?
Re: Build Frustrations
Hi Antoine,

Here's what I did:

    export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"

Here's what I got:

    ImportError: /workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK5arrow8DataType18ComputeFingerprintEv

It looks like the symbol mismatch already exists in the libraries that were created by the C++ build step, which is why I tried to add the c++11 directives to the CMakeLists.txt in the python module.

-Joe

On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou wrote:
> On 10/10/2022 at 19:27, Joseph Porter wrote:
> > I've tried building with explicit flags to encourage the libraries to
> > include the cxx11 symbol (in python/CMakeLists.txt). That doesn't seem to
> > impact this issue:
> >
> > set (CMAKE_CXX_STANDARD 11)
> > set (CMAKE_CXX_STANDARD_REQUIRED ON)
> > set (CMAKE_CXX_EXTENSIONS OFF)
> >
> > I also added
> > export PYARROW_CXXFLAGS="-std=c++11"
> > for the wheel build of pyarrow (no effect).
>
> Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?
Re: Build Frustrations
Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++ libraries?

On 10/10/2022 at 20:20, Joseph Porter wrote:
> Hi Antoine,
>
> Here's what I did:
> export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"
>
> Here's what I got:
> ImportError:
> /workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so:
> undefined symbol: _ZNK5arrow8DataType18ComputeFingerprintEv
>
> It looks like the symbol mismatch exists already in the libraries that were
> created by the C++ build step, which is why I tried to add the c++11
> directives to the CMakeLists.txt in the python module.
>
> -Joe
Re: Parser for expressions
Yes, that makes a lot of sense! I’d agree that it would probably be fine to have two different syntaxes, seeing as the use cases are a bit different.

Did anyone else have any thoughts? Either on the Lisp-style syntax for Arrow’s Expressions or on having two different syntaxes? (Weston or Antoine?)

Sasha

> On Oct 9, 2022, at 5:38 AM, Jin Shang wrote:
>
> Hi Sasha,
>
> I agree with your points. However, Gandiva is kind of specialized in computing arithmetic expressions and it offers few to no non-arithmetic operations. So it is very helpful if its parser understands natural math expressions.
>
> Considering that Gandiva is a relatively independent component within the arrow project, and that it’s only a math expression compiler rather than a fully functional compute engine, maybe it’s acceptable for Gandiva to have its own grammar different from compute/Acero/Substrait etc.
>
> Best,
> Jin
>
>> On Oct 8, 2022, at 03:01, Sasha Krassovsky wrote:
>>
>> Hi Jin,
>> I agree it would be good to standardize on a syntax. To me, the advantages of the lisp-style syntax are:
>> - don’t have to define/implement any kind of precedence rules
>> - has a uniform syntax (no distinction between prefix and infix operators)
>> - avoids having “special” functions that have an associated arithmetic symbol
>> - translates directly to the underlying Expression infrastructure.
>>
>> The advantage of the Python-style syntax is that it’s more natural to use for arithmetic expressions. However, I think for non-arithmetic expressions this syntax would be more cumbersome.
>>
>> Either would work of course, I guess it just depends on the goal. I was thinking the string representation wouldn’t represent any significant level of abstraction, it is just a convenience to save on clutter when typing out expressions.
>>
>> Sasha
>>
>>> On Oct 6, 2022, at 22:20, Jin Shang wrote:
>>>
>>> Hi Sasha and Weston,
>>>
>>> I'm the author of the mentioned Gandiva parser. I agree that having one unified syntax is ideal. I think one critical divergence between Sasha's and my proposals is that mine is with C++/Python imperative style (foo(x, y, z), a+b…) and Sasha's is with Lisp functional style ((foo x y z), (+ a b)…). I feel like it'll be better for us to settle on one of the styles before we start implementing the parsers.
>>>
>>> Best,
>>> Jin
>>>
>>> On Friday, October 7, 2022, Sasha Krassovsky wrote:
>>>> Hi Weston,
>>>> I’d be happy to donate something like this to Substrait if that’s useful, I was thinking of proving out a design here before going there. However we could also just go straight there :)
>>>>
>>>> Regarding infix operators and such, the edge case I was thinking of is that a user could potentially add a kernel to the registry called e.g. “+”. Would the parser implicitly convert any instances of “+” to “add” and break that?
>>>>
>>>> Implicit typing for literals and parameters can probably also be added without issues to the current scheme. Would the parameters be passed as an std::unordered_map?
>>>>
>>>> > Does a field_ref have to be a field name or can it be a field index?
>>>>
>>>> It can be a field index or even a field path. The field ref is parsed using FieldRef::FromDotPath ([1] in my original message), which can express any FieldRef.
>>>>
>>>> Sasha
>>>>
>>>>> On Oct 6, 2022, at 16:08, Weston Pace wrote:
>>>>>
>>>>> Currently Substrait only has a binary (protobuf) serialization (and a protobuf JSON one but that's not really human writable and barely human readable). Substrait does not have a text serialization. I believe there is some desire for one (maybe Sasha wants to give it a try?). A text format for Substrait would solve this problem because you could go "text expression" -> "substrait expression" -> "arrow expression".
>>>>>
>>>>> Since no text format exists for Substrait I think that Substrait does not currently solve this problem or overlap with your work. However, at some point (hopefully), it will.
>>>>>
>>>>> There was also a fairly recent proposal for a parser for gandiva expressions[1].
>>>>>
>>>>> Compared with [1] I think this proposal is simpler to parse but lacks some of the shortcut conveniences (e.g. implicit types for literals, support for common infix operators (+, -, /, ...)).
>>>>>
>>>>> Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I think would be useful to have, as one could then do something like `auto arrow_expr = Parse(my_expr, threshold)`.
>>>>>
>>>>> Does a field_ref have to be a field name or can it be a field index? The latter is quite useful when the schema has duplicate field names.
>>>>>
>>>>> I'm +0.5 on this change. I worry a bit about having (eventually) three different syntaxes. However, at the moment we h
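To make the trade-off in this thread more concrete, here is a minimal sketch of why a Lisp-style syntax such as `(foo x y z)` or `(+ a b)` needs no precedence rules: every call is fully parenthesized, so a short recursive reader suffices. This is an illustration only, not the parser proposed for Arrow or Gandiva; the function names, the tuple output format, and the choice to treat every non-numeric token as a bare name are all assumptions, and string literals containing spaces are not handled.

```python
def tokenize(text):
    """Split the input into parentheses and whitespace-separated atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def _atom(token):
    """Interpret a token as int, then float, else keep it as a name."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token


def _read(tokens, pos):
    """Read one expression starting at tokens[pos]; return (expr, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[pos]
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return _atom(token), pos + 1
    items = []
    pos += 1
    while pos < len(tokens) and tokens[pos] != ")":
        item, pos = _read(tokens, pos)
        items.append(item)
    if pos >= len(tokens):
        raise ValueError("missing ')'")
    return tuple(items), pos + 1


def parse(text):
    """Parse a single expression into nested tuples of (function, args...)."""
    tokens = tokenize(text)
    expr, pos = _read(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return expr


print(parse("(add a (multiply b 2))"))
```

An infix grammar such as `a + b * 2` would additionally need operator precedence and associativity handling, which is the extra parser complexity the Lisp-style proposal avoids and the convenience the Python-style proposal offers.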
Re: Parser for expressions
I don't see the point of having two different syntaxes. Also, IMHO lisp-style is harder for many people, so I would rather have a more "traditional" syntax (though Lisp is historically traditional, of course ;-)).

On 10/10/2022 at 21:10, Sasha Krassovsky wrote:
> Yes that makes a lot of sense! I’d agree that it would probably be fine to have two different syntaxes, seeing as the use-cases are a bit different.
>
> Did anyone else have any thoughts? Either on the lisp-style syntax for Arrow’s Expressions or on having two different syntaxes? (Weston or Antoine?)
>
> Sasha
>
>> On Oct 9, 2022, at 5:38 AM, Jin Shang wrote:
>>
>> Hi Sasha,
>>
>> I agree with your points. However, Gandiva is kind of specialized in computing arithmetic expressions and it offers few to no non-arithmetic operations. So it is very helpful if its parser understands natural math expressions.
>>
>> Considering that Gandiva is a relatively independent component within the arrow project, and that it’s only a math expression compiler rather than a fully functional compute engine, maybe it’s acceptable for Gandiva to have its own grammar different from compute/Acero/Substrait etc.
>>
>> Best,
>> Jin
>>
>>> On Oct 8, 2022, at 03:01, Sasha Krassovsky wrote:
>>>
>>> Hi Jin,
>>> I agree it would be good to standardize on a syntax. To me, the advantages of the lisp-style syntax are:
>>> - don’t have to define/implement any kind of precedence rules
>>> - has a uniform syntax (no distinction between prefix and infix operators)
>>> - avoids having “special” functions that have an associated arithmetic symbol
>>> - translates directly to the underlying Expression infrastructure.
>>>
>>> The advantage of the Python-style syntax is that it’s more natural to use for arithmetic expressions. However, I think for non-arithmetic expressions this syntax would be more cumbersome.
>>>
>>> Either would work of course, I guess it just depends on the goal. I was thinking the string representation wouldn’t represent any significant level of abstraction, it is just a convenience to save on clutter when typing out expressions.
>>>
>>> Sasha
>>>
>>>> On Oct 6, 2022, at 22:20, Jin Shang wrote:
>>>>
>>>> Hi Sasha and Weston,
>>>>
>>>> I'm the author of the mentioned Gandiva parser. I agree that having one unified syntax is ideal. I think one critical divergence between Sasha's and my proposals is that mine is with C++/Python imperative style (foo(x, y, z), a+b…) and Sasha's is with Lisp functional style ((foo x y z), (+ a b)…). I feel like it'll be better for us to settle on one of the styles before we start implementing the parsers.
>>>>
>>>> Best,
>>>> Jin
>>>>
>>>> On Friday, October 7, 2022, Sasha Krassovsky wrote:
>>>>> Hi Weston,
>>>>> I’d be happy to donate something like this to Substrait if that’s useful, I was thinking of proving out a design here before going there. However we could also just go straight there :)
>>>>>
>>>>> Regarding infix operators and such, the edge case I was thinking of is that a user could potentially add a kernel to the registry called e.g. “+”. Would the parser implicitly convert any instances of “+” to “add” and break that?
>>>>>
>>>>> Implicit typing for literals and parameters can probably also be added without issues to the current scheme. Would the parameters be passed as an std::unordered_map?
>>>>>
>>>>> > Does a field_ref have to be a field name or can it be a field index?
>>>>>
>>>>> It can be a field index or even a field path. The field ref is parsed using FieldRef::FromDotPath ([1] in my original message), which can express any FieldRef.
>>>>>
>>>>> Sasha
>>>>>
>>>>>> On Oct 6, 2022, at 16:08, Weston Pace wrote:
>>>>>>
>>>>>> Currently Substrait only has a binary (protobuf) serialization (and a protobuf JSON one but that's not really human writable and barely human readable). Substrait does not have a text serialization. I believe there is some desire for one (maybe Sasha wants to give it a try?). A text format for Substrait would solve this problem because you could go "text expression" -> "substrait expression" -> "arrow expression".
>>>>>>
>>>>>> Since no text format exists for Substrait I think that Substrait does not currently solve this problem or overlap with your work. However, at some point (hopefully), it will.
>>>>>>
>>>>>> There was also a fairly recent proposal for a parser for gandiva expressions[1].
>>>>>>
>>>>>> Compared with [1] I think this proposal is simpler to parse but lacks some of the shortcut conveniences (e.g. implicit types for literals, support for common infix operators (+, -, /, ...)).
>>>>>>
>>>>>> Both are lacking parameters (e.g. "(equals(!x, %threshold%))"), which I think would be useful to have, as one could then do something like `auto arrow_expr = Parse(my_expr, threshold)`.
>>>>>>
>>>>>> Does a field_ref have to be a field name or can it be a field index? The latter is quite useful when the schema has duplicate field names.
>>>>>>
>>>>>> I'm +0.5 on this change. I worry a bit about having (eventually) three different syntaxes. However, at the moment we have zero.
>>>>>>
>>>>>> [1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn
>>>>>>
>>>>>> On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky wrote:
>>>>>>> Hi David,
>>>>>>> Could you elaborate on which part of my proposa
Re: Build Frustrations
No go. I still get the B5cxx11 extension on the symbols in the compiled libarrow library.

Tried:

    /workspace/arrow/pyarrow-dev/bin/cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_BUILD_TYPE=Release -DARROW_DATASET=ON \
      -DARROW_WITH_BZ2=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON \
      -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON \
      -DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_PYTHON=ON \
      -DARROW_BUILD_TESTS=ON -DARROW_CUDA=ON -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON \
      -DARROW_PLASMA=ON -DARROW_S3=ON -DARROW_TENSORFLOW=ON -DARROW_CSV=ON \
      -DARROW_JSON=ON -DARROW_WITH_RE2=ON -DARROW_IPC=ON \
      -DARROW_DEPENDENCY_SOURCE=BUNDLED -D_GLIBCXX_USE_CXX11_ABI=0 ..

For my next attempt, I modified CMakeLists.txt (in arrow/cpp) to explicitly remove any mention of -std=c++11 in the CXXFLAGS. I'll let you know if that works. Any pointers on better ways to troubleshoot this would also be helpful. I'm not super-conversant with CMake.

    # Remove --std=c++11 to avoid errors from C compilers
    string(REPLACE "-std=c++11" "" CMAKE_C_FLAGS ${CMAKE_C_FLAGS})

    # Add C++-only flags, like -std=c++11
    set(CMAKE_CXX_FLAGS "${CXX_ONLY_FLAGS} ${CMAKE_CXX_FLAGS}")
    string(REPLACE "-std=c++11" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})

On Mon, Oct 10, 2022 at 1:29 PM Antoine Pitrou wrote:
> Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++
> libraries?
Re: Build Frustrations
On 10/10/2022 at 21:31, Joseph Porter wrote:
> No go. I still get the B5cxx11 extension on the symbols in the compiled
> libarrow library.

"-D_GLIBCXX_USE_CXX11_ABI=0" is a C++ compiler flag, not a CMake flag.

One possibility is to pass "-DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0" to CMake instead.
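The distinction drawn in this reply can be sketched as follows. On the CMake command line, `-D<name>=<value>` creates a CMake cache entry, so a bare `-D_GLIBCXX_USE_CXX11_ABI=0` sets a cache variable rather than a preprocessor define; to reach the compiler, the define has to travel inside `CMAKE_CXX_FLAGS`. The helper below is hypothetical and only assembles an argument list for illustration; it does not run CMake, and the option set shown is a shortened assumption, not the full command from the thread.

```python
def cmake_configure_args(source_dir, options, cxx_flags=()):
    """Build a cmake argument list, routing compiler flags through CMAKE_CXX_FLAGS."""
    args = ["cmake"]
    for name, value in options.items():
        # Each of these becomes a CMake cache entry (a CMake-level setting).
        args.append(f"-D{name}={value}")
    if cxx_flags:
        # Compiler-level flags, including preprocessor defines, go in here.
        args.append("-DCMAKE_CXX_FLAGS=" + " ".join(cxx_flags))
    args.append(source_dir)
    return args


args = cmake_configure_args(
    "..",
    {"CMAKE_BUILD_TYPE": "Release", "ARROW_PYTHON": "ON"},
    cxx_flags=["-D_GLIBCXX_USE_CXX11_ABI=0"],
)
print(" ".join(args))
```

The resulting list could be handed to `subprocess.run` from the build directory; whether the old ABI is the right choice for a given build still depends on what the other linked libraries were compiled with.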
Re: Build Frustrations
That makes sense. I'll give it a try - thanks!

On Mon, Oct 10, 2022 at 2:34 PM Antoine Pitrou wrote:
> "-D_GLIBCXX_USE_CXX11_ABI=0" is a C++ compiler flag, not a CMake flag.
>
> One possibility is to pass instead
> "-DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0" to CMake.
Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.5.1 RC1
With 9 +1 (4 binding) the release is approved.

The release is available here:
https://dist.apache.org/repos/dist/release/arrow/arrow-object-store-rs-0.5.1

I have also released it to crates.io:
https://crates.io/crates/object_store/0.5.1

Thank you all for voting and reviewing the release

Andrew

On Sat, Oct 8, 2022 at 12:06 PM Ashish wrote:
> +1 (non-binding)
>
> verified on M1 Mac
>
> On Fri, Oct 7, 2022 at 8:32 AM Andrew Lamb wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Object
> > Store Implementation, version 0.5.1.
> >
> > This release candidate is based on commit:
> > 8a54e95850fe27ac5865a02ef4be2de0937de5b3 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust Object Store
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust Object Store because...
> >
> > [1]: https://github.com/apache/arrow-rs/tree/8a54e95850fe27ac5865a02ef4be2de0937de5b3
> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.5.1-rc1
> > [3]: https://github.com/apache/arrow-rs/blob/8a54e95850fe27ac5865a02ef4be2de0937de5b3/object_store/CHANGELOG.md
> > [4]: https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
>
> --
> thanks
> ashish
Re: [RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 13.0.0 RC1
Hi Andy,

Could you add "adding a release to https://reporter.apache.org/addrelease.html?arrow" to the release process of DataFusion and Ballista? The release information is used to generate a template for a board report.

FYI:

* Board Report Wizard: https://reporter.apache.org/wizard/?arrow
* A board report draft based on the template generated by the Board Report Wizard: https://github.com/apache/arrow/pull/14357

Thanks,
--
kou

In "[RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 13.0.0 RC1" on Mon, 10 Oct 2022 06:59:24 -0600, Andy Grove wrote:
> With 9 +1 votes (4 binding), the vote passes. I have published artifacts to
> crates.io
>
> Thank you for helping with verifying the release.
Re: Build Frustrations
I think I found a culprit in the pyarrow cmake config:

    if(PYARROW_USE_TENSORFLOW)
      # TensorFlow uses the old GLIBCXX ABI, so we have to use it too
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
    endif()

This was overriding my attempts to add c++11 to the pyarrow build (which
wasn't a good idea, apparently).

Still working on the build per your suggestion. The gandiva test executable
build chokes if c++11 isn't available, so I turned off the test build.

On Mon, Oct 10, 2022 at 3:47 PM Joseph Porter wrote:

> That makes sense. I'll give it a try - thanks!
>
> On Mon, Oct 10, 2022 at 2:34 PM Antoine Pitrou wrote:
>
>> On 10/10/2022 at 21:31, Joseph Porter wrote:
>> > No go. I still get the B5cxx11 extension on the symbols in the compiled
>> > libarrow library.
>>
>> "-D_GLIBCXX_USE_CXX11_ABI=0" is a C++ compiler flag, not a CMake flag.
>>
>> One possibility is to pass instead
>> "-DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0" to CMake.
>>
>> >
>> > Tried:
>> > /workspace/arrow/pyarrow-dev/bin/cmake
>> > -DCMAKE_INSTALL_PREFIX=$ARROW_HOME
>> > -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_BUILD_TYPE=Release -DARROW_DATASET=ON
>> > -DARROW_WITH_BZ2=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON
>> > -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON
>> > -DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_PYTHON=ON
>> > -DARROW_BUILD_TESTS=ON -DARROW_CUDA=ON -DARROW_FLIGHT=ON
>> > -DARROW_GANDIVA=ON
>> > -DARROW_PLASMA=ON -DARROW_S3=ON -DARROW_TENSORFLOW=ON -DARROW_CSV=ON
>> > -DARROW_JSON=ON -DARROW_WITH_RE2=ON -DARROW_IPC=ON
>> > -DARROW_DEPENDENCY_SOURCE=BUNDLED -D_GLIBCXX_USE_CXX11_ABI=0 ..
>> >
>> > For my next attempt, I modified CMakeLists.txt (in arrow/cpp) to
>> > explicitly remove any mention of -std=c++11 in the CXXFLAGS. I'll let
>> > you know if that works. Any pointers on better ways to troubleshoot
>> > this would also be helpful. I'm not super-conversant with CMake.
>> >
>> > # Remove --std=c++11 to avoid errors from C compilers
>> > string(REPLACE "-std=c++11" "" CMAKE_C_FLAGS ${CMAKE_C_FLAGS})
>> >
>> > # Add C++-only flags, like -std=c++11
>> > set(CMAKE_CXX_FLAGS "${CXX_ONLY_FLAGS} ${CMAKE_CXX_FLAGS}")
>> > string(REPLACE "-std=c++11" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
>> >
>> > On Mon, Oct 10, 2022 at 1:29 PM Antoine Pitrou wrote:
>> >
>> >> Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++
>> >> libraries?
>> >>
>> >> On 10/10/2022 at 20:20, Joseph Porter wrote:
>> >>> Hi Antoine,
>> >>>
>> >>> Here's what I did:
>> >>> export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"
>> >>>
>> >>> Here's what I got:
>> >>> ImportError:
>> >>> /workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/
>> >>> lib.cpython-38-x86_64-linux-gnu.so: undefined symbol:
>> >>> _ZNK5arrow8DataType18ComputeFingerprintEv
>> >>>
>> >>> It looks like the symbol mismatch exists already in the libraries
>> >>> that were created by the C++ build step, which is why I tried to add
>> >>> the c++11 directives to the CMakeLists.txt in the python module.
>> >>>
>> >>> -Joe
>> >>>
>> >>> On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou wrote:
>> >>>
>> >>>> On 10/10/2022 at 19:27, Joseph Porter wrote:
>> >>>> >
>> >>>> > I've tried building with explicit flags to encourage the libraries
>> >>>> > to include the cxx11 symbol (in python/CMakeLists.txt). That
>> >>>> > doesn't seem to impact this issue:
>> >>>> >
>> >>>> > set (CMAKE_CXX_STANDARD 11)
>> >>>> > set (CMAKE_CXX_STANDARD_REQUIRED ON)
>> >>>> > set (CMAKE_CXX_EXTENSIONS OFF)
>> >>>> >
>> >>>> > I also added
>> >>>> > export PYARROW_CXXFLAGS="-std=c++11"
>> >>>> > for the wheel build of pyarrow (no effect).
>> >>>>
>> >>>> Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?
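
The ABI mismatch discussed in this thread can be checked from symbol names alone. What follows is a minimal Python sketch, not part of the original messages. It assumes GCC's libstdc++ dual ABI, where names that depend on the new std::string carry the [abi:cxx11] tag, which appears in mangled names as "B5cxx11". The helper names has_cxx11_abi_tag and tagged_symbols are hypothetical, the tagged symbol in the usage lines is an illustrative guess rather than output from a real libarrow build, and tagged_symbols assumes GNU binutils nm is on PATH.

```python
import subprocess
from typing import List

# Mangled form of GCC's [abi:cxx11] tag (libstdc++ dual ABI).
CXX11_ABI_TAG = "B5cxx11"


def has_cxx11_abi_tag(mangled_symbol: str) -> bool:
    """Return True if the mangled name carries the [abi:cxx11] tag."""
    return CXX11_ABI_TAG in mangled_symbol


def tagged_symbols(library_path: str) -> List[str]:
    """List defined dynamic symbols in a shared library that carry the tag.

    Assumption: GNU binutils `nm` is installed and on PATH.
    """
    result = subprocess.run(
        ["nm", "-D", "--defined-only", library_path],
        check=True,
        capture_output=True,
        text=True,
    )
    # The symbol name is the last whitespace-separated field of each line.
    names = [line.split()[-1] for line in result.stdout.splitlines() if line.strip()]
    return [name for name in names if has_cxx11_abi_tag(name)]


# Symbol reported as undefined in the ImportError quoted in this thread.
missing = "_ZNK5arrow8DataType18ComputeFingerprintEv"
# Illustrative guess at the tagged spelling of the same function (assumption).
tagged_guess = "_ZNK5arrow8DataType18ComputeFingerprintB5cxx11Ev"

print(has_cxx11_abi_tag(missing))       # False
print(has_cxx11_abi_tag(tagged_guess))  # True
```

The undefined symbol in the ImportError carries no tag, which is consistent with the pyarrow extension having been compiled with -D_GLIBCXX_USE_CXX11_ABI=0 (the TensorFlow branch found in the cmake config) while libarrow exports the tagged names reported earlier in the thread. Calling tagged_symbols on the built libarrow shared library would be one way to confirm which ABI it was compiled against.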