[RESULT][VOTE] Adopt ADBC database client connectivity specification

2022-10-10 Thread David Li
My vote: +1

The vote passes with 4 binding +1 votes, 8 non-binding +1 votes, and no -1 
votes. Thanks all!

I will update the RFC PR and merge it next, and will continue setting up the CI 
and release process.

-David

On Wed, Oct 5, 2022, at 15:38, David Li wrote:
> Kirill (CC'd) mentioned the organization could be improved [1] - I've 
> put up a PR to move the definitions around to make it easier for 
> implementors [2].
>
> I'll leave this thread open for a little while longer for any 
> interested parties/see if anyone has PR comments before I'll merge the 
> PR/vote/close/etc. Thanks all!
>
> [1]: https://github.com/apache/arrow-adbc/issues/145
> [2]: https://github.com/apache/arrow-adbc/pull/148
>
> -David
>
> On Wed, Oct 5, 2022, at 15:27, Neal Richardson wrote:
>> +1
>>
>> (I think this makes 4 binding +1s, if I count correctly)
>>
>> On Wed, Oct 5, 2022 at 11:30 AM Antoine Pitrou  wrote:
>>
>>>
>>> +1 (binding), with the caveat that I looked mostly at the C API.
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>> On 21/09/2022 at 17:40, David Li wrote:
>>> > Hello,
>>> >
>>> > We have been discussing [1] standard interfaces for Arrow-based database
>>> access and have been working on implementations of the proposed interfaces
>>> [2], all under the name "ADBC". This proposal aims to provide a unified
>>> client abstraction across Arrow-native database protocols (like Flight SQL)
>>> and non-Arrow database protocols, which can then be used by Arrow projects
>>> like Dataset/Acero and ecosystem projects like Ibis.
>>> >
>>> > For details, see the RFC here:
>>> https://github.com/apache/arrow/pull/14079
>>> >
>>> > I would like to propose that the Arrow project adopt this RFC, along
>>> with apache/arrow-adbc commit 7866a56 [3], as version 1.0.0 of the ADBC API
>>> standard.
>>> >
>>> > Please vote to adopt the specification as described above. (This is not
>>> a vote to release any components.)
>>> >
>>> > This vote will be open for at least 72 hours.
>>> >
>>> > [ ] +1 Adopt the ADBC specification
>>> > [ ]  0
>>> > [ ] -1 Do not adopt the specification because...
>>> >
>>> > Thanks to the DuckDB and R DBI projects for providing feedback on and
>>> implementations of the proposal.
>>> >
>>> > [1]: https://lists.apache.org/thread/cq7t9s5p7dw4vschylhwsfgqwkr5fmf2
>>> > [2]: https://github.com/apache/arrow-adbc
>>> > [3]:
>>> https://github.com/apache/arrow-adbc/commit/7866a566f5b7b635267bfb7a87ea49b01dfe89fa
>>> >
>>> > Thank you,
>>> > David
>>>


Re: [DISCUSS] Python Wheel Size

2022-10-10 Thread Wes McKinney
We've discussed this in the past, I think. In addition to having many
optional components enabled, the pyarrow wheel also includes the unit
tests directory, which keeps growing in size. I think if we made a
pyarrow-slim wheel with support only for core Arrow (IPC, etc.) and
Parquet file reading, it might be possible to trim the size by a
significant percentage.
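
To see where the bytes actually go before deciding what to trim, one could
total the uncompressed size of a wheel's members per directory. The following
is a minimal sketch using only the Python standard library; the archive layout
in build_demo_wheel is made up for illustration and is not the real pyarrow
wheel contents.

```python
import io
import zipfile
from collections import Counter


def size_by_prefix(wheel_file, depth=2):
    """Sum uncompressed member sizes, grouped by the first `depth` path parts."""
    totals = Counter()
    with zipfile.ZipFile(wheel_file) as wheel:
        for info in wheel.infolist():
            if info.is_dir():
                continue
            key = "/".join(info.filename.split("/")[:depth])
            totals[key] += info.file_size
    return totals


def build_demo_wheel():
    """Create a tiny in-memory archive (hypothetical layout, illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as wheel:
        wheel.writestr("pyarrow/lib.so", b"x" * 1000)
        wheel.writestr("pyarrow/tests/test_array.py", b"y" * 300)
        wheel.writestr("pyarrow/tests/data/sample.bin", b"z" * 200)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Largest groups first; pass a path to a real .whl instead of the demo.
    for name, size in size_by_prefix(build_demo_wheel()).most_common():
        print(f"{size:>8} {name}")
```

Passing the path of an actual wheel file instead of the demo archive would
show how much a directory such as the unit tests contributes.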

Rusty -- if you would like to push this forward, I would suggest
creating an alternative wheel build script to the one that we use,
modifying flags / adding other customizations (e.g. trimming unit tests)
to produce a wheel that we could build and possibly upload as
"pyarrow-slim" on PyPI.

On Mon, Oct 3, 2022 at 8:55 AM Antoine Pitrou  wrote:
>
>
> Hi Rusty,
>
> On 02/10/2022 at 22:51, Rusty Conover wrote:
> > Hi Arrow Team,
> >
> > I'm using Apache Arrow with AWS Lambda Functions.
> >
> > The primary motivation is AWS Athena's user-defined functions[1].  Those
> > functions process and return Arrow IPC segments.
> >
> > * The published Python wheels for Apache Arrow include almost every feature
> > of Arrow. (Gandiva, Plasma, Flight)
>
> Gandiva isn't compiled in the Python wheels. Plasma is reasonably small
> (but is also being deprecated soon). Flight is more sizable. However,
> most of the size seems to be in Arrow itself and Parquet. A large part
> of the size is probably attributable to the Arrow compute engine and
> functions, and also perhaps to filesystem implementations such as S3 and
> GCS (due to the large third-party dependencies that they bundle).
>
> > Would it be possible to create a new Python package (i.e., "pyarrow-slim")
> > that would disable some of the functionality but result in smaller python
> > wheels?
>
> Perhaps. The first step would be to allow disabling more components in
> PyArrow, though. Otherwise I'm afraid the size reduction wouldn't be
> terrific.
>
> Regards
>
> Antoine.


[RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 13.0.0 RC1

2022-10-10 Thread Andy Grove
With 9 +1 votes (4 binding), the vote passes. I have published artifacts to
crates.io

Thank you for helping with verifying the release.


On Sat, Oct 8, 2022 at 8:51 PM Kun Liu  wrote:

> +1 (non-binding)
>
> Validated on Intel Mac
>
> On Sun, Oct 9, 2022 at 00:22, Ashish wrote:
>
> > +1 (non-binding)
> >
> > validated on M1 Mac
> >
> > On Fri, Oct 7, 2022 at 4:26 AM Andy Grove  wrote:
> >
> > > Hi,
> > >
> > > I would like to propose a release of Apache Arrow DataFusion
> > > Implementation,
> > > version 13.0.0.
> > >
> > > This release candidate is based on commit:
> > > 807a0c1d2963f6ca327d316badb4ed0fa77e9f21 [1]
> > > The proposed release tarball and signatures are hosted at [2].
> > > The changelog is located at [3].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> and
> > > vote
> > > on the release. The vote will be open for at least 72 hours.
> > >
> > > Only votes from PMC members are binding, but all members of the
> community
> > > are
> > > encouraged to test the release and vote with "(non-binding)".
> > >
> > > The standard verification procedure is documented at
> > >
> > >
> >
> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
> > > .
> > >
> > > [ ] +1 Release this as Apache Arrow DataFusion 13.0.0
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow DataFusion 13.0.0 because...
> > >
> > > Here is my vote:
> > >
> > > +1
> > >
> > > [1]:
> > >
> > >
> >
> https://github.com/apache/arrow-datafusion/tree/807a0c1d2963f6ca327d316badb4ed0fa77e9f21
> > > [2]:
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-13.0.0-rc1
> > > [3]:
> > >
> > >
> >
> https://github.com/apache/arrow-datafusion/blob/807a0c1d2963f6ca327d316badb4ed0fa77e9f21/CHANGELOG.md
> > >
> >
> >
> > --
> > thanks
> > ashish
> >
>


Build Frustrations

2022-10-10 Thread Joseph Porter
If this isn't the right place to ask, please let me know where to direct
this question:

I'm trying to build a slightly modified version of the 9.0.0 tag for arrow
(particularly Pyarrow).  Walking through the build steps (below), I have
run into a problem where the arrow c++ library has symbols that include the
tag B5cxx11, but the cython libraries aren't generating that tag.  The
symbol resolution fails at runtime once the library loads.

Particularly these two symbols in
python/build/lib.linux-x86_64-3.8/pyarrow/lib.cpython-38-x86_64-linux-gnu.so

U _ZNK5arrow8DataType18ComputeFingerprintEv
U _ZNK5arrow8DataType26ComputeMetadataFingerprintEv

Don't resolve.  All of the corresponding symbols in libarrow.so.900 look
like this (note the addition of B5cxx11):

006addc0 T _ZNK5arrow8DataType18ComputeFingerprintB5cxx11Ev
006ae050 T _ZNK5arrow8DataType26ComputeMetadataFingerprintB5cxx11Ev
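
One way to confirm that the only difference is the ABI tag is to compare the
undefined symbols of the extension module with the symbols defined by libarrow
after stripping the tag. The sketch below is a rough heuristic in Python,
assuming the symbol names have already been extracted (for example from nm
output); the function names are hypothetical. In Itanium C++ name mangling an
ABI tag is encoded as B followed by a length-prefixed name, hence B5cxx11.

```python
import re

# Assumption: the cxx11 ABI tag shows up in a mangled name as "B5cxx11"
# ("B" + length-prefixed identifier). Plain substring removal is a heuristic.
CXX11_TAG = re.compile(r"B5cxx11")


def strip_cxx11_tag(symbol):
    """Remove every cxx11 ABI tag from a mangled symbol name."""
    return CXX11_TAG.sub("", symbol)


def find_abi_mismatches(undefined, defined):
    """Map each unresolved symbol to a defined one that differs only by the tag."""
    defined = list(defined)
    exact = set(defined)
    tagged = {strip_cxx11_tag(s): s for s in defined if CXX11_TAG.search(s)}
    return {u: tagged[u] for u in undefined if u not in exact and u in tagged}


if __name__ == "__main__":
    undefined = [
        "_ZNK5arrow8DataType18ComputeFingerprintEv",
        "_ZNK5arrow8DataType26ComputeMetadataFingerprintEv",
    ]
    defined = [
        "_ZNK5arrow8DataType18ComputeFingerprintB5cxx11Ev",
        "_ZNK5arrow8DataType26ComputeMetadataFingerprintB5cxx11Ev",
    ]
    for wanted, found in find_abi_mismatches(undefined, defined).items():
        print(f"{wanted}\n  only exists as {found}")
```

If every unresolved symbol maps to a tagged definition, the two builds most
likely disagree on the libstdc++ ABI setting rather than on the C++ standard.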

I've tried building with explicit flags to encourage the libraries to
include the cxx11 symbol (in python/CMakeLists.txt).  That doesn't seem to
impact this issue:

set (CMAKE_CXX_STANDARD 11)
set (CMAKE_CXX_STANDARD_REQUIRED ON)
set (CMAKE_CXX_EXTENSIONS OFF)

I also added
export PYARROW_CXXFLAGS="-std=c++11"
for the wheel build of pyarrow (no effect).

I would appreciate any guidance on how to build a clean wheel file of
pyarrow that includes my small code changes.

Best,
-Joe

--
Here's what I'm using:
gcc version 9.4.0
g++ version 9.4.0
cmake version 3.16.3

CMake command for cpp/build:
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME -DCMAKE_INSTALL_LIBDIR=lib \
  -DCMAKE_BUILD_TYPE=Release -DARROW_DATASET=ON -DARROW_WITH_BZ2=ON \
  -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DARROW_WITH_LZ4=ON \
  -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON -DARROW_PARQUET=ON \
  -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_PYTHON=ON -DARROW_BUILD_TESTS=ON \
  -DARROW_CUDA=ON -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON -DARROW_PLASMA=ON \
  -DARROW_S3=ON -DARROW_TENSORFLOW=ON -DARROW_CSV=ON -DARROW_JSON=ON \
  -DARROW_WITH_RE2=ON -DARROW_IPC=ON -DARROW_DEPENDENCY_SOURCE=BUNDLED ..

Setup and build for python:
export PYARROW_WITH_DATASET=1
export PYARROW_WITH_BZ2=1
export PYARROW_WITH_ZLIB=1
export PYARROW_WITH_ZSTD=1
export PYARROW_WITH_LZ4=1
export PYARROW_WITH_SNAPPY=1
export PYARROW_WITH_BROTLI=1
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_PARQUET_ENCRYPTION=1
export PYARROW_WITH_CUDA=1
export PYARROW_WITH_FLIGHT=1
export PYARROW_WITH_GANDIVA=1
export PYARROW_WITH_PLASMA=1
export PYARROW_WITH_S3=1
export PYARROW_WITH_TENSORFLOW=1
export PYARROW_WITH_CSV=1
export PYARROW_WITH_JSON=1
export PYARROW_WITH_RE2=1
export PYARROW_WITH_IPC=1
export PYARROW_PARALLEL=4
export PYARROW_CXXFLAGS="-std=c++11"
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel


Re: Build Frustrations

2022-10-10 Thread Antoine Pitrou



On 10/10/2022 at 19:27, Joseph Porter wrote:


I've tried building with explicit flags to encourage the libraries to
include the cxx11 symbol (in python/CMakeLists.txt).  That doesn't seem to
impact this issue:

set (CMAKE_CXX_STANDARD 11)
set (CMAKE_CXX_STANDARD_REQUIRED ON)
set (CMAKE_CXX_EXTENSIONS OFF)

I also added
export PYARROW_CXXFLAGS="-std=c++11"
for the wheel build of pyarrow (no effect).


Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?


Re: Build Frustrations

2022-10-10 Thread Joseph Porter
Hi Antoine,

Here's what I did:
export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"

Here's what I got:
ImportError:
/workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/
lib.cpython-38-x86_64-linux-gnu.so: undefined symbol:
_ZNK5arrow8DataType18ComputeFingerprintEv

It looks like the symbol mismatch exists already in the libraries that were
created by the C++ build step, which is why I tried to add the c++11
directives to the CMakeLists.txt in the python module.

-Joe

On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou  wrote:

>
> On 10/10/2022 at 19:27, Joseph Porter wrote:
> >
> > I've tried building with explicit flags to encourage the libraries to
> > include the cxx11 symbol (in python/CMakeLists.txt).  That doesn't seem
> to
> > impact this issue:
> >
> > set (CMAKE_CXX_STANDARD 11)
> > set (CMAKE_CXX_STANDARD_REQUIRED ON)
> > set (CMAKE_CXX_EXTENSIONS OFF)
> >
> > I also added
> > export PYARROW_CXXFLAGS="-std=c++11"
> > for the wheel build of pyarrow (no effect).
>
> Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?
>


Re: Build Frustrations

2022-10-10 Thread Antoine Pitrou



Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++ 
libraries?



On 10/10/2022 at 20:20, Joseph Porter wrote:

Hi Antoine,

Here's what I did:
export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"

Here's what I got:
ImportError:
/workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/
lib.cpython-38-x86_64-linux-gnu.so: undefined symbol:
_ZNK5arrow8DataType18ComputeFingerprintEv

It looks like the symbol mismatch exists already in the libraries that were
created by the C++ build step, which is why I tried to add the c++11
directives to the CMakeLists.txt in the python module.

-Joe

On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou  wrote:



On 10/10/2022 at 19:27, Joseph Porter wrote:


I've tried building with explicit flags to encourage the libraries to
include the cxx11 symbol (in python/CMakeLists.txt).  That doesn't seem to
impact this issue:

set (CMAKE_CXX_STANDARD 11)
set (CMAKE_CXX_STANDARD_REQUIRED ON)
set (CMAKE_CXX_EXTENSIONS OFF)

I also added
export PYARROW_CXXFLAGS="-std=c++11"
for the wheel build of pyarrow (no effect).


Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?


Re: Parser for expressions

2022-10-10 Thread Sasha Krassovsky
Yes that makes a lot of sense! I’d agree that it would probably be fine to have 
two different syntaxes, seeing as the use-cases are a bit different. 

Did anyone else have any thoughts? Either on the lisp-style syntax for Arrow’s 
Expressions or on having two different syntaxes? (Weston or Antoine?)

Sasha

> On Oct 9, 2022, at 5:38 AM, Jin Shang  wrote:
> 
> Hi Sasha,
> 
> I agree with your points. However, Gandiva is specialized in computing
> arithmetic expressions and offers few if any non-arithmetic
> operations, so it is very helpful if its parser understands natural math
> expressions.
>
> Considering that Gandiva is a relatively independent component within the
> arrow project, and that it’s only a math expression compiler rather than a
> fully functional compute engine, maybe it’s acceptable for Gandiva to have
> its own grammar different from compute/Acero/Substrait etc.
> 
> Best,
> Jin
> 
>> On Oct 8, 2022, at 03:01, Sasha Krassovsky wrote:
>> 
>> Hi Jin,
>> I agree it would be good to standardize on a syntax. To me, the advantages 
>> of the lisp-style syntax are:
>> - don’t have to define/implement any kind of precedence rules 
>> - has a uniform syntax (no distinction between prefix and infix operators)
>> - avoids having “special” functions that have an associated arithmetic 
>> symbol 
>> - translates directly to the underlying Expression infrastructure. 
>> 
>> The advantage of the Python-style syntax is that it’s more natural to use 
>> for arithmetic expressions. However, I think for non-arithmetic expressions 
>> this syntax would be more cumbersome. 
>> 
>> Either would work of course, I guess it just depends on the goal. I was 
>> thinking the string representation wouldn’t represent any significant level 
>> of abstraction, it is just a convenience to save on clutter when typing out 
>> expressions. 
>> 
>> Sasha 
>> 
>>> On Oct 6, 2022, at 22:20, Jin Shang wrote:
>>> 
>>> Hi Sasha and Weston,
>>> 
>>> I'm the author of the mentioned Gandiva parser. I agree that having one
>>> unified syntax is ideal. I think one critical divergence between Sasha's
>>> and my proposals is that mine is with C++/Python imperative style (foo(x,
>>> y, z), a+b…) and Sasha's is with Lisp functional style ((foo x y z), (+ a
>>> b)…). I feel like it'll be better for us to settle on one of the styles
>>> before we start implementing the parsers.
>>> 
>>> Best,
>>> Jin
>>> 
>>>> On Friday, October 7, 2022, Sasha Krassovsky wrote:
>>>>
>>>> Hi Weston,
>>>> I’d be happy to donate something like this to Substrait if that’s useful,
>>>> I was thinking of proving out a design here before going there. However we
>>>> could also just go straight there :)
>>>>
>>>> Regarding infix operators and such the edge case I was thinking of is that
>>>> a user could potentially add a kernel to the registry called e.g. “+”.
>>>> Would the parser implicitly convert any instances of “+” to “add” and break
>>>> that?
>>>>
>>>> Implicit typing for literals and parameters can probably also be added
>>>> without issues to the current scheme. Would the parameters be passed as an
>>>> std::unordered_map?
>>>>
>>>>> Does a field_ref have to be a field name or can it be a field index?
>>>>
>>>> It can be a field index or even a field path. The field ref is parsed
>>>> using FieldRef::FromDotPath ([1] in my original message), which can express
>>>> any FieldRef.
>>>>
>>>> Sasha
>>>>
>>>>>> On Oct 6, 2022, at 16:08, Weston Pace wrote:
>>>>>
>>>>> Currently Substrait only has a binary (protobuf) serialization (and a
>>>>> protobuf JSON one but that's not really human writable and barely
>>>>> human readable).  Substrait does not have a text serialization.  I
>>>>> believe there is some desire for one (maybe Sasha wants to give it a
>>>>> try?).  A text format for Substrait would solve this problem because
>>>>> you could go "text expression" -> "substrait expression" -> "arrow
>>>>> expression".
>>>>>
>>>>> Since no text format exists for Substrait I think that Substrait does
>>>>> not currently solve this problem or overlap with your work.  However,
>>>>> at some point (hopefully), it will.
>>>>>
>>>>> There was also a fairly recent proposal for a parser for gandiva
>>>>> expressions[1].
>>>>>
>>>>> Compared with [1] I think this proposal is simpler to parse but lacks
>>>>> some of the shortcut conveniences (e.g. implicit types for literals,
>>>>> support for common infix operators (+, -, /, ...)).
>>>>>
>>>>> Both are lacking parameters (e.g. "(equals(!x, %threshold%))") which I
>>>>> think would be useful to have as one could then do something like `auto
>>>>> arrow_expr = Parse(my_expr, threshold)`.
>>>>>
>>>>> Does a field_ref have to be a field name or can it be a field index?
>>>>> The latter is quite useful when the schema has duplicate field names.
>>>>>
>>>>> I'm +0.5 on this change.  I worry a bit about having (eventually)
>>>>> three different syntaxes.  However, at the moment we h

Re: Parser for expressions

2022-10-10 Thread Antoine Pitrou



I don't see the point of having two different syntaxes.

Also, IMHO lisp-style is harder for many people, so I would rather have a
more "traditional" syntax (though Lisp is historically traditional, of
course ;-)).



On 10/10/2022 at 21:10, Sasha Krassovsky wrote:

Yes that makes a lot of sense! I’d agree that it would probably be fine to have 
two different syntaxes, seeing as the use-cases are a bit different.

Did anyone else have any thoughts? Either on the lisp-style syntax for Arrow’s 
Expressions or on having two different syntaxes? (Weston or Antoine?)

Sasha


On Oct 9, 2022, at 5:38 AM, Jin Shang  wrote:

Hi Sasha,

I agree with your points. However, Gandiva is specialized in computing
arithmetic expressions and offers few if any non-arithmetic operations,
so it is very helpful if its parser understands natural math expressions.

Considering that Gandiva is a relatively independent component within the arrow
project, and that it’s only a math expression compiler rather than a fully
functional compute engine, maybe it’s acceptable for Gandiva to have its own
grammar different from compute/Acero/Substrait etc.

Best,
Jin


On Oct 8, 2022, at 03:01, Sasha Krassovsky wrote:

Hi Jin,
I agree it would be good to standardize on a syntax. To me, the advantages of 
the lisp-style syntax are:
- don’t have to define/implement any kind of precedence rules
- has a uniform syntax (no distinction between prefix and infix operators)
- avoids having “special” functions that have an associated arithmetic symbol
- translates directly to the underlying Expression infrastructure.
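
As an illustration of the first and last points, a lisp-style string can be
read with a tokenizer and one recursive function, with no precedence table.
This is only a sketch in Python, not the proposed Arrow parser: the nested
tuples stand in for Expression calls and all function names are hypothetical.

```python
def tokenize(text):
    """Split a lisp-style expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression from the front of the token list (consumes tokens)."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token  # an atom: a function name, field name or literal
    call = []
    while tokens and tokens[0] != ")":
        call.append(parse(tokens))
    if not tokens:
        raise ValueError("missing ')'")
    tokens.pop(0)  # drop the closing parenthesis
    return tuple(call)


def parse_expression(text):
    """Parse a complete expression and reject leftover tokens."""
    tokens = tokenize(text)
    tree = parse(tokens)
    if tokens:
        raise ValueError("trailing tokens after expression")
    return tree


if __name__ == "__main__":
    # "+" is treated like any other function name; no infix handling is needed.
    print(parse_expression("(+ a (foo x y z))"))
```

An infix grammar would additionally need precedence and associativity rules
for each operator, which is the extra work the first bullet refers to.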

The advantage of the Python-style syntax is that it’s more natural to use for 
arithmetic expressions. However, I think for non-arithmetic expressions this 
syntax would be more cumbersome.

Either would work of course, I guess it just depends on the goal. I was 
thinking the string representation wouldn’t represent any significant level of 
abstraction, it is just a convenience to save on clutter when typing out 
expressions.

Sasha


On Oct 6, 2022, at 22:20, Jin Shang wrote:

Hi Sasha and Weston,

I'm the author of the mentioned Gandiva parser. I agree that having one
unified syntax is ideal. I think one critical divergence between Sasha's
and my proposals is that mine is with C++/Python imperative style (foo(x,
y, z), a+b…) and Sasha's is with Lisp functional style ((foo x y z), (+ a
b)…). I feel like it'll be better for us to settle on one of the styles
before we start implementing the parsers.

Best,
Jin


On Friday, October 7, 2022, Sasha Krassovsky wrote:

Hi Weston,
I’d be happy to donate something like this to Substrait if that’s useful,
I was thinking of proving out a design here before going there. However we
could also just go straight there :)

Regarding infix operators and such the edge case I was thinking of is that
a user could potentially add a kernel to the registry called e.g. “+”.
Would the parser implicitly convert any instances of “+” to “add” and break
that?

Implicit typing for literals and parameters can probably also be added
without issues to the current scheme. Would the parameters be passed as an
std::unordered_map?


Does a field_ref have to be a field name or can it be a field index?


It can be a field index or even a field path. The field ref is parsed
using FieldRef::FromDotPath ([1] in my original message), which can express
any FieldRef.

Sasha


On Oct 6, 2022, at 16:08, Weston Pace wrote:


Currently Substrait only has a binary (protobuf) serialization (and a
protobuf JSON one but that's not really human writable and barely
human readable).  Substrait does not have a text serialization.  I
believe there is some desire for one (maybe Sasha wants to give it a
try?).  A text format for Substrait would solve this problem because
you could go "text expression" -> "substrait expression" -> "arrow
expression".

Since no text format exists for Substrait I think that Substrait does
not currently solve this problem or overlap with your work.  However,
at some point (hopefully), it will.

There was also a fairly recent proposal for a parser for gandiva
expressions[1].


Compared with [1] I think this proposal is simpler to parse but lacks
some of the shortcut conveniences (e.g. implicit types for literals,
support for common infix operators (+, -, /, ...)).

Both are lacking parameters (e.g. "(equals(!x, %threshold%))") which I think
would be useful to have as one could then do something like `auto
arrow_expr = Parse(my_expr, threshold)`.
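
The parameter idea can be sketched as a substitution pass over an already
parsed expression tree. This is a hypothetical Python illustration, not an
Arrow API: it assumes the tree is a nested tuple of strings and that
parameters are written as %name%, as in the example above.

```python
def bind_parameters(tree, params):
    """Return a copy of the tree with every %name% atom replaced by params[name]."""
    if isinstance(tree, tuple):
        return tuple(bind_parameters(node, params) for node in tree)
    is_param = (
        isinstance(tree, str)
        and len(tree) > 2
        and tree.startswith("%")
        and tree.endswith("%")
    )
    if not is_param:
        return tree  # function names, field refs and literals pass through
    name = tree[1:-1]
    if name not in params:
        raise KeyError(f"unbound parameter: {name}")
    return params[name]


if __name__ == "__main__":
    expr = ("equals", "!x", "%threshold%")
    print(bind_parameters(expr, {"threshold": 5}))
```

Keeping the binding step separate from parsing means one parsed expression
could be reused with different parameter values.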

Does a field_ref have to be a field name or can it be a field index?
The latter is quite useful when the schema has duplicate field names.

I'm +0.5 on this change.  I worry a bit about having (eventually)
three different syntaxes.  However, at the moment we have zero.

[1] https://lists.apache.org/thread/0oyns380hgzvl0y8kwgqoo4fp7ntt3bn


On Wed, Oct 5, 2022 at 1:55 PM Sasha Krassovsky
 wrote:

Hi David,
Could you elaborate on which part of my proposa

Re: Build Frustrations

2022-10-10 Thread Joseph Porter
No go. I still get the B5cxx11 extension on the symbols in the compiled
libarrow library.

Tried:
/workspace/arrow/pyarrow-dev/bin/cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME
-DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_BUILD_TYPE=Release -DARROW_DATASET=ON
-DARROW_WITH_BZ2=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON
-DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON
-DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_PYTHON=ON
-DARROW_BUILD_TESTS=ON -DARROW_CUDA=ON -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON
-DARROW_PLASMA=ON -DARROW_S3=ON -DARROW_TENSORFLOW=ON -DARROW_CSV=ON
-DARROW_JSON=ON -DARROW_WITH_RE2=ON -DARROW_IPC=ON
-DARROW_DEPENDENCY_SOURCE=BUNDLED -D_GLIBCXX_USE_CXX11_ABI=0 ..

For my next attempt, I modified CMakeLists.txt (in arrow/cpp) to explicitly
remove any mention of -std=c++11 in the CXXFLAGS.  I'll let you know if
that works.  Any pointers on better ways to troubleshoot this would also be
helpful.  I'm not super-conversant with CMake.

# Remove --std=c++11 to avoid errors from C compilers
string(REPLACE "-std=c++11" "" CMAKE_C_FLAGS ${CMAKE_C_FLAGS})

# Add C++-only flags, like -std=c++11
set(CMAKE_CXX_FLAGS "${CXX_ONLY_FLAGS} ${CMAKE_CXX_FLAGS}")
string(REPLACE "-std=c++11" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})

On Mon, Oct 10, 2022 at 1:29 PM Antoine Pitrou  wrote:

>
> Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++
> libraries?
>
>
> On 10/10/2022 at 20:20, Joseph Porter wrote:
> > Hi Antoine,
> >
> > Here's what I did:
> > export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"
> >
> > Here's what I got:
> > ImportError:
> > /workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/
> > lib.cpython-38-x86_64-linux-gnu.so: undefined symbol:
> > _ZNK5arrow8DataType18ComputeFingerprintEv
> >
> > It looks like the symbol mismatch exists already in the libraries that
> were
> > created by the C++ build step, which is why I tried to add the c++11
> > directives to the CMakeLists.txt in the python module.
> >
> > -Joe
> >
> > On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou 
> wrote:
> >
> >>
> >> On 10/10/2022 at 19:27, Joseph Porter wrote:
> >>>
> >>> I've tried building with explicit flags to encourage the libraries to
> >>> include the cxx11 symbol (in python/CMakeLists.txt).  That doesn't seem
> >> to
> >>> impact this issue:
> >>>
> >>> set (CMAKE_CXX_STANDARD 11)
> >>> set (CMAKE_CXX_STANDARD_REQUIRED ON)
> >>> set (CMAKE_CXX_EXTENSIONS OFF)
> >>>
> >>> I also added
> >>> export PYARROW_CXXFLAGS="-std=c++11"
> >>> for the wheel build of pyarrow (no effect).
> >>
> >> Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?
> >>
> >
>


Re: Build Frustrations

2022-10-10 Thread Antoine Pitrou




On 10/10/2022 at 21:31, Joseph Porter wrote:

No go. I still get the B5cxx11 extension on the symbols in the compiled
libarrow library.


"-D_GLIBCXX_USE_CXX11_ABI=0" is a C++ compiler flag, not a CMake flag.

One possibility is to pass instead
"-DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0" to CMake.

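The distinction can be illustrated with a small helper that assembles a
configure command: entries given as -D<name>=<value> on the cmake command line
become CMake cache variables, whereas preprocessor definitions meant for the
compiler have to travel inside CMAKE_CXX_FLAGS. This is a sketch in Python
with a hypothetical helper name, and it assumes the project's build files do
not override CMAKE_CXX_FLAGS.

```python
import shlex


def cmake_configure_command(source_dir, cache_options, cxx_flags):
    """Build a cmake argument list, keeping compiler flags in CMAKE_CXX_FLAGS."""
    args = ["cmake"]
    for key, value in cache_options.items():
        args.append(f"-D{key}={value}")  # CMake cache entry
    if cxx_flags:
        # Compiler flags are joined into a single cache variable.
        args.append("-DCMAKE_CXX_FLAGS=" + " ".join(cxx_flags))
    args.append(source_dir)
    return args


if __name__ == "__main__":
    command = cmake_configure_command(
        "..",
        {"CMAKE_BUILD_TYPE": "Release", "ARROW_PYTHON": "ON"},
        ["-D_GLIBCXX_USE_CXX11_ABI=0"],
    )
    print(shlex.join(command))
```
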



Tried:
/workspace/arrow/pyarrow-dev/bin/cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME
-DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_BUILD_TYPE=Release -DARROW_DATASET=ON
-DARROW_WITH_BZ2=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON
-DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON
-DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_PYTHON=ON
-DARROW_BUILD_TESTS=ON -DARROW_CUDA=ON -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON
-DARROW_PLASMA=ON -DARROW_S3=ON -DARROW_TENSORFLOW=ON -DARROW_CSV=ON
-DARROW_JSON=ON -DARROW_WITH_RE2=ON -DARROW_IPC=ON
-DARROW_DEPENDENCY_SOURCE=BUNDLED -D_GLIBCXX_USE_CXX11_ABI=0 ..

For my next attempt, I modified CMakeLists.txt (in arrow/cpp) to explicitly
remove any mention of -std=c++11 in the CXXFLAGS.  I'll let you know if
that works.  Any pointers on better ways to troubleshoot this would also be
helpful.  I'm not super-conversant with CMake.

# Remove --std=c++11 to avoid errors from C compilers
string(REPLACE "-std=c++11" "" CMAKE_C_FLAGS ${CMAKE_C_FLAGS})

# Add C++-only flags, like -std=c++11
set(CMAKE_CXX_FLAGS "${CXX_ONLY_FLAGS} ${CMAKE_CXX_FLAGS}")
string(REPLACE "-std=c++11" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})

On Mon, Oct 10, 2022 at 1:29 PM Antoine Pitrou  wrote:



Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++
libraries?


On 10/10/2022 at 20:20, Joseph Porter wrote:

Hi Antoine,

Here's what I did:
export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"

Here's what I got:
ImportError:
/workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/
lib.cpython-38-x86_64-linux-gnu.so: undefined symbol:
_ZNK5arrow8DataType18ComputeFingerprintEv

It looks like the symbol mismatch exists already in the libraries that were
created by the C++ build step, which is why I tried to add the c++11
directives to the CMakeLists.txt in the python module.

-Joe

On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou wrote:




On 10/10/2022 at 19:27, Joseph Porter wrote:


I've tried building with explicit flags to encourage the libraries to
include the cxx11 symbol (in python/CMakeLists.txt).  That doesn't seem to
impact this issue:

set (CMAKE_CXX_STANDARD 11)
set (CMAKE_CXX_STANDARD_REQUIRED ON)
set (CMAKE_CXX_EXTENSIONS OFF)

I also added
export PYARROW_CXXFLAGS="-std=c++11"
for the wheel build of pyarrow (no effect).


Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?


Re: Build Frustrations

2022-10-10 Thread Joseph Porter
That makes sense. I'll give it a try - thanks!

On Mon, Oct 10, 2022 at 2:34 PM Antoine Pitrou  wrote:

>
>
> On 10/10/2022 at 21:31, Joseph Porter wrote:
> > No go. I still get the B5cxx11 extension on the symbols in the compiled
> > libarrow library.
>
> "-D_GLIBCXX_USE_CXX11_ABI=0" is a C++ compiler flag, not a CMake flag.
>
> One possibility is to pass instead
> "-DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0" to CMake.
>
>
> >
> > Tried:
> > /workspace/arrow/pyarrow-dev/bin/cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME
> > -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_BUILD_TYPE=Release -DARROW_DATASET=ON
> > -DARROW_WITH_BZ2=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON
> > -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON
> > -DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_PYTHON=ON
> > -DARROW_BUILD_TESTS=ON -DARROW_CUDA=ON -DARROW_FLIGHT=ON
> -DARROW_GANDIVA=ON
> > -DARROW_PLASMA=ON -DARROW_S3=ON -DARROW_TENSORFLOW=ON -DARROW_CSV=ON
> > -DARROW_JSON=ON -DARROW_WITH_RE2=ON -DARROW_IPC=ON
> > -DARROW_DEPENDENCY_SOURCE=BUNDLED -D_GLIBCXX_USE_CXX11_ABI=0 ..
> >
> > For my next attempt, I modified CMakeLists.txt (in arrow/cpp) to
> explicitly
> > remove any mention of -std=c++11 in the CXXFLAGS.  I'll let you know if
> > that works.  Any pointers on better ways to troubleshoot this would also
> be
> > helpful.  I'm not super-conversant with CMake.
> >
> > # Remove --std=c++11 to avoid errors from C compilers
> > string(REPLACE "-std=c++11" "" CMAKE_C_FLAGS ${CMAKE_C_FLAGS})
> >
> > # Add C++-only flags, like -std=c++11
> > set(CMAKE_CXX_FLAGS "${CXX_ONLY_FLAGS} ${CMAKE_CXX_FLAGS}")
> > string(REPLACE "-std=c++11" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
> >
> > On Mon, Oct 10, 2022 at 1:29 PM Antoine Pitrou 
> wrote:
> >
> >>
> >> Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++
> >> libraries?
> >>
> >>
> >> On 10/10/2022 at 20:20, Joseph Porter wrote:
> >>> Hi Antoine,
> >>>
> >>> Here's what I did:
> >>> export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"
> >>>
> >>> Here's what I got:
> >>> ImportError:
> >>> /workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/
> >>> lib.cpython-38-x86_64-linux-gnu.so: undefined symbol:
> >>> _ZNK5arrow8DataType18ComputeFingerprintEv
> >>>
> >>> It looks like the symbol mismatch exists already in the libraries that
> >> were
> >>> created by the C++ build step, which is why I tried to add the c++11
> >>> directives to the CMakeLists.txt in the python module.
> >>>
> >>> -Joe
> >>>
> >>> On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou 
> >> wrote:
> >>>
> >>>>
> >>>> On 10/10/2022 at 19:27, Joseph Porter wrote:
> >>>>>
> >>>>> I've tried building with explicit flags to encourage the libraries to
> >>>>> include the cxx11 symbol (in python/CMakeLists.txt).  That doesn't
> >>>>> seem to impact this issue:
> >>>>>
> >>>>> set (CMAKE_CXX_STANDARD 11)
> >>>>> set (CMAKE_CXX_STANDARD_REQUIRED ON)
> >>>>> set (CMAKE_CXX_EXTENSIONS OFF)
> >>>>>
> >>>>> I also added
> >>>>> export PYARROW_CXXFLAGS="-std=c++11"
> >>>>> for the wheel build of pyarrow (no effect).
> >>>>
> >>>> Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?
> >>>>
> >>>
> >>
> >
>


Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.5.1 RC1

2022-10-10 Thread Andrew Lamb
With 9 +1 votes (4 binding), the release is approved.

The release is available here:

https://dist.apache.org/repos/dist/release/arrow/arrow-object-store-rs-0.5.1

I have also released it to crates.io:
https://crates.io/crates/object_store/0.5.1

Thank you all for voting and reviewing the release

Andrew

On Sat, Oct 8, 2022 at 12:06 PM Ashish  wrote:

> +1 (non-binding)
>
> verified on M1 Mac
>
> On Fri, Oct 7, 2022 at 8:32 AM Andrew Lamb  wrote:
>
> > Hi,
> >
> > I would like to propose a release of Apache Arrow Rust Object
> > Store Implementation, version 0.5.1.
> >
> > This release candidate is based on commit:
> > 8a54e95850fe27ac5865a02ef4be2de0937de5b3 [1]
> >
> > The proposed release tarball and signatures are hosted at [2].
> >
> > The changelog is located at [3].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. There is a script [4] that automates some of
> > the verification.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow Rust Object Store
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow Rust Object Store  because...
> >
> > [1]: https://github.com/apache/arrow-rs/tree/8a54e95850fe27ac5865a02ef4be2de0937de5b3
> > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-object-store-rs-0.5.1-rc1
> > [3]: https://github.com/apache/arrow-rs/blob/8a54e95850fe27ac5865a02ef4be2de0937de5b3/object_store/CHANGELOG.md
> > [4]: https://github.com/apache/arrow-rs/blob/master/object_store/dev/release/verify-release-candidate.sh
> >
>
>
> --
> thanks
> ashish
>


Re: [RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 13.0.0 RC1

2022-10-10 Thread Sutou Kouhei
Hi Andy,

Could you add a step, "add the release to
https://reporter.apache.org/addrelease.html?arrow ", to
the release process of DataFusion and Ballista?
The release information is used to generate a template for
the board report.

FYI:
* Board Report Wizard: https://reporter.apache.org/wizard/?arrow
* A board report draft based on the template generated by the
  Board Report Wizard: https://github.com/apache/arrow/pull/14357


Thanks,
-- 
kou

In 
  "[RESULT][VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 13.0.0 RC1" 
on Mon, 10 Oct 2022 06:59:24 -0600,
  Andy Grove  wrote:

> With 9 +1 votes (4 binding), the vote passes. I have published the artifacts to
> crates.io.
> 
> Thank you for helping with verifying the release.
> 
> 
> On Sat, Oct 8, 2022 at 8:51 PM Kun Liu  wrote:
> 
>> +1 (non-binding)
>>
>> Validated on an Intel Mac
>>
>> On Sun, Oct 9, 2022 at 00:22, Ashish wrote:
>>
>> > +1 (non-binding)
>> >
>> > validated on M1 Mac
>> >
>> > On Fri, Oct 7, 2022 at 4:26 AM Andy Grove  wrote:
>> >
>> > > Hi,
>> > >
>> > > I would like to propose a release of Apache Arrow DataFusion
>> > > Implementation,
>> > > version 13.0.0.
>> > >
>> > > This release candidate is based on commit:
>> > > 807a0c1d2963f6ca327d316badb4ed0fa77e9f21 [1]
>> > > The proposed release tarball and signatures are hosted at [2].
>> > > The changelog is located at [3].
>> > >
>> > > Please download, verify checksums and signatures, run the unit tests, and
>> > > vote on the release. The vote will be open for at least 72 hours.
>> > >
>> > > Only votes from PMC members are binding, but all members of the community
>> > > are encouraged to test the release and vote with "(non-binding)".
>> > >
>> > > The standard verification procedure is documented at
>> > > https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates
>> > > .
>> > >
>> > > [ ] +1 Release this as Apache Arrow DataFusion 13.0.0
>> > > [ ] +0
>> > > [ ] -1 Do not release this as Apache Arrow DataFusion 13.0.0 because...
>> > >
>> > > Here is my vote:
>> > >
>> > > +1
>> > >
>> > > [1]: https://github.com/apache/arrow-datafusion/tree/807a0c1d2963f6ca327d316badb4ed0fa77e9f21
>> > > [2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-13.0.0-rc1
>> > > [3]: https://github.com/apache/arrow-datafusion/blob/807a0c1d2963f6ca327d316badb4ed0fa77e9f21/CHANGELOG.md
>> > >
>> >
>> >
>> > --
>> > thanks
>> > ashish
>> >
>>


Re: Build Frustrations

2022-10-10 Thread Joseph Porter
I think I found a culprit in the pyarrow cmake config:

if(PYARROW_USE_TENSORFLOW)
  # TensorFlow uses the old GLIBCXX ABI, so we have to use it too
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0")
endif()

This was overriding my attempts to add c++11 to the pyarrow build (which
wasn't a good idea, apparently).  Still working on the build per your
suggestion. The gandiva test executable build chokes if c++11 isn't
available, so I turned off the test build.
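
One way to confirm this kind of ABI mismatch is to compare the symbols the
extension module wants with the symbols libarrow actually exports. The sketch
below is only an illustration, not part of the Arrow build: the helper names
are made up, it assumes GNU nm is on the PATH, and it assumes the only
difference between the two names is the "B5cxx11" ABI tag mentioned earlier in
the thread.

```python
import subprocess

# GCC adds this ABI tag to some mangled names (for example, functions that
# return std::string) when the new libstdc++ ABI is in use.
ABI_TAG = "B5cxx11"


def strip_abi_tag(symbol):
    """Return the mangled name with every cxx11 ABI tag removed."""
    return symbol.replace(ABI_TAG, "")


def find_abi_mismatches(undefined, defined):
    """Pair each unresolved symbol with defined symbols that differ only by the ABI tag."""
    defined = list(defined)
    exact = set(defined)
    by_base = {}
    for name in defined:
        by_base.setdefault(strip_abi_tag(name), []).append(name)
    pairs = []
    for wanted in undefined:
        if wanted in exact:
            continue  # resolves normally, nothing to report
        for candidate in by_base.get(strip_abi_tag(wanted), []):
            pairs.append((wanted, candidate))
    return pairs


def dynamic_symbols(path, defined=True):
    """List dynamic symbol names of a shared library via GNU nm (assumed to be installed)."""
    flag = "--defined-only" if defined else "--undefined-only"
    result = subprocess.run(
        ["nm", "-D", flag, path], check=True, capture_output=True, text=True
    )
    # The symbol name is the last column; drop any "@VERSION" suffix.
    return [
        line.split()[-1].split("@")[0]
        for line in result.stdout.splitlines()
        if line.strip()
    ]
```

Calling find_abi_mismatches(dynamic_symbols(pyarrow_so, defined=False),
dynamic_symbols(libarrow_so)) with the two library paths (placeholders here)
would list every symbol that pyarrow expects without the tag while libarrow
provides it with the tag, or the other way around. Any pair in the result
suggests the two sides were compiled with different _GLIBCXX_USE_CXX11_ABI
settings.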

On Mon, Oct 10, 2022 at 3:47 PM Joseph Porter wrote:

> That makes sense. I'll give it a try - thanks!
>
> On Mon, Oct 10, 2022 at 2:34 PM Antoine Pitrou  wrote:
>
>>
>>
>> On 10/10/2022 at 21:31, Joseph Porter wrote:
>> > No go. I still get the B5cxx11 extension on the symbols in the compiled
>> > libarrow library.
>>
>> "-D_GLIBCXX_USE_CXX11_ABI=0" is a C++ compiler flag, not a CMake flag.
>>
>> One possibility is to pass instead
>> "-DCMAKE_CXX_FLAGS=-D_GLIBCXX_USE_CXX11_ABI=0" to CMake.
>>
>>
>> >
>> > Tried:
>> > /workspace/arrow/pyarrow-dev/bin/cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME
>> > -DCMAKE_INSTALL_LIBDIR=lib -DCMAKE_BUILD_TYPE=Release -DARROW_DATASET=ON
>> > -DARROW_WITH_BZ2=ON -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON
>> > -DARROW_WITH_LZ4=ON -DARROW_WITH_SNAPPY=ON -DARROW_WITH_BROTLI=ON
>> > -DARROW_PARQUET=ON -DPARQUET_REQUIRE_ENCRYPTION=ON -DARROW_PYTHON=ON
>> > -DARROW_BUILD_TESTS=ON -DARROW_CUDA=ON -DARROW_FLIGHT=ON -DARROW_GANDIVA=ON
>> > -DARROW_PLASMA=ON -DARROW_S3=ON -DARROW_TENSORFLOW=ON -DARROW_CSV=ON
>> > -DARROW_JSON=ON -DARROW_WITH_RE2=ON -DARROW_IPC=ON
>> > -DARROW_DEPENDENCY_SOURCE=BUNDLED -D_GLIBCXX_USE_CXX11_ABI=0 ..
>> >
>> > For my next attempt, I modified CMakeLists.txt (in arrow/cpp) to explicitly
>> > remove any mention of -std=c++11 in the CXXFLAGS.  I'll let you know if
>> > that works.  Any pointers on better ways to troubleshoot this would also be
>> > helpful.  I'm not super-conversant with CMake.
>> >
>> > # Remove --std=c++11 to avoid errors from C compilers
>> > string(REPLACE "-std=c++11" "" CMAKE_C_FLAGS ${CMAKE_C_FLAGS})
>> >
>> > # Add C++-only flags, like -std=c++11
>> > set(CMAKE_CXX_FLAGS "${CXX_ONLY_FLAGS} ${CMAKE_CXX_FLAGS}")
>> > string(REPLACE "-std=c++11" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
>> >
>> > On Mon, Oct 10, 2022 at 1:29 PM Antoine Pitrou wrote:
>> >
>> >>
>> >> Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++
>> >> libraries?
>> >>
>> >>
>> >> On 10/10/2022 at 20:20, Joseph Porter wrote:
>> >>> Hi Antoine,
>> >>>
>> >>> Here's what I did:
>> >>> export PYARROW_CXXFLAGS="-std=c++11 -D_GLIBCXX_USE_CXX11_ABI=1"
>> >>>
>> >>> Here's what I got:
>> >>> ImportError:
>> >>> /workspace/arrow/pyarrow-test/lib/python3.8/site-packages/pyarrow/lib.cpython-38-x86_64-linux-gnu.so:
>> >>> undefined symbol: _ZNK5arrow8DataType18ComputeFingerprintEv
>> >>>
>> >>> It looks like the symbol mismatch exists already in the libraries that were
>> >>> created by the C++ build step, which is why I tried to add the c++11
>> >>> directives to the CMakeLists.txt in the python module.
>> >>>
>> >>> -Joe
>> >>>
>> >>> On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou wrote:
>> >>>
>> 
>>  On 10/10/2022 at 19:27, Joseph Porter wrote:
>> >
>> > I've tried building with explicit flags to encourage the libraries to
>> > include the cxx11 symbol (in python/CMakeLists.txt).  That doesn't seem to
>> > impact this issue:
>> >
>> > set (CMAKE_CXX_STANDARD 11)
>> > set (CMAKE_CXX_STANDARD_REQUIRED ON)
>> > set (CMAKE_CXX_EXTENSIONS OFF)
>> >
>> > I also added
>> > export PYARROW_CXXFLAGS="-std=c++11"
>> > for the wheel build of pyarrow (no effect).
>> 
>>  Can you try adding "-D_GLIBCXX_USE_CXX11_ABI=1" to those flags?
>> 
>> >>>
>> >>
>> >
>>
>