Re: [ANNOUNCE] New Arrow committer: QP Hou

2021-07-27 Thread Fan Liya
Congratulations, QP! Best, Liya Fan On Tue, Jul 27, 2021 at 11:39 PM Weston Pace wrote: > Congratulations QP! > > On Tue, Jul 27, 2021, 12:37 AM Rok Mihevc wrote: > > > Congrats QP! > > > > Rok > > > > On Tue, Jul 27, 2021 at 9:21 AM QP Hou wrote: > > > > > > Thank you all for the warm welcom

Re: [Java] Is hardcoding NullVector .getField() intentional?

2021-07-27 Thread Fan Liya
Hi Al, I understand your concern. It makes sense to me. I am not aware of any special reason for this. So if there are no objections, I think it would be reasonable to change this to make the NullVector consistent with other vectors. Best, Liya Fan On Fri, Jul 23, 2021 at 8:26 PM Al Taylor wro

[DISCUSS][C++] Enabling finer-grained parallelism in compute operators, quantifying ExecBatch overhead

2021-07-27 Thread Wes McKinney
I started looking into this about 6 months ago and didn't follow through the analysis completely. As high level context, the columnar database literature (e.g. [1], though the results probably differ on more modern processors as 16 years have passed) suggests that breaking data down into smaller c

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

2021-07-27 Thread Andrew Lamb
Thanks to you both -- this sounds great. On Tue, Jul 27, 2021 at 8:37 AM Jiayu Liu wrote: > Not sure it's necessarily bundled together but I believe a Python, > documentation, etc. release can also be helpful. I can volunteer to help if > somehow these works can be parallelized. > > On Tue, Jul

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

2021-07-27 Thread Andrew Lamb
I would be happy with this approach. Thank you for the suggestion. This hybrid approach of both arrow and arrow2 in the same repo seems better to me than separate repos. What I really care about is ensuring we don't have two crates/APIs indefinitely -- as long as we are continually making progress

Re: [DISCUSS] Release Python Datafusion 0.3.0

2021-07-27 Thread Andrew Lamb
I don't have any opinion on the versioning scheme -- thank you for offering to help with CI and release! Maybe we should file a ticket for this effort, or perhaps it is already covered by [1] [1] https://github.com/apache/arrow-datafusion/issues/771 On Tue, Jul 27, 2021 at 8:39 AM Jiayu Liu wr

Re: C++ warning: missing initializer for member

2021-07-27 Thread Wes McKinney
I opened https://issues.apache.org/jira/browse/ARROW-13469 On Tue, Jul 27, 2021 at 3:17 PM Rares Vernica wrote: > > Thanks, Wes. I did something like this to repress the warnings for now: > > #pragma GCC diagnostic push > #pragma GCC diagnostic ignored "-Wmissing-field-initializers" > #include >

Re: C++ warning: missing initializer for member

2021-07-27 Thread Rares Vernica
Thanks, Wes. I did something like this to repress the warnings for now: #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wmissing-field-initializers" #include #pragma GCC diagnostic pop Cheers, Rares On Tue, Jul 27, 2021 at 8:18 PM Wes McKinney wrote: > Seems like this could be fi
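
The workaround quoted above can be sketched in a self-contained form. This is only an illustration of the push/ignore/pop pattern: `LegacyInterval` and `MakeInterval` are hypothetical stand-ins for whatever the wrapped header declares, and the `__GNUC__` guard is an added assumption so that compilers without these pragmas skip them.

```cpp
#include <cstdint>

// Save the current diagnostic state and silence the one warning for the
// region below. GCC and Clang both understand these pragmas.
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmissing-field-initializers"
#endif

// Hypothetical stand-in for third-party code that triggers the warning:
// a struct without default values, initialized with fewer fields than it has.
struct LegacyInterval {
  int32_t days;
  int32_t milliseconds;
};

inline LegacyInterval MakeInterval(int32_t days) {
  // `milliseconds` is omitted here and is therefore value-initialized to 0.
  return LegacyInterval{days};
}

// Restore the previous diagnostic state so the rest of the translation unit
// still gets the warning.
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Keeping the ignored region as small as possible means the warning still fires for your own code outside of it.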

Re: [DISCUSS] next iteration of flatbuffer structures

2021-07-27 Thread David Li
Hey Nate, For the first two points, semantically I'm tempted to think of it more like the ability to send a "bag of columns" according to some schema (and hence columns could have differing lengths or even be absent). This could be a new structure alongside a record batch, which is semantically

Re: getFlightInfo behavior

2021-07-27 Thread David Li
Hey James, There's no 'canonical' behavior here - it's up to the application to define the semantics they want. Absent any application-specific knowledge, there's no order imposed by Flight. So I would agree, though it could make sense for an application to tag the endpoints with some ordering

Re: C++ warning: missing initializer for member

2021-07-27 Thread Wes McKinney
Seems like this could be fixed by adding default values: struct DayMilliseconds { int32_t days = 0; int32_t milliseconds = 0; ... }; In the meantime, you would have to suppress the warning in the compiler where it's happening On Tue, Jul 27, 2021 at 12:26 PM Rares Vernica wrote: > > Hello
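
A minimal sketch of the suggestion above, assuming C++14 or later (where a struct with default member initializers is still an aggregate). The struct is a stand-in named after the one in the message, not Arrow's actual definition, and `MakeDays` is a hypothetical helper added for illustration. Whether a given compiler version stops emitting -Wmissing-field-initializers for such a struct is worth checking locally.

```cpp
#include <cstdint>

// Stand-in for the struct named in the message; not Arrow's real definition.
struct DayMilliseconds {
  int32_t days = 0;          // default member initializer
  int32_t milliseconds = 0;  // default member initializer
};

// Hypothetical helper: only the first field is given explicitly.
// The omitted member takes its default member initializer (0).
inline DayMilliseconds MakeDays(int32_t days) {
  return DayMilliseconds{days};
}
```

With the defaults in place, a default-constructed `DayMilliseconds` also has both fields set to 0 rather than left indeterminate.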

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

2021-07-27 Thread Andy Grove
Apologies for being late to this discussion. There is a hybrid option to consider here where we add the arrow2 code into the arrow crate as a separate module, so we release one crate containing the "old" API (which we can mark as deprecated) as well as the new API. Java did a similar thing a long

C++ warning: missing initializer for member

2021-07-27 Thread Rares Vernica
Hello, I'm getting a handful of warnings when including arrow/builder.h Is this expected? Should I use the suggested -W flag? In file included from /opt/apache-arrow/include/arrow/array/builder_dict.h:29:0, from /opt/apache-arrow/include/arrow/builder.h:26, /opt/apache-arrow/inc

getFlightInfo behavior

2021-07-27 Thread James Duong
Hi, In getFlightInfo(), the FlightProducer is supposed to return an iterable of endpoints that the client can retrieve data from using doGet(). What's not really clear to me is if the order of the endpoints is significant. My assumption is that the order is not significant -- the client should be

Re: [VOTE] Release Apache Arrow 5.0.0 - RC1

2021-07-27 Thread Krisztián Szűcs
During the verification of the M1 wheels we discovered an issue which had to be fixed [1]. I also added support for a Python 3.8 M1 wheel [2] while I was working on the verification, and extended the verification tasks to exercise the scripts on M1 for both the source tarball and the wheels [3]. I've just subm

Re: C++ Datum::move returns ArrayData not Array

2021-07-27 Thread Benjamin Kietzman
Opened https://issues.apache.org/jira/browse/ARROW-13462 to track correction of the doc's examples On Tue, Jul 27, 2021 at 11:59 AM Benjamin Kietzman wrote: > Sorry, that is a typo. I will open a JIRA to fix the doc. > > In the meantime, incremented_datum.make_array() should work for you > > On

Re: C++ Datum::move returns ArrayData not Array

2021-07-27 Thread Benjamin Kietzman
Sorry, that is a typo. I will open a JIRA to fix the doc. In the meantime, incremented_datum.make_array() should work for you On Tue, Jul 27, 2021, 11:57 Rares Vernica wrote: > Hi, > > I'm trying the example in the Compute Functions user guide > https://arrow.apache.org/docs/cpp/compute.html#in

C++ Datum::move returns ArrayData not Array

2021-07-27 Thread Rares Vernica
Hi, I'm trying the example in the Compute Functions user guide https://arrow.apache.org/docs/cpp/compute.html#invoking-functions std::shared_ptr<arrow::Array> numbers_array = ...; std::shared_ptr<arrow::Scalar> increment = ...; arrow::Datum incremented_datum; ARROW_ASSIGN_OR_RAISE(incremented_datum, arrow

Re: [ANNOUNCE] New Arrow committer: QP Hou

2021-07-27 Thread Weston Pace
Congratulations QP! On Tue, Jul 27, 2021, 12:37 AM Rok Mihevc wrote: > Congrats QP! > > Rok > > On Tue, Jul 27, 2021 at 9:21 AM QP Hou wrote: > > > > Thank you all for the warm welcome! It's been a lot of fun hacking on > > Arrow together with so many talented engineers :) > > > > > > On Mon, J

Re: [DISCUSS] Release Python Datafusion 0.3.0

2021-07-27 Thread Jiayu Liu
A usable Python release would likely broaden use cases and potential scenarios. I wonder what the versioning would be: should it go along with the datafusion versioning scheme? Either way I'm happy to help with the CI setup and release process. On Wed, Jul 21, 2021 at 9:17 PM Neal Richardson

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

2021-07-27 Thread Jiayu Liu
Not sure it's necessarily bundled together but I believe a Python, documentation, etc. release can also be helpful. I can volunteer to help if somehow these works can be parallelized. On Tue, Jul 27, 2021 at 3:29 PM QP Hou wrote: > Following up on this, since delta-rs could really benefit from t

Re: [ANNOUNCE] New Arrow committer: QP Hou

2021-07-27 Thread Rok Mihevc
Congrats QP! Rok On Tue, Jul 27, 2021 at 9:21 AM QP Hou wrote: > > Thank you all for the warm welcome! It's been a lot of fun hacking on > Arrow together with so many talented engineers :) > > > On Mon, Jul 26, 2021 at 10:37 PM Jorge Cardoso Leitão > wrote: > > > > Congratulations and thank you

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

2021-07-27 Thread QP Hou
Following up on this, since delta-rs could really benefit from this release, I have started some initial work with https://github.com/apache/arrow-datafusion/pull/780 to move things forward. Others are welcome to join the party. On Fri, Jul 23, 2021 at 12:58 PM Andrew Lamb wrote: > > Does anyone

Re: [ANNOUNCE] New Arrow committer: QP Hou

2021-07-27 Thread QP Hou
Thank you all for the warm welcome! It's been a lot of fun hacking on Arrow together with so many talented engineers :) On Mon, Jul 26, 2021 at 10:37 PM Jorge Cardoso Leitão wrote: > > Congratulations and thank you for all the great work! It is a pleasure to > work with you. > > Best, > Jorge >