Re: [JAVA] issues encountered during build

2021-03-11 Thread Fan Liya
Hi Bob, Thanks for reporting the issues. I remember encountering the same problems with the JDBC tests (over one year ago). Maybe it is related not just to the time zone but also to the machine locale. I think we can open an issue to track it. Best, Liya Fan On Fri, Mar 12, 2021 at

Re: [DISCUSS][C++] Reduce usage of KernelContext in compute::

2021-03-11 Thread Yibo Cai
Besides reporting errors, maybe a kernel wants to allocate memory through KernelContext::memory_pool [1] in Kernel::init? I'm not quite sure if this is a valid case. Would like to hear other comments. [1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernel.h#L95 Yibo On 3/

Re: [JAVA] issues encountered during build

2021-03-11 Thread Micah Kornfield
Hi Bob, Thanks for the feedback. I don't think a lot of people are developing on Windows. Some answers inline: * Build does require Java 8, not "8 or later" as stated in java/README.md > There's a reference to sun.misc.Unsafe > in > memory/memory-core/src/main/java/org/apache/arrow/memory/ut

Re: Question about joining two tables

2021-03-11 Thread Aldrin
Great, thanks for the responses! That all makes sense :) On Thu, Mar 11, 2021 at 1:29 PM Benjamin Kietzman wrote: > Hi Aldrin, > > We don't have a unified repository for design docs that I'm aware of. > Governance-wise only JIRA and the mailing lists are canonical, but > IIUC it'd be legal, stra

[JAVA] issues encountered during build

2021-03-11 Thread bobtins
My mail client took out all the linefeeds, so let me reformat; sorry about that! In the process of slogging through the build, I've bumped into various issues. I'm happy to document them in java/README.md or make any other changes that might be helpful to others. I'm pretty experienced with J

[JAVA] [n00b] issues encountered during build

2021-03-11 Thread Bob Tinsman
I've been mostly lurking for a while, but I would like to start picking off some bugs in the Java implementation. In the process of slogging through the build, I've bumped into various issues. I'm happy to document them in java/README.md or make any other changes that might be helpful to others. I

Re: Question about joining two tables

2021-03-11 Thread Benjamin Kietzman
Hi Aldrin, We don't have a unified repository for design docs that I'm aware of. Governance-wise only JIRA and the mailing lists are canonical, but IIUC it'd be legal, straightforward, and beneficial to provide a directory like the one you describe of "design docs proposed to the ML" or so. Ben K

[DISCUSS][C++] Reduce usage of KernelContext in compute::

2021-03-11 Thread Benjamin Kietzman
KernelContext is a tuple consisting of pointers to an ExecContext and KernelState and an error Status. The context's error Status may be set by compute kernels (for example, when divide-by-zero would occur) rather than returning a Result as in the rest of the codebase. IIUC the intent is to avoid
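To make the contrast concrete, here is a minimal sketch of a kernel that reports divide-by-zero through its return value (the Result style used elsewhere in the codebase) instead of setting an error Status on a shared context object. `Status`, `Result`, and `Divide` are hypothetical stand-ins written for illustration; they are not Arrow's actual classes or kernel signatures.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Status type (NOT arrow::Status).
struct Status {
  bool ok = true;
  std::string message;
  static Status OK() { return Status{}; }
  static Status Invalid(std::string msg) { return Status{false, std::move(msg)}; }
};

// Hypothetical stand-in for a Result<T> type (NOT arrow::Result):
// holds either a value (success) or a failed Status.
template <typename T>
class Result {
 public:
  Result(T value) : value_(std::move(value)) {}          // success
  Result(Status status) : status_(std::move(status)) {}  // failure
  bool ok() const { return status_.ok; }
  const Status& status() const { return status_; }
  const T& ValueOrDie() const { return *value_; }  // caller must check ok() first

 private:
  std::optional<T> value_;
  Status status_;  // defaults to OK
};

// Hypothetical element-wise division kernel. Errors travel in the return
// value, so no mutable context object is needed to carry them.
Result<std::vector<int64_t>> Divide(const std::vector<int64_t>& a,
                                    const std::vector<int64_t>& b) {
  if (a.size() != b.size()) return Status::Invalid("length mismatch");
  std::vector<int64_t> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (b[i] == 0) return Status::Invalid("divide by zero");
    if (b[i] == -1 && a[i] == std::numeric_limits<int64_t>::min()) {
      return Status::Invalid("overflow");  // INT64_MIN / -1 is not representable
    }
    out[i] = a[i] / b[i];
  }
  return out;
}
```

With this shape, the caller checks `ok()` on each call's result rather than inspecting a context after the fact.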

Re: [DISCUSS] [Rust] Donate Ballista to Apache Arrow

2021-03-11 Thread Andy Grove
Hi Jack, Thanks for the input, and there are some interesting ideas there. If we were looking to break this into separate donations though I would actually consider 2+3 to be the first piece to incorporate into DataFusion because it would provide much better scalability compared to the current mo

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Micah Kornfield
FYI, I opened up https://github.com/lz4/lz4-java/issues/176 to discuss support for dependent frames. On Thu, Mar 11, 2021 at 11:59 AM David Li wrote: > At least for Flight, I don't think we'd use that. Right now the way > compression is supported is the same way as with Feather, i.e. the body >

Re: Question about joining two tables

2021-03-11 Thread Wes McKinney
This is a new document that we just started earlier this week. I'd put together some docs in the past to try to bootstrap community organization on this, but since we're now finally putting hands to code after setting up some critical dependencies (like the Datasets interface, which is needed to im

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread David Li
At least for Flight, I don't think we'd use that. Right now the way compression is supported is the same way as with Feather, i.e. the body buffers in each individual record batch sent on the wire are compressed, but not the stream as a whole. (And so far we haven't found a compelling benefit fo
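The per-buffer scheme described here can be sketched as follows. In the Arrow IPC format's buffer-level body compression, each body buffer is written with its uncompressed length as a little-endian signed 64-bit prefix, followed by the compressed bytes; a prefix of -1 means the bytes that follow are stored uncompressed. This is a minimal illustration, not Arrow's implementation: `FrameBuffer`, `ReadFramedBuffer`, and the `compress` hook are hypothetical names, no real codec is called, and the 8-byte padding the IPC format requires between buffers is omitted.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Appends a signed 64-bit integer in little-endian byte order.
void PutInt64LE(Bytes& out, int64_t v) {
  uint64_t u = static_cast<uint64_t>(v);
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<uint8_t>(u >> (8 * i)));
}

// Reads a signed 64-bit little-endian integer from 8 bytes.
int64_t GetInt64LE(const uint8_t* p) {
  uint64_t u = 0;
  for (int i = 0; i < 8; ++i) u |= static_cast<uint64_t>(p[i]) << (8 * i);
  return static_cast<int64_t>(u);
}

// Frames ONE body buffer independently of all others. `compress` is a
// hypothetical codec hook; this sketch stores the buffer raw (prefix -1)
// when no codec is given or when the codec does not shrink the data.
Bytes FrameBuffer(const Bytes& raw,
                  const std::function<std::optional<Bytes>(const Bytes&)>& compress) {
  std::optional<Bytes> packed;
  if (compress) packed = compress(raw);
  Bytes out;
  if (packed && packed->size() < raw.size()) {
    PutInt64LE(out, static_cast<int64_t>(raw.size()));  // uncompressed length
    out.insert(out.end(), packed->begin(), packed->end());
  } else {
    PutInt64LE(out, -1);  // -1: payload is not compressed
    out.insert(out.end(), raw.begin(), raw.end());
  }
  return out;
}

struct FramedView {
  int64_t uncompressed_length;  // -1 if the payload is stored raw
  Bytes payload;                // bytes still to be decompressed (or raw)
};

// Splits a framed buffer back into its length prefix and payload.
FramedView ReadFramedBuffer(const Bytes& framed) {
  if (framed.size() < 8) throw std::invalid_argument("missing length prefix");
  FramedView v;
  v.uncompressed_length = GetInt64LE(framed.data());
  v.payload.assign(framed.begin() + 8, framed.end());
  return v;
}
```

Because every buffer carries its own prefix, a reader can decode any buffer of a record batch without having seen the rest of the stream.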

Re: Question about joining two tables

2021-03-11 Thread Aldrin
Hi Ben, thanks for the link! I will eventually be interested in this direction as well, but hadn't seen this document. Is there a place where these design documents can be found? I've seen this and a few other google doc links floating around the mailing list but I can't figure out how to navigate

Re: [DISCUSS] [Rust] Donate Ballista to Apache Arrow

2021-03-11 Thread Jack Chan
Hey Andy I want to discuss the areas of Ballista code that you proposed above to move to Arrow. These are: 1. serde code for translating between protobuf and Arrow/DataFusion/Ballista data structures 2. Distributed query planner 3. Scheduler process that coordinates query execution across availabl

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Antoine Pitrou
On 11/03/2021 at 19:54, Micah Kornfield wrote: Indeed, I don't think it was discussed publicly. The LZ4 frame format has several things going for it: - it allows streaming compression and decompression (meaning you can avoid loading a huge compressed buffer at once) Is this something we m

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Micah Kornfield
> > Indeed, I don't think it was discussed publicly. The LZ4 frame format > has several things going for it: > - it allows streaming compression and decompression (meaning you can > avoid loading a huge compressed buffer at once) Is this something we make use of or intend to make use of? > - it

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Joris Peeters
"Is https://github.com/lz4/lz4-java the fast Java lz4 library in question? The incompleteness of this implementation is a known problem for other user communities, not only Arrow. It would be a great public service to improve it so that it fully implements the lz4 frame specification." Very much +

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Steve Kim
I prefer the lz4 frame format for the reasons that Antoine stated. To be friendly to users, the Arrow IPC documentation could mention that lz4 compression may break Java interoperability. If block dependency is the only obstacle to Java interoperability, the Arrow IPC implementation could disable

Re: Question about joining two tables

2021-03-11 Thread Benjamin Kietzman
Hi, This is not yet implemented but it is on the roadmap for the near future: https://docs.google.com/document/d/1AyTdLU-RxA-Gsb9EsYnrQrmqPMOYMfPlWwxRi1Is1tQ Ben Kietzman On Thu, Mar 11, 2021 at 12:33 PM Kirill Lykov wrote: > Hi, > > Is it possible somehow using existing compute functionality

Re: [DISCUSS] [Rust] Donate Ballista to Apache Arrow

2021-03-11 Thread Andy Grove
Thanks, Micah. Regarding integration testing, we currently have an integration test script in the repo that spins up multiple processes in docker compose and runs through a series of queries on a data set that can be generated locally. I invested in some modest hardware (a refurbed 12 core prolian

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Antoine Pitrou
On 11/03/2021 at 17:58, Micah Kornfield wrote: We've found in the process of implementing support for LZ4 decompression that the fast Java decoder library does not support all the features of the C++ library (dependent blocks can't be read, and by default that is what the C++ code emits).

Question about joining two tables

2021-03-11 Thread Kirill Lykov
Hi, Is it possible somehow using existing compute functionality or some other code to join two tables by values in a common column? -- Best regards, Kirill Lykov
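As the replies note, a join was not yet available in the compute layer at this point. For reference, the classic technique is an inner hash join: build a hash table on the key column of one table, then probe it with each row of the other. The sketch below applies that idea to plain C++ row structs; `LeftRow`, `RightRow`, `JoinedRow`, and `HashJoin` are hypothetical and do not correspond to any Arrow API.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical row types for two tables that share an "id" column.
struct LeftRow { int64_t id; std::string name; };
struct RightRow { int64_t id; double score; };
struct JoinedRow { int64_t id; std::string name; double score; };

// Inner hash join on the common "id" column.
std::vector<JoinedRow> HashJoin(const std::vector<LeftRow>& left,
                                const std::vector<RightRow>& right) {
  // Build phase: index the right table by key (duplicate keys allowed).
  std::unordered_multimap<int64_t, const RightRow*> index;
  index.reserve(right.size());
  for (const RightRow& r : right) index.emplace(r.id, &r);

  // Probe phase: emit one output row per matching (left, right) pair.
  std::vector<JoinedRow> out;
  for (const LeftRow& l : left) {
    auto [begin, end] = index.equal_range(l.id);
    for (auto it = begin; it != end; ++it) {
      out.push_back({l.id, l.name, it->second->score});
    }
  }
  return out;
}
```

Output rows follow the order of the left table; the order among several right-side matches for the same key is unspecified, since it depends on the hash table.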

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Antoine Pitrou
What about the JNI bindings for lz4-c? On 11/03/2021 at 18:20, Micah Kornfield wrote: I looked a little closer and it looks like it only supports Block format (in the code I couldn't find any references to Frame). On Thu, Mar 11, 2021 at 9:16 AM Antoine Pitrou wrote: Have you tr

Re: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Micah Kornfield
I looked a little closer and it looks like it only supports Block format (in the code I couldn't find any references to Frame). On Thu, Mar 11, 2021 at 9:16 AM Antoine Pitrou wrote: > > Have you tried another Java LZ4 library (I think you mentioned Airlift > on a PR)? > > > On 11/03/2021

Fwd: [DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Antoine Pitrou
Have you tried another Java LZ4 library (I think you mentioned Airlift on a PR)? On 11/03/2021 at 17:58, Micah Kornfield wrote: We've found in the process of implementing support for LZ4 decompression that the fast Java decoder library does not support all the features of the C++ library

[DISCUSS] Revisiting LZ4 Compression for Arrow Buffers

2021-03-11 Thread Micah Kornfield
We've found in the process of implementing support for LZ4 decompression that the fast Java decoder library does not support all the features of the C++ library (dependent blocks can't be read, and by default that is what the C++ code emits). The only library we found (Apache Commons) that seem
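Whether a frame uses dependent (linked) blocks is recorded in its header, so a reader can detect the problem case up front. Per the LZ4 frame format description, a frame starts with the magic number 0x184D2204 (little-endian), followed by the FLG byte, whose bit 5 (B.Indep) is 1 for independent blocks and 0 for linked blocks, and the BD byte, whose bits 6-4 encode the maximum block size. The sketch below parses only those fields; `ParseLz4FrameHeader` and `Lz4FrameInfo` are hypothetical names, and the header checksum byte is not verified.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

struct Lz4FrameInfo {
  bool independent_blocks;     // FLG bit 5 (B.Indep); false means linked blocks
  bool block_checksums;        // FLG bit 4
  bool has_content_size;       // FLG bit 3
  bool content_checksum;       // FLG bit 2
  bool has_dict_id;            // FLG bit 0
  std::size_t block_max_size;  // decoded from BD bits 6-4
};

// Parses the fixed part of an LZ4 frame header (magic, FLG, BD).
Lz4FrameInfo ParseLz4FrameHeader(const std::vector<uint8_t>& buf) {
  // Smallest possible header: 4-byte magic + FLG + BD + header checksum.
  if (buf.size() < 7) throw std::invalid_argument("too short for an LZ4 frame header");
  uint32_t magic = static_cast<uint32_t>(buf[0]) | static_cast<uint32_t>(buf[1]) << 8 |
                   static_cast<uint32_t>(buf[2]) << 16 | static_cast<uint32_t>(buf[3]) << 24;
  if (magic != 0x184D2204u) throw std::invalid_argument("not an LZ4 frame");

  uint8_t flg = buf[4];
  uint8_t bd = buf[5];
  if ((flg >> 6) != 0x01) throw std::invalid_argument("unsupported frame version");

  Lz4FrameInfo info;
  info.independent_blocks = (flg >> 5) & 1;
  info.block_checksums = (flg >> 4) & 1;
  info.has_content_size = (flg >> 3) & 1;
  info.content_checksum = (flg >> 2) & 1;
  info.has_dict_id = flg & 1;

  // Codes 4..7 map to 64 KB, 256 KB, 1 MB, 4 MB; other values are invalid.
  unsigned code = (bd >> 4) & 0x7;
  if (code < 4) throw std::invalid_argument("invalid block maximum size");
  info.block_max_size = std::size_t{1} << (2 * code + 8);
  return info;
}
```

A decoder that cannot handle linked blocks could check `independent_blocks` and fail with a clear error instead of producing a confusing decode failure later.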

Re: [DISCUSS] [Rust] Donate Ballista to Apache Arrow

2021-03-11 Thread Micah Kornfield
I think having Ballista in Arrow sounds like a good idea in the short term. It sounds like there is enough developer pain, that bringing it here makes sense (providing existing Ballista contributors are happy with the change and current Rust maintainers are open to the work involved). One longer

[Rust] Patch release process

2021-03-11 Thread Andy Grove
Now that we have the ability to vote on source releases for patch releases, with each implementation having more freedom to release outside of the major release process, we need to document how to do this for the Rust implementation (and this is probably of interest to other implementations as well

[NIGHTLY] Arrow Build Report for Job nightly-2021-03-11-0

2021-03-11 Thread Crossbow
Arrow Build Report for Job nightly-2021-03-11-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-11-0 Failed Tasks: - conda-linux-gcc-py37-cpu-r40: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-11-0-azure-conda-linux