Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-11-22 Thread Micah Kornfield
I'll also note this isn't quite in final form; I'd still like to add some more unit tests. On Fri, Nov 22, 2019 at 11:36 AM Wes McKinney wrote: > hi Micah -- it makes sense to limit the scope for the time being to > permitting LargeString/Binary work to proceed. Jacques, have you had a > chance …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-11-22 Thread Wes McKinney
hi Micah -- it makes sense to limit the scope for the time being to permitting LargeString/Binary work to proceed. Jacques, have you had a chance to look at this? On Fri, Nov 15, 2019 at 3:07 AM Micah Kornfield wrote: > > Apologies for the long delay, I chose to do the minimal work of limiting …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-11-15 Thread Fan Liya
I think the 2GB limit is overly restrictive for modern computers. This is a problem we must face anyway. Best, Liya Fan On Fri, Nov 15, 2019 at 5:07 PM Micah Kornfield wrote: > Apologies for the long delay, I chose to do the minimal work of limiting > this change [1] to allowing ArrowBuf to …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-11-15 Thread Micah Kornfield
Apologies for the long delay. I chose to do the minimal work of limiting this change [1] to allowing ArrowBuf to have 64-bit lengths. This would unblock work on LargeString and LargeBinary. If this change looks OK, I think there is some follow-up work to add more thorough unit/integration tests. As a …
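A rough sketch of why LargeString/LargeBinary need buffers with 64-bit lengths, using plain arithmetic only (no Arrow classes; the class and method names are made up for illustration). It assumes the Arrow columnar format's layout: Binary/Utf8 use signed 32-bit offsets, LargeBinary/LargeUtf8 use signed 64-bit offsets, and an offsets buffer for n values holds n + 1 entries. Once the value data passes Integer.MAX_VALUE bytes, both the offsets and the data buffer's own length need 64 bits.

```java
// Hypothetical helper (not an Arrow API): arithmetic behind Binary vs. LargeBinary limits.
final class OffsetWidths {
    static final int BINARY_OFFSET_BYTES = 4;       // Binary/Utf8: signed 32-bit offsets
    static final int LARGE_BINARY_OFFSET_BYTES = 8; // LargeBinary/LargeUtf8: signed 64-bit offsets

    // An offsets buffer for n values holds n + 1 offsets.
    static long offsetBufferBytes(long valueCount, int offsetWidth) {
        return (valueCount + 1) * offsetWidth;
    }

    // Signed 32-bit offsets can address at most Integer.MAX_VALUE bytes of value data.
    static boolean fitsInBinary(long totalDataBytes) {
        return totalDataBytes <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long values = 3_000_000L;
        long dataBytes = values * 1_000L; // 3,000,000 values of ~1 KB each = 3e9 bytes
        System.out.println(fitsInBinary(dataBytes)); // false: needs LargeBinary
        System.out.println(offsetBufferBytes(values, LARGE_BINARY_OFFSET_BYTES)); // 24000008
        System.out.println(offsetBufferBytes(values, BINARY_OFFSET_BYTES));       // 12000004
    }
}
```

The offsets buffer itself stays small here; it is the 3e9-byte data buffer that cannot be described by a 32-bit length.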

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Jacques Nadeau
On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield wrote: > The vector indexes being limited to 32 bits doesn't limit the addressing >> to 32 bit chunks of memory. For example, your prime example before was >> image data. Having 2 billion images of 1mb images would still be supported >> without …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Micah Kornfield
> > The vector indexes being limited to 32 bits doesn't limit the addressing > to 32 bit chunks of memory. For example, your prime example before was > image data. Having 2 billion images of 1mb images would still be supported > without changing the index addressing. This might be pre-coffee …
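The distinction between element indexes and byte addresses can be sketched in a few lines (the names are hypothetical, not Arrow's API). An int index is enough to count roughly 2 billion elements, but the byte position of an element has to be computed in 64-bit arithmetic, and multiplying two ints overflows before any widening to long happens.

```java
// Hypothetical helpers (not an Arrow API) separating element indexes from byte addresses.
final class IndexVsAddress {
    // The element index stays a 32-bit int; the byte position is computed in 64-bit math
    // because the long parameter forces the int index to be widened before multiplying.
    static long byteOffset(int index, long bytesPerElement) {
        return index * bytesPerElement;
    }

    // The pitfall: int * int is evaluated in 32 bits and wraps before the result is widened.
    static long byteOffsetBuggy(int index, int bytesPerElement) {
        return index * bytesPerElement;
    }

    public static void main(String[] args) {
        int lastImage = 2_000_000_000 - 1; // ~2 billion elements still fit in an int index
        int oneMiB = 1 << 20;
        System.out.println(byteOffset(lastImage, oneMiB));      // 2097151998951424
        System.out.println(byteOffsetBuggy(lastImage, oneMiB)); // 1072693248 (wrapped)
    }
}
```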

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Jacques Nadeau
On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield wrote: > I don't think we should couple this discussion with the implementation of >> large list, etc since I think those two concepts are independent. > > I'm still trying to balance in my mind which is a worse experience for > consumers of the …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-22 Thread Micah Kornfield
> > I don't think we should couple this discussion with the implementation of > large list, etc since I think those two concepts are independent. I'm still trying to balance in my mind which is a worse experience for consumers of the libraries for these types. Claiming that Java supports these …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-22 Thread Jacques Nadeau
I don't think we should couple this discussion with the implementation of large list, etc., since I think those two concepts are independent. I've asked some others on my team their opinions on the risk here. I think we should probably review some of our more complex vector interactions and see how the …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-22 Thread Jacques Nadeau
> > Hi Jacques, I hope you had a good rest. I did, thanks! On Fri, Aug 23, 2019 at 9:25 AM Jacques Nadeau wrote: > I don't think we should couple this discussion with the implementation of > large list, etc since I think those two concepts are independent. > > I've asked some others on my …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-20 Thread Micah Kornfield
> > > With regards to the reference implementation point. It is a good point. > I'm on vacation this week. Unless you're pushing hard on this, can we pick > this up and discuss more next week? Hi Jacques, I hope you had a good rest. Any more thoughts on the reference implementation aspect of …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-14 Thread Wes McKinney
On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield wrote: > > Hi Wes and Jacques, > See responses below. > > With regards to the reference implementation point. It is a good point. I'm > > on vacation this week. Unless you're pushing hard on this, can we pick this > > up and discuss more next week? …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-11 Thread Micah Kornfield
Hi Wes and Jacques, See responses below. With regards to the reference implementation point. It is a good point. I'm > on vacation this week. Unless you're pushing hard on this, can we pick this > up and discuss more next week? Sure thing, enjoy your vacation. I think the only practical …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-11 Thread Jacques Nadeau
Hey Micah, Appreciate the offer on the compiling. The reality is I'm more concerned about the unknowns than the compiling issue itself. Any time you've been tuning for a while, changing something like this could be totally fine or cause a couple of major issues. For example, we've done a very …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-11 Thread Wes McKinney
My stance on this is that I don't know how important it is for Java to support vectors over INT32_MAX elements. The use cases enabled by having very large arrays seem to be concentrated in the native code world (e.g. C/C++/Rust) -- that could just be implementation-centrism on my part, though. It's …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Micah Kornfield
Hi Jacques, I definitely understand these concerns, and this change is risky because it is so large. Perhaps creating a new hierarchy might be the cleanest way of dealing with this. This could have other benefits like cleaning up some cruft around dictionary encode and "orphaned" method. Per …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Jacques Nadeau
Hey Micah, I didn't have a particular path in mind. I was thinking more along the lines of extra methods as opposed to separate classes. Arrow hasn't historically been a place where we're writing algorithms in Java, so the fact that they aren't there doesn't mean they don't exist. We have a large …
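A hypothetical illustration of the "extra methods" option: keep the existing int-indexed accessor and add a long-indexed overload next to it, rather than a parallel class hierarchy. The class and methods are invented for this sketch and use a double[] as a stand-in for an off-heap buffer, so the long overload can only narrow with a checked conversion (Math.toIntExact) instead of silently truncating.

```java
// Hypothetical class (not Arrow's API) showing int and long accessors side by side.
final class DualIndexFloat8 {
    private final double[] values; // stand-in for an off-heap data buffer

    DualIndexFloat8(int valueCount) {
        this.values = new double[valueCount];
    }

    // Existing 32-bit path: the signature is unchanged, so existing callers keep compiling.
    double get(int index) {
        return values[index];
    }

    void set(int index, double value) {
        values[index] = value;
    }

    // Added 64-bit path: this sketch's backing store is int-indexed, so it narrows with an
    // overflow check (ArithmeticException) rather than wrapping to a wrong slot.
    double get(long index) {
        return get(Math.toIntExact(index));
    }

    public static void main(String[] args) {
        DualIndexFloat8 v = new DualIndexFloat8(4);
        v.set(2, 1.5);
        System.out.println(v.get(2));  // int overload: 1.5
        System.out.println(v.get(2L)); // long overload: 1.5
        try {
            v.get(1L << 32); // does not fit in an int
        } catch (ArithmeticException e) {
            System.out.println("index out of int range");
        }
    }
}
```

Java picks the int overload for an int argument, so existing call sites and their compiled code paths are untouched; only callers that pass a long reach the new method.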

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Micah Kornfield
Hi Jacques, What avenue were you thinking of for supporting both paths? I didn't want to pursue a different class hierarchy, because I felt like that would effectively fork the code base, but that is potentially an option that would allow us to have a complete reference implementation in Java that …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Jacques Nadeau
This is a pretty massive change to the APIs. I wonder how nasty it would be to just support both paths. Have you evaluated how complex that would be? On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield wrote: > After more investigation, it looks like Float8Benchmarks at least on my > machine are …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-07 Thread Micah Kornfield
After more investigation, it looks like Float8Benchmarks, at least on my machine, are within the range of noise. For BitVectorHelper I pushed a new commit [1] that seems to bring the BitVectorHelper benchmarks back in line (and even shows some improvement for getNullCountBenchmark). Benchmark …
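For context on what a null-count benchmark exercises, here is a minimal sketch of counting nulls in a validity bitmap with a 64-bit value count. It is not BitVectorHelper's implementation; it assumes a byte[] in place of an ArrowBuf and the Arrow format's bitmap layout (bit i at byte i / 8, least-significant bit first, set bit = valid).

```java
// Hypothetical sketch, not BitVectorHelper's code. Assumes the Arrow validity-bitmap layout:
// bit i lives in byte i / 8 at bit position i % 8 (least-significant bit first); 1 = valid.
final class NullCount {
    static long nullCount(byte[] validity, long valueCount) {
        long fullBytes = valueCount / 8;
        long valid = 0;
        for (long b = 0; b < fullBytes; b++) {
            // byte[] is int-indexed, so the long position is narrowed here; an off-heap
            // buffer with 64-bit lengths would not need this cast.
            valid += Integer.bitCount(validity[(int) b] & 0xFF);
        }
        int remainingBits = (int) (valueCount % 8);
        if (remainingBits > 0) {
            int mask = (1 << remainingBits) - 1; // ignore padding bits past valueCount
            valid += Integer.bitCount(validity[(int) fullBytes] & mask);
        }
        return valueCount - valid;
    }

    public static void main(String[] args) {
        // 11 values: byte 0 has 6 valid bits, the low 3 bits of byte 1 hold 2 valid bits.
        byte[] validity = {(byte) 0b1011_0111, (byte) 0b0000_0101};
        System.out.println(nullCount(validity, 11)); // 3
    }
}
```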

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-07 Thread Micah Kornfield
Indeed, the BoundsChecking and CheckNullForGet variables can make a big difference. I didn't initially run the benchmarks with these turned on (you can see the result from above with Float8Benchmarks). Here are new numbers, including runs with the flags enabled. It looks like using longs might be a …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-07 Thread Fan Liya
Hi Gonzalo, Thanks for sharing the performance results. I am wondering if you have turned off the flag BoundsChecking#BOUNDS_CHECKING_ENABLED. If not, the lower throughput should be expected. Best, Liya Fan On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield wrote: > Hi Gonzalo, > Thank you for …
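A sketch of the general pattern behind such a flag (the class and the "example.disable_bounds_checking" property are hypothetical, not Arrow's BoundsChecking). When the flag is a static final boolean, HotSpot's JIT can treat it as a constant once the class is initialized, so the guarded check can be compiled away when checking is disabled. That is one plausible reason benchmark numbers differ sharply depending on the flag.

```java
// Hypothetical sketch of a static-final bounds-checking switch; the class and the
// "example.disable_bounds_checking" property are invented for illustration.
final class CheckedBuffer {
    // Read once at class initialization. Because it is static final, the JIT can treat it as
    // a constant and drop the guarded branch entirely when checking is disabled.
    static final boolean BOUNDS_CHECKING_ENABLED =
        !Boolean.getBoolean("example.disable_bounds_checking");

    private final long[] data; // stand-in for off-heap memory

    CheckedBuffer(int length) {
        this.data = new long[length];
    }

    long get(long index) {
        if (BOUNDS_CHECKING_ENABLED) {
            if (index < 0 || index >= data.length) {
                throw new IndexOutOfBoundsException("index " + index + ", length " + data.length);
            }
        }
        // Without the check, the narrowing cast silently wraps (e.g. 1L << 32 becomes 0);
        // real off-heap access would not even have the array's own bounds check as a backstop.
        return data[(int) index];
    }

    public static void main(String[] args) {
        CheckedBuffer buf = new CheckedBuffer(4);
        try {
            buf.get(1L << 32);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```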

[Discuss][Java] 64-bit lengths for ValueVectors

2019-08-07 Thread Micah Kornfield
Hi Gonzalo, Thank you for the feedback. I wasn't aware of the JIT implications. At least in the benchmark run, they don't seem to have an impact. If people have other benchmarks that can validate whether this change will be problematic, I would appreciate trying to run them with the PR.

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-07 Thread Gonzalo Ortiz Jaureguizar
I would recommend taking care with these kinds of changes. I haven't tried Arrow in more than a year, but at that time the performance was quite bad compared with plain byte buffer access (see http://git.net/apache-arrow-development/msg02353.html *) and there are several optimizations that the JVM …
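One concrete example of the JIT implications: HotSpot's C2 compiler has historically applied counted-loop optimizations (such as range-check elimination and unrolling) to loops with an int induction variable, while long-indexed loops only gained comparable treatment in later JDK releases. A common workaround, sketched below with made-up names and an arbitrary chunk size, keeps the 64-bit count but runs the hot inner loop over int-sized chunks. Whether this matters for Arrow's vectors would have to be confirmed with benchmarks.

```java
import java.util.function.LongToDoubleFunction;

// Hypothetical sketch: keep a 64-bit element count, but run the hot inner loop with an
// int induction variable over bounded chunks.
final class ChunkedSum {
    static final int CHUNK = 1 << 16; // illustrative chunk size, not a tuned value

    static double sumChunked(LongToDoubleFunction source, long count) {
        double total = 0;
        for (long base = 0; base < count; base += CHUNK) {
            int n = (int) Math.min(CHUNK, count - base);
            for (int i = 0; i < n; i++) { // int-counted inner loop
                total += source.applyAsDouble(base + i);
            }
        }
        return total;
    }

    // The straightforward version with a long induction variable, for comparison.
    static double sumLongLoop(LongToDoubleFunction source, long count) {
        double total = 0;
        for (long i = 0; i < count; i++) {
            total += source.applyAsDouble(i);
        }
        return total;
    }

    public static void main(String[] args) {
        LongToDoubleFunction identity = i -> i;
        long count = 200_000L; // spans several chunks
        System.out.println(sumChunked(identity, count) == sumLongLoop(identity, count)); // true
    }
}
```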

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-07 Thread Fan Liya
Hi Micah, Thanks for your effort. The performance result looks good. As you indicated, ArrowBuf will take an additional 12 bytes (4 extra bytes for each of length, writer index, and reader index). Similar overheads also exist for vectors like BaseFixedWidthVector, BaseVariableWidthVector, etc. IMO, such …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-07 Thread Micah Kornfield
Hi Liya Fan, Based on the Float8Benchmark there does not seem to be any meaningful performance difference on my machine. At least for me, the benchmarks are not stable enough to say one is faster than the other (I've pasted results below). That being said my machine isn't necessarily the most …

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-06 Thread Fan Liya
Hi Micah, Thanks a lot for doing this. I am a little concerned about whether there is any negative performance impact on the current 32-bit-length based applications. Can we do some performance comparisons on our existing benchmarks? Best, Liya Fan On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield …

[Discuss][Java] 64-bit lengths for ValueVectors

2019-08-06 Thread Micah Kornfield
There have been some previous discussions on the mailing list about supporting 64-bit lengths for Java ValueVectors (this is what the IPC specification and C++ support). I created a PR [1] that changes all APIs that I could find that take an index to take a "long" instead of an "int" (and similarly …
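A hypothetical before/after of what moving index parameters from int to long means for callers. The interfaces are invented for this sketch (not Arrow's actual ones), and it assumes value counts are widened as well. Call sites that pass an int index keep compiling because int widens to long implicitly, while code that stores a returned count in an int stops compiling because long to int is a narrowing conversion that needs an explicit cast.

```java
// Hypothetical before/after shapes (not Arrow's real interfaces), assuming both the index
// parameter and the value count move from int to long.
final class Migration {
    static double sumAll(LongIndexedDoubles v) {
        // int n = v.getValueCount(); // would no longer compile: long -> int needs a cast
        long n = v.getValueCount();
        double total = 0;
        for (long i = 0; i < n; i++) {
            total += v.get(i);
        }
        return total;
    }

    static double first(LongIndexedDoubles v) {
        int i = 0;
        return v.get(i); // still compiles: the int argument widens to long implicitly
    }

    public static void main(String[] args) {
        double[] backing = {1.0, 2.0, 3.5};
        LongIndexedDoubles v = new LongIndexedDoubles() {
            public double get(long index) { return backing[Math.toIntExact(index)]; }
            public long getValueCount() { return backing.length; }
        };
        System.out.println(sumAll(v)); // 6.5
        System.out.println(first(v));  // 1.0
    }
}

interface IntIndexedDoubles {  // before: 32-bit index and count
    double get(int index);
    int getValueCount();
}

interface LongIndexedDoubles { // after: 64-bit index and count
    double get(long index);
    long getValueCount();
}
```

This asymmetry is part of why such a change touches so many call sites: arguments mostly keep working, but every place that consumes a returned index or count as an int has to change.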