I'll also note this isn't quite in final form; I'd still like to add some
more unit tests.
On Fri, Nov 22, 2019 at 11:36 AM Wes McKinney wrote:
hi Micah -- it makes sense to limit the scope for the time being to
permitting LargeString/Binary work to proceed. Jacques, have you had a
chance to look at this?
On Fri, Nov 15, 2019 at 3:07 AM Micah Kornfield wrote:
>
> Apologies for the long delay, I chose to do the minimal work of limiting th
I think the 2GB limit is overly restrictive for modern computers.
This is a problem we must face anyway.
Best,
Liya Fan
On Fri, Nov 15, 2019 at 5:07 PM Micah Kornfield
wrote:
Apologies for the long delay, I chose to do the minimal work of limiting
this change [1] to allowing ArrowBuf to 64-bit lengths. This would unblock
work on LargeString and LargeBinary. If this change looks OK, I think
there is some follow-up work to add more thorough unit/integration tests.
As a
On Fri, Aug 23, 2019, 8:55 PM Micah Kornfield wrote:
> The vector indexes being limited to 32 bits doesn't limit the addressing
> to 32-bit chunks of memory. For example, your prime example before was
> image data. Having 2 billion 1 MB images would still be supported
> without changing the index addressing.
This might be pre-coffee mat
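To make the arithmetic in the quoted point concrete, here is a minimal sketch. It assumes fixed-size 1 MiB elements laid out back to back; the class and method names are hypothetical and are not part of Arrow. It only shows that an element index can stay within int range while the byte offset it implies does not.

```java
// Hypothetical sketch, not Arrow code: an element index can fit in an int
// while the byte offset of that element needs a long.
final class IndexVersusOffset {
    // Assumed element size for the illustration: 1 MiB.
    static final long ONE_MIB = 1L << 20;

    // Byte offset of the element at `index` when every element occupies
    // `elementBytes` bytes. The int index is promoted to long before the
    // multiplication, so the product does not overflow int arithmetic.
    static long byteOffset(int index, long elementBytes) {
        return index * elementBytes;
    }

    public static void main(String[] args) {
        int lastIndex = Integer.MAX_VALUE - 1;
        long offset = byteOffset(lastIndex, ONE_MIB);
        System.out.println("element index (fits in int): " + lastIndex);
        System.out.println("byte offset (needs long):    " + offset);
        System.out.println("offset exceeds int range:    " + (offset > Integer.MAX_VALUE));
    }
}
```

Whether a given vector type can actually hold that much data depends on how its buffers and offsets are sized, which this sketch does not model.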
On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield
wrote:
> I don't think we should couple this discussion with the implementation of
> large list, etc since I think those two concepts are independent.
I'm still trying to balance in my mind which is a worse experience for
consumers of the libraries for these types. Claiming that Java supports
these ty
I don't think we should couple this discussion with the implementation of
large list, etc since I think those two concepts are independent.
I've asked some others on my team their opinions on the risk here. I think
we should probably review some of our more complex vector interactions and see
how the
>
> Hi Jacques, I hope you had a good rest.
I did, thanks!
On Fri, Aug 23, 2019 at 9:25 AM Jacques Nadeau wrote:
> I don't think we should couple this discussion with the implementation of
> large list, etc since I think those two concepts are independent.
>
> I've asked some others on my tea
>
>
> With regards to the reference implementation point. It is a good point.
> I'm on vacation this week. Unless you're pushing hard on this, can we pick
> this up and discuss more next week?
Hi Jacques, I hope you had a good rest. Any more thoughts on the reference
implementation aspect of thi
On Sun, Aug 11, 2019 at 9:40 PM Micah Kornfield wrote:
>
> Hi Wes and Jacques,
> See responses below.
>
> With regards to the reference implementation point. It is a good point. I'm
> > on vacation this week. Unless you're pushing hard on this, can we pick this
> > up and discuss more next week?
>
Hi Wes and Jacques,
See responses below.
With regards to the reference implementation point. It is a good point. I'm
> on vacation this week. Unless you're pushing hard on this, can we pick this
> up and discuss more next week?
Sure thing, enjoy your vacation. I think the only practical implica
Hey Micah,
Appreciate the offer on the compiling. The reality is I'm more concerned
about the unknowns than the compiling issue itself. Any time you've been
tuning for a while, changing something like this could be totally fine or
cause a couple of major issues. For example, we've done a very larg
My stance on this is that I don't know how important it is for Java to
support vectors over INT32_MAX elements. The use cases enabled by
having very large arrays seem to be concentrated in the native code
world (e.g. C/C++/Rust) -- that could just be implementation-centrism
on my part, though. It's
Hi Jacques,
I definitely understand these concerns and this change is risky because it
is so large. Perhaps creating a new hierarchy might be the cleanest way
of dealing with this. This could have other benefits like cleaning up some
cruft around dictionary encode and "orphaned" method. Per p
Hey Micah, I didn't have a particular path in mind. Was thinking more along
the lines of extra methods as opposed to separate classes.
Arrow hasn't historically been a place where we're writing algorithms in
Java so the fact that they aren't there doesn't mean they don't exist. We
have a large amo
Hi Jacques,
What avenue were you thinking for supporting both paths? I didn't want to
pursue a different class hierarchy, because I felt like that would
effectively fork the code base, but that is potentially an option that
would allow us to have a complete reference implementation in Java that c
This is a pretty massive change to the apis. I wonder how nasty it would be
to just support both paths. Have you evaluated how complex that would be?
On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield
wrote:
After more investigation, it looks like Float8Benchmarks at least on my
machine are within the range of noise.
For BitVectorHelper I pushed a new commit [1], which seems to bring the
BitVectorHelper benchmarks back in line (and even with some improvement for
getNullCountBenchmark).
Benchmark
Indeed, the BoundChecking and CheckNullForGet variables can make a big
difference. I didn't initially run the benchmarks with these turned on
(you can see the result from above with Float8Benchmarks). Here are new
numbers including with the flags enabled. It looks like using longs might
be a lit
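For readers unfamiliar with why such flags affect benchmark numbers, the following is a rough sketch of the general pattern: a check guarded by a static final boolean that is read once at class load. Everything in it is an assumption for illustration. The class name, the property name `sketch.bounds_checking`, and the array-backed storage are made up and do not reproduce Arrow's BoundsChecking class.

```java
// Hypothetical sketch of a flag-guarded bounds check; not Arrow's implementation.
final class GuardedBuffer {
    // Read once at class load. The property name is invented for this sketch.
    // Because the field is static final, a JIT compiler can typically treat it
    // as a constant and drop the guarded branch entirely when it is false.
    static final boolean BOUNDS_CHECKING =
        !"false".equals(System.getProperty("sketch.bounds_checking", "true"));

    private final double[] data;

    GuardedBuffer(int size) {
        this.data = new double[size];
    }

    // Explicit range check on a long index against a long length.
    static void checkIndex(long index, long length) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException(
                "index " + index + " out of range for length " + length);
        }
    }

    double get(long index) {
        if (BOUNDS_CHECKING) {
            checkIndex(index, data.length);
        }
        // With the check enabled the narrowing cast is safe; with it disabled
        // the caller is responsible for passing a valid index.
        return data[(int) index];
    }

    void set(long index, double value) {
        if (BOUNDS_CHECKING) {
            checkIndex(index, data.length);
        }
        data[(int) index] = value;
    }

    public static void main(String[] args) {
        GuardedBuffer buf = new GuardedBuffer(4);
        buf.set(2L, 1.5);
        System.out.println("bounds checking enabled: " + BOUNDS_CHECKING);
        System.out.println("value at index 2: " + buf.get(2L));
    }
}
```

Running a benchmark with the check enabled and then disabled (here via `-Dsketch.bounds_checking=false`) is one way to separate the cost of the check from the cost of the index type.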
Hi Gonzalo,
Thanks for sharing the performance results.
I am wondering if you have turned off the flag
BoundsChecking#BOUNDS_CHECKING_ENABLED.
If not, the lower throughput should be expected.
Best,
Liya Fan
On Wed, Aug 7, 2019 at 10:23 PM Micah Kornfield
wrote:
Hi Gonzalo,
Thank you for the feedback. I wasn't aware of the JIT implications. At
least on the benchmark run they don't seem to have an impact.
If people have other benchmarks that can validate whether this change
will be problematic, I would appreciate it if they could be run against the
PR.
I would recommend taking care with these kinds of changes.
I haven't tried Arrow in more than a year, but back then the performance was
quite bad in comparison with plain byte buffer access
(see http://git.net/apache-arrow-development/msg02353.html *) and
there are several optimizations that the JVM (
Hi Micah,
Thanks for your effort. The performance result looks good.
As you indicated, ArrowBuf will take an additional 12 bytes (4 bytes for each
of length, write index, and read index).
Similar overheads also exist for vectors like BaseFixedWidthVector,
BaseVariableWidthVector, etc.
IMO, such ove
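The 12-byte figure above follows from widening three fields from int to long. A small sketch of that arithmetic is below; the class and method names are hypothetical. It counts field payload only, and the real per-object footprint also depends on the JVM's object header layout and alignment.

```java
// Hypothetical sketch: extra field bytes when int fields are widened to long.
final class FieldWidthOverhead {
    // Each widened field grows by Long.BYTES - Integer.BYTES = 4 bytes.
    static int extraBytes(int widenedFields) {
        return widenedFields * (Long.BYTES - Integer.BYTES);
    }

    public static void main(String[] args) {
        // Three fields as named in the message: length, write index, read index.
        System.out.println("extra bytes per buffer object: " + extraBytes(3));
    }
}
```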
Hi Liya Fan,
Based on the Float8Benchmark there does not seem to be any meaningful
performance difference on my machine. At least for me, the benchmarks are
not stable enough to say one is faster than the other (I've pasted results
below). That being said my machine isn't necessarily the most rel
Hi Micah,
Thanks a lot for doing this.
I am a little concerned about whether there is any negative performance impact
on the current 32-bit-length based applications.
Can we do some performance comparison on our existing benchmarks?
Best,
Liya Fan
On Tue, Aug 6, 2019 at 3:35 PM Micah Kornfield
wro
There have been some previous discussions on the mailing list about supporting
64-bit lengths for Java ValueVectors (this is what the IPC specification
and C++ support). I created a PR [1] that changes all APIs that I could
find that take an index to take a "long" instead of an "int" (and
similarly c
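As a rough illustration of what widening an index parameter from "int" to "long" means for callers, here is a minimal sketch. The class `WidenedVector` is hypothetical and array-backed purely for the example; it is not the PR's code and not an Arrow class. It shows that existing call sites passing an int still compile through implicit widening, and that narrowing back to an int-indexed backing store has to be done explicitly.

```java
// Hypothetical sketch of an accessor whose index parameter is a long.
final class WidenedVector {
    // Backing store for the illustration only; Java arrays are int-indexed.
    private final double[] values;

    WidenedVector(int valueCount) {
        this.values = new double[valueCount];
    }

    // Count reported as a long, matching the widened index type.
    long getValueCount() {
        return values.length;
    }

    // Math.toIntExact throws ArithmeticException if the index does not fit
    // in an int, instead of silently wrapping as a plain cast would.
    double get(long index) {
        return values[Math.toIntExact(index)];
    }

    void set(long index, double value) {
        values[Math.toIntExact(index)] = value;
    }

    public static void main(String[] args) {
        WidenedVector vector = new WidenedVector(3);
        int i = 1;              // existing int-based caller
        vector.set(i, 2.0);     // int argument is widened to long implicitly
        System.out.println("value at index 1: " + vector.get(i));
        System.out.println("value count: " + vector.getValueCount());
    }
}
```

The reverse direction is where source changes show up: code that assigns a returned long count to an int variable needs an explicit conversion.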