Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-09 Thread Fan Liya
null checking, like > > > getDirty. See > > > > > > https://issues.apache.org/jira/browse/ARROW-1833 > > > > > > Any thoughts about that? > > > > > > On Thu, May 9, 2019 at 4:54 AM niki.lj > > wrote: > > > > > > > > +1 on th

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-09 Thread Micah Kornfield
> > > > https://issues.apache.org/jira/browse/ARROW-1833 > > > > Any thoughts about that? > > > > On Thu, May 9, 2019 at 4:54 AM niki.lj > wrote: > > > > > > +1 on this proposal. > > > > > > > > > --------

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-09 Thread Fan Liya
> > > > +1 on this proposal. > > > > > > -- > > From: Fan Liya > > Sent: Thursday, May 9, 2019, 16:33 > > To: dev > > Subject: Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow > > > > Hi all, > > > > Our previous results

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-09 Thread Wes McKinney
oposal. > > > -- > From: Fan Liya > Sent: Thursday, May 9, 2019, 16:33 > To: dev > Subject: Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow > > Hi all, > > Our previous results on micro-benchmarks show that the

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-09 Thread niki.lj
+1 on this proposal. -- From: Fan Liya Sent: Thursday, May 9, 2019, 16:33 To: dev Subject: Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow Hi all, Our previous results on micro-benchmarks show that the original Arrow API is 30

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-09 Thread Fan Liya
Hi all, Our previous micro-benchmark results show that the original Arrow API is 30% slower than the unsafe API. After profiling, we found that the performance overhead comes from the null checking in the get method. For example, the get method of Float8Vector looks like this: public doub
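To make the null-checking cost concrete, here is a minimal sketch of a checked and an unchecked accessor. It is not the Float8Vector code referred to in the message (the snippet above is cut off); SketchDoubleVector, getUnchecked, and the on-heap arrays are hypothetical stand-ins used only for illustration.

```java
// Hypothetical stand-in for a nullable double vector; NOT Arrow's Float8Vector.
final class SketchDoubleVector {
    private final double[] values;
    private final byte[] validity; // one bit per slot, 1 = value present

    SketchDoubleVector(int capacity) {
        values = new double[capacity];
        validity = new byte[(capacity + 7) / 8];
    }

    void set(int index, double value) {
        values[index] = value;
        validity[index >> 3] |= (byte) (1 << (index & 7)); // mark slot as non-null
    }

    boolean isNull(int index) {
        return (validity[index >> 3] & (1 << (index & 7))) == 0;
    }

    // Checked read: consults the validity bitmap on every call.
    double get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("Value at index is null");
        }
        return values[index];
    }

    // Unchecked read: skips the bitmap lookup; the caller must already know the slot is set.
    double getUnchecked(int index) {
        return values[index];
    }
}
```

In a tight loop the checked variant performs one extra load and branch per element, which is the kind of overhead the message attributes to null checking; whether that accounts for the full difference would have to be measured.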

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-07 Thread Fan Liya
Hi Jacques, Thanks a lot for your comments. I have evaluated the assembly code of the original Arrow API, as well as the unsafe API in our PR. Generally, the assembly code generated by the JIT for both APIs is of high quality, and for most cases, the assembly c

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-06 Thread Jacques Nadeau
I am still asking the same question: can you please analyze the assembly the JIT is producing and try to identify why the gap with bounds checking disabled is still 30%, and what we can do to address it. For example, we have talked before about a bytecode transformer that simply removes the bounds

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-06 Thread Fan Liya
Hi Jacques, Thank you so much for your kind reminder. To come up with some performance data, I have set up an environment and run some micro-benchmarks. The server runs Linux and has 64 cores and 256 GB of memory. The benchmarks are simple iterations over some double vectors (the source file is att

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-05 Thread Jacques Nadeau
> > Maybe I need to take a closer look at how the other SQL engines are using > Arrow. To see if they are also bypassing Arrow APIs. > I agree that a random user should be able to protect themselves, and this > is the utmost priority. > > According to my experience in Flink, JIT cannot optimize awa

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-05 Thread Fan Liya
Hi Jacques, Thanks a lot for your kind reply. Please see my comments in line. Best, Liya Fan > > > 1. How much slower is the current Arrow API, compared to directly accessing > off-heap memory? > > According to my (intuitive) experience in vectorizing Flink, the current > API is much slower, at

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-05 Thread Fan Liya
Hi Micah, Thank you so much for your kind reply. I don't like a parallel set of vector classes, either, and I believe a flag to turn boundary checking on and off is a good suggestion. However, I am not sure if it is acceptable for performance-critical scenarios, because anyway, we need to test the

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-05 Thread Jacques Nadeau
> > > 1. How much slower is the current Arrow API, compared to directly accessing > off-heap memory? > > According to my (intuitive) experience in vectorizing Flink, the current > API is much slower, at least one or two orders of magnitude slower. > I am sorry I do not have the exact number. Howeve

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-05 Thread Fan Liya
Hi all, Thank you so much for your attention and valuable feedback. Please let me try to address some common questions, before answering individual ones. 1. How much slower is the current Arrow API, compared to directly accessing off-heap memory? According to my (intuitive) experience in vector

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-02 Thread Jacques Nadeau
If someone wants to run without bounds checking, why don't they simply flip the system property? Are they seeing that code not get eliminated if they set that? I think people are optimizing the wrong things in this discussion. The memory address is available. Per Parth's comments, if you're work
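The "flip the system property" approach can be sketched as follows. This is an illustration under assumptions, not Arrow's implementation: the class SketchBoundsFlag and the property name sketch.unsafe_access are hypothetical. The idea is that the flag is read once into a static final field, so the JIT can treat it as a constant and drop the guarded branch when checking is turned off.

```java
// Hypothetical flag holder; the property name "sketch.unsafe_access" is made up for this sketch.
final class SketchBoundsFlag {
    // Read once at class initialization. Checks stay on unless the JVM is
    // started with -Dsketch.unsafe_access=true.
    static final boolean CHECKS_ENABLED = !Boolean.getBoolean("sketch.unsafe_access");

    // Reads data[index], validating against a logical length that may be
    // smaller than the backing array.
    static int read(int[] data, int logicalLength, int index) {
        if (CHECKS_ENABLED) {
            if (index < 0 || index >= logicalLength) {
                throw new IndexOutOfBoundsException(
                    "index " + index + " out of bounds for length " + logicalLength);
            }
        }
        return data[index];
    }
}
```

With this design there is a single code path and no parallel class hierarchy; the open question raised in the thread is whether the compiled code with the flag off really matches a hand-written unchecked accessor.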

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-05-01 Thread Siddharth Teotia
Looks like there are 2 PRs for this work -- https://github.com/apache/arrow/pull/4186 this PR adds new getUnsafe type APIs to ArrowBuf that don't do checkIndex() before calling PlatformDependent.get(memory address). So the access will go through vector.get() -> buffer.get() -> PlatformDependent.get
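The difference between a checked buffer read and a getUnsafe-style read can be sketched with a slice over a shared allocation. SketchSlice and its methods are hypothetical and are not the ArrowBuf API from the PR; a long[] stands in for off-heap memory, so Java's own array check still applies to the backing array as a whole.

```java
// Hypothetical slice over a shared backing array; NOT Arrow's ArrowBuf.
final class SketchSlice {
    private final long[] memory; // shared backing allocation
    private final int offset;    // start of this slice within memory
    private final int length;    // number of slots owned by this slice

    SketchSlice(long[] memory, int offset, int length) {
        this.memory = memory;
        this.offset = offset;
        this.length = length;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }

    // Checked path: validates the index against the slice before reading.
    long get(int index) {
        checkIndex(index);
        return memory[offset + index];
    }

    // Unchecked path: no slice-level validation, so a bad index can read
    // data that belongs to a neighbouring slice.
    long getUnsafe(int index) {
        return memory[offset + index];
    }
}
```

For a slice created as new SketchSlice(new long[] {10, 20, 30, 40}, 1, 2), get(2) throws, while getUnsafe(2) returns 40, a value outside the slice. That is the safety trade-off under discussion.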

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-04-30 Thread Parth Chandra
FWIW, in Drill's Value Vector code, we found that bounds checking was a major performance bottleneck in operators that wrote to vectors. Scans, as a result, were particularly affected. Another bottleneck was the zeroing of vectors. There were many unnecessary bounds checks. For example in a varchar v

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-04-29 Thread Wes McKinney
I'm also curious which APIs are particularly problematic for performance. In ARROW-1833 [1] and some related discussions there was the suggestion of adding methods like getUnsafe, so this would be like get(i) [2] but without checking the validity bitmap. [1]: https://issues.apache.org/jira/browse/

Re: [DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-04-29 Thread Micah Kornfield
Thanks for the design. Personally, I'm not a huge fan of creating parallel classes for every vector type; this ends up being confusing for developers and adds a lot of boilerplate. I wonder if you could use an approach similar to the one the memory module uses for turning bounds checking on/off [1].

[DISCUSS][JAVA]Support Fast/Unsafe Vector APIs for Arrow

2019-04-28 Thread Fan Liya
Hi all, We are proposing a new set of APIs in Arrow - unsafe vector APIs. The general idea is attached below, and is also accessible from our online document. Please give your valuable comments by dire