Hi Micah, Thank you so much for your kind reply.
I don't like a parallel set of vector classes either, and I believe a flag to turn bounds checking on and off is a good suggestion. However, I am not sure it is acceptable for performance-critical scenarios, because we still need to test the condition, and this test may be evaluated many times (once or several times per row for the SQL engine), which may impact performance. I agree with you that we should try our best to hide such implementation complexity, so that users can simply create vectors through our facade APIs, without being aware of the implementation details.

Best,
Liya Fan

On Sun, May 5, 2019 at 5:28 PM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi all,
>
> Thank you so much for your attention and valuable feedback.
>
> Please let me try to address some common questions, before answering individual ones.
>
> 1. How much slower is the current Arrow API, compared to directly accessing off-heap memory?
>
> According to my (intuitive) experience in vectorizing Flink, the current API is much slower, at least one or two orders of magnitude slower. I am sorry I do not have the exact number. However, the conclusion can be expected to hold true: Parth's experience with Drill also confirms it. In fact, we are working on this. ARROW-5209 is about introducing performance benchmarks, and once that is done, the number will be clear.
>
> 2. Why are the current Arrow APIs so slow?
>
> I think the main reason is too many function calls. I believe each function call is highly optimized and only carries out simple work. However, the number of calls is large. The example in our online doc gives a simple illustration: a single call to the Float8Vector.get method (an API fundamental enough) involves nearly 30 method calls. That is just too much overhead, especially for performance-critical scenarios like SQL engines.
>
> 3. Can we live without Arrow, and just directly access the off-heap memory (e.g. by the UNSAFE instance)?
>
> I guess the answer is absolutely, yes. Parth is doing this (bypassing the Arrow API) with Drill, and this is exactly what we are doing with Flink. My point is that providing light-weight APIs will make it easier to use Arrow. Without such APIs, Parth may need to provide a library of Arrow wrappers in Drill, we will need to provide a library of Arrow wrappers in Flink, and so on. That's redundant work, and it may reduce the popularity of Arrow.
>
> Best,
> Liya Fan
>
> On Fri, May 3, 2019 at 4:01 AM Jacques Nadeau <jacq...@apache.org> wrote:
>
>> If someone wants to run without bounds checking, why don't they simply flip the system property? Are they seeing that code not get eliminated if they set that? I think people are optimizing the wrong things in this discussion. The memory address is available. Per Parth's comments, if you're working on a specific application, write directly to the memory. That's the whole point of the reliable memory format. If something isn't working right with the elimination of bounds checking, we can find another solution to that, and let's make that the ticket.
>>
>> My other comment on PR 4186 still stands: this doesn't have to be in the ArrowBuf interface. Because we're factoring out memory as a very simple concept, it should be easy to simply create a wrapper object that provides this functionality with no impact on performance. We specifically expose the memory address directly for exactly this type of use. The reality is: if you want unsafe access, that basically means you don't want guardrails. Direct memory access is the simplest/cleanest way to expose exactly that.
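For reference, a minimal sketch of the flag-based approach mentioned above (the class and property names here are illustrative, not Arrow's; the actual switch is org.apache.arrow.memory.BoundsChecking, linked further down in this thread). Because the flag is a static final constant resolved once at class-load time, the JIT can constant-fold the branch and remove the check entirely when it is disabled:

    // Illustrative sketch only: class name and system property are made up for this example.
    public final class RangeChecks {

      // Resolved once at class load; Boolean.getBoolean returns true only if the
      // named system property is set to "true".
      public static final boolean BOUNDS_CHECKING_ENABLED =
          !Boolean.getBoolean("example.enable_unsafe_memory_access");

      private RangeChecks() {}

      // When the property is set, BOUNDS_CHECKING_ENABLED is a constant false for the
      // JIT, so the whole method body is eliminated after inlining.
      public static void checkIndex(int index, int valueCount) {
        if (BOUNDS_CHECKING_ENABLED) {
          if (index < 0 || index >= valueCount) {
            throw new IndexOutOfBoundsException("index: " + index + ", valueCount: " + valueCount);
          }
        }
      }
    }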
>> On Thu, May 2, 2019 at 8:18 AM Siddharth Teotia <siddha...@dremio.com> wrote:
>>
>> > Looks like there are 2 PRs for this work. https://github.com/apache/arrow/pull/4186 adds new get<type>Unsafe type APIs to ArrowBuf that don't do checkIndex() before calling PlatformDependent.get(memory address). So the access will go through vector.get() -> buffer.get() -> PlatformDependent.get() -> UNSAFE.get, which is what we do today but without doing any bounds checking.
>> >
>> > I believe the proposal suggested here and the WIP PR -- https://github.com/apache/arrow/pull/4212 -- add new versions of vectors where the call to vector.get() bypasses the call to ArrowBuf and directly invokes PlatformDependent with the absolute address at which we want to read/write. Correct? Looks like the call to ArrowBuf is still needed to get the starting address of the buffer before computing the absolute address.
>> >
>> > I am wondering if much of the overhead is coming from the conditions and branches inside bounds checking, or just from the chain of method calls? If it is bounds checking, then I think the first PR would probably suffice.
>> >
>> > On Tue, Apr 30, 2019 at 9:46 AM Parth Chandra <par...@apache.org> wrote:
>> >
>> > > FWIW, in Drill's Value Vector code, we found that bounds checking was a major performance bottleneck in operators that wrote to vectors. Scans, as a result, were particularly affected. Another bottleneck was the zeroing of vectors.
>> > > There were many unnecessary bounds checks. For example, in a varchar vector, there is one check while writing the data, one while writing the validity bit, one more in the buffer allocator for the data buffer, one more in the buffer allocator for the validity bit buffer, and one more each in the underlying ByteBuf implementation. It gets worse with repeated/array types.
>> > > Some code paths in Drill were optimized to get rid of these bounds checks (eventually I suppose all of them will be updated). The approach was to bypass the ValueVector API and write directly to the Drill(/Arrow)Buf. Writing to the memory address directly, as is being proposed by Liya Fan, was initially tried but did not have any measurable performance improvement. BTW, writing to the memory address would also conflict with ARROW-3191.
>> > > Note that the performance tests were for Drill queries, not Vectors, so writing to memory directly may still have a noticeable performance benefit for different use cases.
>> > > Sorry, I don't have actual numbers with me to share, and I'm not sure how much Arrow has diverged from the original Drill implementation, but the Drill experience would suggest that this proposal certainly has merit.
>> > >
>> > > Parth
>> > >
>> > > On Mon, Apr 29, 2019 at 11:18 AM Wes McKinney <wesmck...@gmail.com> wrote:
>> > >
>> > > > I'm also curious which APIs are particularly problematic for performance.
>> > > > In ARROW-1833 [1] and some related discussions there was the suggestion of adding methods like getUnsafe, so this would be like get(i) [2] but without checking the validity bitmap.
>> > > >
>> > > > [1]: https://issues.apache.org/jira/browse/ARROW-1833
>> > > > [2]: https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/Float8Vector.java#L99
>> > > >
>> > > > On Mon, Apr 29, 2019 at 1:05 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>> > > >
>> > > > > Thanks for the design. Personally, I'm not a huge fan of creating parallel classes for every vector type; this ends up being confusing for developers and adds a lot of boilerplate. I wonder if you could use a similar approach to the one the memory module uses for turning bounds checking on/off [1].
>> > > > >
>> > > > > Also, I think there was a comment on the JIRA, but are there any benchmarks to show the expected improvements? My limited understanding is that for small methods the JVM's JIT should inline them anyway [2], so it is not clear how much this will improve performance.
>> > > > >
>> > > > > Thanks,
>> > > > > Micah
>> > > > >
>> > > > > [1] https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/BoundsChecking.java
>> > > > > [2] https://stackoverflow.com/questions/24923040/do-modern-java-compilers-jvm-inline-functions-methods-which-are-called-exactly-f
>> > > > >
>> > > > > On Sun, Apr 28, 2019 at 2:50 AM Fan Liya <liya.fa...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > We are proposing a new set of APIs in Arrow - unsafe vector APIs. The general idea is attached below, and is also accessible from our online document <https://docs.google.com/document/d/13oZFVS1EnNedZd_7udx-h10G2tRTjfgHe2ngp2ZWJ70/edit?usp=sharing>. Please give your valuable comments by commenting directly in the online document, or by replying to this email thread.
>> > > > > >
>> > > > > > Thank you so much in advance.
>> > > > > >
>> > > > > > Best,
>> > > > > > Liya Fan
>> > > > > >
>> > > > > > Support Fast/Unsafe Vector APIs for Arrow
>> > > > > >
>> > > > > > Background
>> > > > > >
>> > > > > > In our effort to support the columnar data format in Apache Flink, we chose Apache Arrow as the basic data structure. Arrow greatly simplifies the support of the columnar data format. However, for many scenarios, we find the performance unacceptable. Our investigation shows the reason is that there are too many redundant checks and computations in the current Arrow API.
>> > > > > >
>> > > > > > For example, the following figure shows that a single call to the Float8Vector.get(int) method (one of the most frequently used APIs in Flink computation) involves 20+ method invocations.
>> > > > > > [image: image.png]
>> > > > > >
>> > > > > > There are many other APIs with similar problems. The redundant checks and computations impact performance severely. According to our evaluation, the performance may degrade by two or three orders of magnitude.
>> > > > > >
>> > > > > > Our Proposal
>> > > > > >
>> > > > > > For many scenarios, the checks can be avoided if the application developers can guarantee that all checks will pass. So our proposal is to provide some light-weight APIs. The APIs are also named *unsafe APIs*, in the sense that they skip most of the checks (not safe) to improve performance.
>> > > > > >
>> > > > > > In the light-weight APIs, we only provide minimal checks, or avoid checks altogether. Application owners can still develop and debug their code using the original safe APIs. Once all bugs have been fixed, they can switch to the unsafe APIs in the final version of their products and enjoy the high performance.
>> > > > > >
>> > > > > > Our Design
>> > > > > >
>> > > > > > Our goal is to include unsafe vector APIs in the Arrow code base, and to allow our customers to switch to the new unsafe APIs without being aware of it, except for the higher performance. To achieve this goal, we make the following design choices:
>> > > > > >
>> > > > > > Vector Class Hierarchy
>> > > > > >
>> > > > > > Each unsafe vector is a subclass of the corresponding safe vector. For example, the unsafe Float8Vector is a subclass of org.apache.arrow.vector.Float8Vector:
>> > > > > >
>> > > > > >   package org.apache.arrow.vector.unsafe;
>> > > > > >
>> > > > > >   public class Float8Vector extends org.apache.arrow.vector.Float8Vector
>> > > > > >
>> > > > > > So the safe vector acts as a façade of the unsafe vector, and through polymorphism, users may not be aware of which type of vector they are working with. In addition, the common logic can be reused in the unsafe vectors, and we only need to override the get/set related methods.
>> > > > > >
>> > > > > > Vector Creation
>> > > > > >
>> > > > > > We use factory methods to create each type of vector. Compared with the vector constructors, the factory methods take one more parameter, the vectorType:
>> > > > > >
>> > > > > >   public class VectorFactory {
>> > > > > >
>> > > > > >     public static Float8Vector createFloat8Vector(VectorType vectorType, String name, BufferAllocator allocator);
>> > > > > >
>> > > > > >   }
>> > > > > >
>> > > > > > VectorType is an enum to separate safe vectors from unsafe ones:
>> > > > > >
>> > > > > >   public enum VectorType {
>> > > > > >
>> > > > > >     SAFE,
>> > > > > >
>> > > > > >     UNSAFE
>> > > > > >
>> > > > > >   }
>> > > > > >
>> > > > > > With the factory methods, the old way of creating vectors by constructors can be gradually deprecated.
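As a rough illustration of the factory dispatch described above (a sketch only; the factory body, and the assumption that the unsafe subclass exposes the same (String, BufferAllocator) constructor as the safe vector, are not spelled out in the proposal and may differ from the PR):

    // Sketch: choose the implementation class based on VectorType.
    public class VectorFactory {

      public static Float8Vector createFloat8Vector(
          VectorType vectorType, String name, BufferAllocator allocator) {
        switch (vectorType) {
          case SAFE:
            return new Float8Vector(name, allocator);
          case UNSAFE:
            // Assumes the unsafe subclass mirrors the safe vector's constructor.
            return new org.apache.arrow.vector.unsafe.Float8Vector(name, allocator);
          default:
            throw new IllegalArgumentException("Unknown vector type: " + vectorType);
        }
      }
    }

A caller would then select the behavior once, at creation time, e.g. VectorFactory.createFloat8Vector(VectorType.UNSAFE, "value", allocator), and keep using the result through the ordinary Float8Vector type.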
>> > > > > > Vector Implementation
>> > > > > >
>> > > > > > As discussed above, unsafe vectors mainly override the get/set methods. For get methods, we directly operate on the off-heap memory, without any check:
>> > > > > >
>> > > > > >   public double get(int index) {
>> > > > > >     return Double.longBitsToDouble(PlatformDependent.getLong(valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH)));
>> > > > > >   }
>> > > > > >
>> > > > > > Note that the PlatformDependent API is only 2 stack layers above the underlying UNSAFE method call.
>> > > > > >
>> > > > > > For set methods, we still need to set the validity bit. However, this is done through an unsafe method that directly sets the bits without checking:
>> > > > > >
>> > > > > >   public void set(int index, double value) {
>> > > > > >     UnsafeBitVectorHelper.setValidityBitToOne(validityBuffer, index);
>> > > > > >     PlatformDependent.putLong(
>> > > > > >         valueBuffer.memoryAddress() + (index << TYPE_LOG2_WIDTH),
>> > > > > >         Double.doubleToRawLongBits(value));
>> > > > > >   }
>> > > > > >
>> > > > > > Method UnsafeBitVectorHelper.setValidityBitToOne is the unsafe version of BitVectorHelper.setValidityBitToOne that avoids checks.
>> > > > > >
>> > > > > > Test Cases
>> > > > > >
>> > > > > > We can reuse existing test cases by employing parameterized test classes to test both safe and unsafe vectors.
>> > > > > >
>> > > > > > Current Progress
>> > > > > >
>> > > > > > We have opened a JIRA for this work item, ARROW-5200 <https://issues.apache.org/jira/browse/ARROW-5200>, and a PR <https://github.com/apache/arrow/pull/4212> with initial implementations has been opened. We would appreciate it if you could give some comments.
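The proposal above does not show UnsafeBitVectorHelper itself, so here is a minimal sketch of what such an unchecked validity-bit write could look like, assuming PlatformDependent exposes single-byte reads and writes at an absolute address (the actual helper in the PR may differ):

    // Sketch only: mirrors BitVectorHelper.setValidityBitToOne but drops all index checks.
    public class UnsafeBitVectorHelper {

      public static void setValidityBitToOne(ArrowBuf validityBuffer, int index) {
        // One validity bit per value slot: locate the containing byte, then OR in the bit.
        final long byteAddress = validityBuffer.memoryAddress() + (index >> 3);
        byte currentByte = PlatformDependent.getByte(byteAddress);
        currentByte |= (byte) (1 << (index & 7));
        PlatformDependent.putByte(byteAddress, currentByte);
      }
    }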