Siddharth, For working off a branch, Wes has created https://github.com/apache/arrow/tree/java-vector-refactor that we can submit PR to.
For Legacy vectors, I think it's fine because it's really just a migration path to help Dremio to migrate to the new vectors. I don't think other users, i.e., Spark will use the Legacy vector class. Bryan and I will just migrate Spark to new vectors directly because Spark's use of Arrow is very simple. On Thu, Oct 12, 2017 at 2:08 PM, Siddharth Teotia <siddha...@dremio.com> wrote: > Thanks Bryan and Li. > > Yes, the goal is to get this (and the subsequent patches) merged to the new > branch. Once it is stabilized from different aspects, we can move to > master. I am not sure of the exact mechanics when we work off a different > project branch and not master. > > Does that sound good? > > Regarding compatibility, are we suggesting that let's not create Legacy > Nullable vectors at all? The initial thoughts were to generate Legacy > vectors from NullableValueVectors template and these vectors are > mutator/accessor based (in today's world). Internally each operation will > be delegated to new vectors (non code generated). > > On Thu, Oct 12, 2017 at 10:58 AM, Bryan Cutler <cutl...@gmail.com> wrote: > > > Thanks for the update Siddharth. From the Spark side of this, I > definitely > > want to try to upgrade to the latest Arrow before the Spark 2.3 release > but > > if it the refactor is too disruptive then others might get squeamish > about > > upgrading. On the other hand, I don't think we should hold back on > > refactoring for compatibility sake and the way it's looking now trying to > > be backwards-compatible will be too much of a pain. I will try to figure > > out the timeline for Spark 2.3 and what the feeling is for upgrading > > Arrow. Can we hold off on merging this to master for now and just work > out > > of the separate branch until we can get a better feeling for the impact? > > > > Bryan > > > > On Wed, Oct 11, 2017 at 8:19 AM, Li Jin <ice.xell...@gmail.com> wrote: > > > > > Hi Siddharth, > > > > > > Thanks for the update. This looks good. > > > > > > A few thoughts: > > > > > > *Compatibility:* > > > It sounds like we will introduce some back-compatibility with the new > > > Vector class. At this point I think our main Java users should be Spark > > and > > > Dremio, is this right? > > > > > > > > > - For Spark: > > > > > > It seems fine since Spark uses just the basic functionality of Vector > > > classes and the existing code should work with the new Vector classes, > > > maybe even without any code change on the Spark side. > > > > > > > > > - For Dremio: > > > > > > Sounds like you are already taking care of this by introducing the > > > LegacyVector classes. > > > > > > > > > *Testing:* > > > > > > - Spark Integration Tests: > > > > > > Bryan and I can help with integration test with Spark. I think the > target > > > timeline for Spark 2.3 release is some time in mid Nov (Bryan please > > > correct me if I am wrong). > > > > > > I will take a look at the PR today. > > > > > > > > > > > > > > > > > > > > > On Tue, Oct 10, 2017 at 4:29 PM, Siddharth Teotia < > siddha...@dremio.com> > > > wrote: > > > > > > > Hi All, > > > > > > > > I wanted to update everyone on state of this mini-project: > > > > > > > > > > > > - Requirements document and initial design proposal were sent out > to > > > the > > > > community for review and we have received some good feedback. All > > > > required > > > > docs are attached with corresponding JIRAs. > > > > > > > > > > > > - The initial prototype is in a reasonable state (code-complete). > > You > > > > can see the PR here - https://github.com/apache/arrow/pull/1164 > > > > > > > > > > > > - The prototype has code changes for the new hierarchy, abstract > > > > interfaces for fixed width and variable width vectors and concrete > > > > implementation of NullableIntVector and NullableVarCharVector. > > > > > > > > > > > > Plan for testing and integrating into existing infrastructure: > > > > > > > > > > > > - My initial thoughts are that this particular patch will require > a > > > lot > > > > of testing, reviews etc since the foundation of rest of the > > > > implementation > > > > more or less depends on how the APIs are flushed out here. > > > > > > > > > > > > - So the goal is to get this properly tested and merged into > master > > > > first. > > > > > > > > > > > > - The idea is to slowly deprecate and remove the existing vectors > in > > > > stages. In this patch itself, we change the existing > > > > NullableValueVectors.java template to generate > > LegacyNullableIntVector > > > > and > > > > LegacyNullableVarCharVector. Each operation on these vectors will > > > > delegate > > > > to the corresponding NullableIntVector and NullableVarCharVector > > that > > > > are > > > > newly implemented. > > > > > > > > > > > > - This achieves two goals w.r.t testing: > > > > > > > > > > > > - Firstly, our existing JAVA unit tests will automatically > exercise > > > the > > > > newly written code and its APIs (API names have not changed) > for > > > > NullableInt and NullableVarChar vectors. > > > > > > > > > > > > - Secondly, let's say we rebase Dremio on top of Arrow master and > > > > replace all references to NullableIntVector and > > > > NullableVarCharVector with > > > > their Legacy counterparts, things should still work. > > > > > > > > > > > > - After this patch gets merged, we can do the following work in > > > multiple > > > > patches: > > > > - Write concrete implementations for rest of the nullable types > > -- > > > > FLOAT4, FLOAT8, BIGINT, VARBINARY etc > > > > > > > > > > > > - Write additional tests (definitely needed but the first goal is > to > > > > make sure existing tests are not broken). > > > > > > > > > > > > - Ensure NullableValueVectors template generates Legacy vectors > and > > > each > > > > operation is merely a delegation to the API in new > > implementation. > > > > > > > > > > > > - In the next Arrow release, remove all Legacy vectors and > > > > NullableValueVectors template since we will have the > > implementation > > > > for > > > > each type that passes existing tests. > > > > > > > > > > > > - I am currently inspecting the newly written code and making > > changes > > > to > > > > the template to generate Legacy vector types for Nullable Int > > > > and Nullable > > > > VarChar and delegating the operations. The changes should be > > > > available in > > > > the PR in a couple of hours. > > > > > > > > > > > > I am wondering if there are any other ideas around testing, merging > > etc. > > > > Please feel free to reply here or comment on the PR. > > > > > > > > I would appreciate if people can take time to review the code in PR > -- > > > > especially the abstract classes BaseNullableFixedWidth and > > > > BaseNullableVariableWidth. Writing concrete implementations for other > > > types > > > > will be much less hassle if these abstract classes have proper code. > > > > > > > > Thanks, > > > > Siddharth > > > > > > > > > >