Thanks Bryan and Li. Yes, the goal is to get this (and the subsequent patches) merged to the new branch. Once it is stabilized from different aspects, we can move to master. I am not sure of the exact mechanics when we work off a different project branch and not master.
Does that sound good? Regarding compatibility, are we suggesting that let's not create Legacy Nullable vectors at all? The initial thoughts were to generate Legacy vectors from NullableValueVectors template and these vectors are mutator/accessor based (in today's world). Internally each operation will be delegated to new vectors (non code generated). On Thu, Oct 12, 2017 at 10:58 AM, Bryan Cutler <cutl...@gmail.com> wrote: > Thanks for the update Siddharth. From the Spark side of this, I definitely > want to try to upgrade to the latest Arrow before the Spark 2.3 release but > if it the refactor is too disruptive then others might get squeamish about > upgrading. On the other hand, I don't think we should hold back on > refactoring for compatibility sake and the way it's looking now trying to > be backwards-compatible will be too much of a pain. I will try to figure > out the timeline for Spark 2.3 and what the feeling is for upgrading > Arrow. Can we hold off on merging this to master for now and just work out > of the separate branch until we can get a better feeling for the impact? > > Bryan > > On Wed, Oct 11, 2017 at 8:19 AM, Li Jin <ice.xell...@gmail.com> wrote: > > > Hi Siddharth, > > > > Thanks for the update. This looks good. > > > > A few thoughts: > > > > *Compatibility:* > > It sounds like we will introduce some back-compatibility with the new > > Vector class. At this point I think our main Java users should be Spark > and > > Dremio, is this right? > > > > > > - For Spark: > > > > It seems fine since Spark uses just the basic functionality of Vector > > classes and the existing code should work with the new Vector classes, > > maybe even without any code change on the Spark side. > > > > > > - For Dremio: > > > > Sounds like you are already taking care of this by introducing the > > LegacyVector classes. > > > > > > *Testing:* > > > > - Spark Integration Tests: > > > > Bryan and I can help with integration test with Spark. I think the target > > timeline for Spark 2.3 release is some time in mid Nov (Bryan please > > correct me if I am wrong). > > > > I will take a look at the PR today. > > > > > > > > > > > > > > On Tue, Oct 10, 2017 at 4:29 PM, Siddharth Teotia <siddha...@dremio.com> > > wrote: > > > > > Hi All, > > > > > > I wanted to update everyone on state of this mini-project: > > > > > > > > > - Requirements document and initial design proposal were sent out to > > the > > > community for review and we have received some good feedback. All > > > required > > > docs are attached with corresponding JIRAs. > > > > > > > > > - The initial prototype is in a reasonable state (code-complete). > You > > > can see the PR here - https://github.com/apache/arrow/pull/1164 > > > > > > > > > - The prototype has code changes for the new hierarchy, abstract > > > interfaces for fixed width and variable width vectors and concrete > > > implementation of NullableIntVector and NullableVarCharVector. > > > > > > > > > Plan for testing and integrating into existing infrastructure: > > > > > > > > > - My initial thoughts are that this particular patch will require a > > lot > > > of testing, reviews etc since the foundation of rest of the > > > implementation > > > more or less depends on how the APIs are flushed out here. > > > > > > > > > - So the goal is to get this properly tested and merged into master > > > first. > > > > > > > > > - The idea is to slowly deprecate and remove the existing vectors in > > > stages. In this patch itself, we change the existing > > > NullableValueVectors.java template to generate > LegacyNullableIntVector > > > and > > > LegacyNullableVarCharVector. Each operation on these vectors will > > > delegate > > > to the corresponding NullableIntVector and NullableVarCharVector > that > > > are > > > newly implemented. > > > > > > > > > - This achieves two goals w.r.t testing: > > > > > > > > > - Firstly, our existing JAVA unit tests will automatically exercise > > the > > > newly written code and its APIs (API names have not changed) for > > > NullableInt and NullableVarChar vectors. > > > > > > > > > - Secondly, let's say we rebase Dremio on top of Arrow master and > > > replace all references to NullableIntVector and > > > NullableVarCharVector with > > > their Legacy counterparts, things should still work. > > > > > > > > > - After this patch gets merged, we can do the following work in > > multiple > > > patches: > > > - Write concrete implementations for rest of the nullable types > -- > > > FLOAT4, FLOAT8, BIGINT, VARBINARY etc > > > > > > > > > - Write additional tests (definitely needed but the first goal is to > > > make sure existing tests are not broken). > > > > > > > > > - Ensure NullableValueVectors template generates Legacy vectors and > > each > > > operation is merely a delegation to the API in new > implementation. > > > > > > > > > - In the next Arrow release, remove all Legacy vectors and > > > NullableValueVectors template since we will have the > implementation > > > for > > > each type that passes existing tests. > > > > > > > > > - I am currently inspecting the newly written code and making > changes > > to > > > the template to generate Legacy vector types for Nullable Int > > > and Nullable > > > VarChar and delegating the operations. The changes should be > > > available in > > > the PR in a couple of hours. > > > > > > > > > I am wondering if there are any other ideas around testing, merging > etc. > > > Please feel free to reply here or comment on the PR. > > > > > > I would appreciate if people can take time to review the code in PR -- > > > especially the abstract classes BaseNullableFixedWidth and > > > BaseNullableVariableWidth. Writing concrete implementations for other > > types > > > will be much less hassle if these abstract classes have proper code. > > > > > > Thanks, > > > Siddharth > > > > > >