Hello guys, sorry to request review here again, but no progress seems to have been made yet :(. These PRs are important to the Dataset Java implementation, as its first version was too basic to serve advanced use cases. If reviewing the big write-support PR[1] sounds a bit complicated, I have split the JNI data-swapping part into another, smaller PR[2] with its own test cases.
The other PRs[3][4][5] all contain minor changes, so they should be straightforward to review, I think.

In my team we have been aiming to enrich this part of the code to make it more production-ready for use in JVM-based data processing systems like Apache Spark. The code has been working well and has been stable in many of our end-to-end stress tests and benchmarks, so I am pretty confident that it is in good shape to move forward. Some of our future improvements will also be based on these changes, so having an up-to-date base will simplify further development work a lot.

So if anyone with the privilege would like to help review and merge them, that would be very much appreciated. :)

Thanks,
Hongze

[1] https://github.com/apache/arrow/pull/10201
[2] https://github.com/apache/arrow/pull/10883
[3] https://github.com/apache/arrow/pull/10333
[4] https://github.com/apache/arrow/pull/10114
[5] https://github.com/apache/arrow/pull/10652

On Thu, 2021-08-05 at 18:27 +0800, Hongze Zhang wrote:
> Thanks everyone for the quick response! By the way I might raise this
> review request a little bit late because I was working on some other
> projects in the last few months either. Now I just have some time to
> push this forward. :)
>
>
> About ARROW-11776:
>
> On Wed, 2021-08-04 at 08:45 -0700, Micah Kornfield wrote:
> > One thing that can also help is if there is a way to divide any of
> > the PRs into smaller standalone components it would likely help get
> > them merged sooner
>
> Yes. I used to refactor PR#10201 to have both fixes of ARROW-7272 (JNI
> bridge of record batch) and of ARROW-11776 (Dataset write) in it and in
> two different commits. Today I just make another individual PR for
> ARROW-7272: https://github.com/apache/arrow/pull/10883 which I think
> can be reviewed first. Hopefully it would ease the whole reviewing
> process. :)
>
>
> And regarding C data interface, I think I will be able to help
> reviewing Roee's work which I think is already in a pretty good shape
> now (maybe after a PR submitted). And once the codes get merged it
> should be easy for Dataset to migrate from current data sharing
> solution to the ABI cause I have extracted a utility layer for data
> sharing in ARROW-7272's PR.
>
> Thanks,
> Hongze
>