Hello guys, sorry to request a review here again, but no progress
seems to have been made yet :(. These PRs are important to the
Dataset Java implementation, as its first version was too basic to
serve advanced use cases. If reviewing the big write support PR[1]
seems a bit complicated, I have split the JNI data-swapping part
into a separate, smaller PR[2] with its own test cases.

The other PRs[3][4][5] all contain minor changes, so they should be
straightforward to review, I think. My team has been aiming to
enrich this part of the code to make it more production-ready for
use in JVM-based data processing systems such as Apache Spark. The
code has been working well and has been stable in many of our
end-to-end stress tests and benchmarks, so I am pretty confident that
it is in good shape to move forward. Some of our future improvements
will also build on these changes, so having an up-to-date base will
simplify further development work a lot. So if anyone with the
privilege would like to help review and merge them, that would be
very much appreciated. :)

Thanks,
Hongze

[1] https://github.com/apache/arrow/pull/10201
[2] https://github.com/apache/arrow/pull/10883
[3] https://github.com/apache/arrow/pull/10333
[4] https://github.com/apache/arrow/pull/10114
[5] https://github.com/apache/arrow/pull/10652


On Thu, 2021-08-05 at 18:27 +0800, Hongze Zhang wrote:
> Thanks everyone for the quick response! By the way, I may be raising
> this review request a little late because I have been working on some
> other projects over the last few months. Now I finally have some time
> to push this forward. :)
> 
> 
> About ARROW-11776:
> 
> On Wed, 2021-08-04 at 08:45 -0700, Micah Kornfield wrote:
> > One thing that can also help is if there is a way to divide any of 
> > the PRs into smaller standalone components it would likely help get 
> > them merged sooner
> 
> Yes. I previously refactored PR#10201 to contain the fixes for both
> ARROW-7272 (JNI bridge of record batch) and ARROW-11776 (Dataset
> write), in two separate commits. Today I opened a separate PR for
> ARROW-7272: https://github.com/apache/arrow/pull/10883 which I think
> can be reviewed first. Hopefully it will ease the whole review
> process. :)
> 
> 
> And regarding the C data interface, I think I will be able to help
> review Roee's work, which I think is already in pretty good shape
> now (maybe after a PR is submitted). And once that code gets merged it
> should be easy for Dataset to migrate from the current data sharing
> solution to the ABI, because I have extracted a utility layer for data
> sharing in ARROW-7272's PR.
> 
> Thanks,
> Hongze
> 

