Hi all,

The datasets API has improved a lot recently, and I find some of the new features very useful for my own work. An important one for me is the fix for ARROW-6952 [1]. Since I currently work on Java/Scala projects such as Spark, I am investigating a way to call some of the datasets APIs from Java so that I can gain the performance benefits of the native dataset filters/projectors. I am also interested in the dataset API's ability to scan different data sources.

Regarding using datasets in Java, my initial idea is to port some of the high-level concepts, such as DataSourceDiscovery/DataSet/Scanner/FileFormat, to Java by writing Java implementations of them, and then to create and call the lower-level record batch iterators via JNI. This way we should be able to retain the performance advantages of the C++ dataset code.

Is anyone else interested in this topic? Or is this already on the development plan? Any feedback or thoughts would be much appreciated.

Best,
Hongze

[1] https://issues.apache.org/jira/browse/ARROW-6952
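
P.S. To make the idea above a bit more concrete, here is a rough sketch of what the Java side might look like. Everything in it is hypothetical: BatchSource, JniBatchSource, DatasetScanner and the native method names are placeholders made up for illustration, not existing Arrow APIs, and record batches are shown as opaque byte arrays only to keep the sketch free of dependencies. The in-memory source stands in for the native iterator so the code runs without a native library.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical Java-side view of a low-level record batch iterator. */
interface BatchSource extends AutoCloseable {
    /** Returns the next serialized record batch, or null when exhausted. */
    byte[] nextBatch();

    @Override
    void close();
}

/** What a JNI-backed source might look like (native methods are assumptions). */
final class JniBatchSource implements BatchSource {
    // Opaque handle to a C++ scanner/iterator owned by native code.
    private final long nativeHandle;

    JniBatchSource(long nativeHandle) {
        this.nativeHandle = nativeHandle;
    }

    @Override
    public byte[] nextBatch() {
        return nativeNext(nativeHandle);
    }

    @Override
    public void close() {
        nativeClose(nativeHandle);
    }

    // These would be implemented in C++ on top of the dataset code.
    private static native byte[] nativeNext(long handle);

    private static native void nativeClose(long handle);
}

/** Pure-Java stand-in so the sketch runs without any native library. */
final class InMemoryBatchSource implements BatchSource {
    private final Iterator<byte[]> batches;
    boolean closed = false;

    InMemoryBatchSource(List<byte[]> batches) {
        this.batches = batches.iterator();
    }

    @Override
    public byte[] nextBatch() {
        return batches.hasNext() ? batches.next() : null;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Java-side scanner: adapts a BatchSource to a regular Iterator. */
final class DatasetScanner implements Iterator<byte[]>, AutoCloseable {
    private final BatchSource source;
    private byte[] lookahead;
    private boolean done;

    DatasetScanner(BatchSource source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        // Pull one batch ahead so hasNext() can answer without consuming it.
        if (lookahead == null && !done) {
            lookahead = source.nextBatch();
            done = (lookahead == null);
        }
        return lookahead != null;
    }

    @Override
    public byte[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        byte[] out = lookahead;
        lookahead = null;
        return out;
    }

    @Override
    public void close() {
        source.close();
    }
}

final class ScanSketch {
    /** Drains a source through the scanner and returns the batch count. */
    static int countBatches(BatchSource source) {
        int n = 0;
        try (DatasetScanner scanner = new DatasetScanner(source)) {
            while (scanner.hasNext()) {
                scanner.next();
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        BatchSource source =
            new InMemoryBatchSource(Arrays.asList(new byte[] {1}, new byte[] {2, 3}));
        System.out.println("batches read: " + countBatches(source));
    }
}
```

The point of the split is that the Java classes only hold a handle and forward calls, so filtering and projection would still run in the C++ code. How batches actually cross the JNI boundary (serialized bytes as shown here, or shared memory addresses) is an open design question.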