[ https://issues.apache.org/jira/browse/KUDU-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke reassigned KUDU-2490:
---------------------------------

    Assignee: Grant Henke

> implement Kudu DataSourceV2 and related classes
> -----------------------------------------------
>
>                 Key: KUDU-2490
>                 URL: https://issues.apache.org/jira/browse/KUDU-2490
>             Project: Kudu
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Andrew Wong
>            Assignee: Grant Henke
>            Priority: Major
>              Labels: roadmap-candidate
>
> The current Kudu-Spark bindings implement a DefaultSource that extends a
> RelationProvider, which provides BaseRelations to Spark, which, as I
> understand it, are physical units of query execution and represent sets of
> rows. The Kudu BaseRelation (the KuduRelation) implements a couple of traits
> to fit into Spark: PrunedFilteredScan, which allows predicates to be pushed
> into Kudu, and InsertableRelation, which allows writes to be pushed into
> Kudu. An issue with these bindings is that, while they provide interfaces to
> insert and get data, they do not provide interfaces to push details to Spark
> that might be useful for optimizing a Kudu query.
>
> Among other things, this is inconvenient for any datasource that might want
> to take such optimizations into its own hands. The Spark community appears
> to be revamping its DataSource APIs in the form of DataSourceV2 and, as it
> pertains to read support, the v2 DataSourceReader. This new world order
> provides a clear path towards implementing various optimizations that are
> unavailable with the current Spark bindings, without pushing changes to
> Spark itself.
>
> Of note, the v2 DataSourceReader can be extended with
> SupportsReportStatistics, which could allow Kudu to expose statistics to
> Spark without having to rely on HMS (although pushing stats to HMS isn't an
> unreasonable approach either). More traits and details about the API can be
> found
> [here|https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/sources/v2/reader/DataSourceReader.html].
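
As a rough illustration of the statistics idea in the description, the sketch below shows what a reader that reports row-count and size estimates might look like. It is not the Kudu implementation and does not depend on Spark: `Statistics` and `SupportsReportStatistics` are local stand-ins declared here so the example compiles on its own, shaped after the Spark 2.3 interfaces of the same names as I understand them (later Spark releases changed this API). `KuduReaderSketch`, its constructor arguments, and the way the estimates are obtained are all hypothetical.

```java
import java.util.OptionalLong;

public class KuduStatsSketch {

    // Local stand-in, assumed to mirror the shape of Spark 2.3's
    // org.apache.spark.sql.sources.v2.reader.Statistics.
    interface Statistics {
        OptionalLong sizeInBytes();
        OptionalLong numRows();
    }

    // Local stand-in, assumed to mirror Spark 2.3's SupportsReportStatistics.
    interface SupportsReportStatistics {
        Statistics getStatistics();
    }

    // Hypothetical reader: the row count and average row size are passed in
    // directly. Where a real reader would get them from is left open.
    static final class KuduReaderSketch implements SupportsReportStatistics {
        private final long rowCount;    // negative means "unknown"
        private final long avgRowBytes; // assumed average encoded row size

        KuduReaderSketch(long rowCount, long avgRowBytes) {
            this.rowCount = rowCount;
            this.avgRowBytes = avgRowBytes;
        }

        @Override
        public Statistics getStatistics() {
            // Report nothing rather than a guess when the count is unknown.
            final OptionalLong rows = rowCount >= 0
                    ? OptionalLong.of(rowCount)
                    : OptionalLong.empty();
            final OptionalLong size = rows.isPresent()
                    ? OptionalLong.of(Math.multiplyExact(rowCount, avgRowBytes))
                    : OptionalLong.empty();
            return new Statistics() {
                @Override
                public OptionalLong sizeInBytes() {
                    return size;
                }

                @Override
                public OptionalLong numRows() {
                    return rows;
                }
            };
        }
    }

    public static void main(String[] args) {
        Statistics stats = new KuduReaderSketch(1000L, 64L).getStatistics();
        System.out.println("numRows=" + stats.numRows()
                + " sizeInBytes=" + stats.sizeInBytes());
    }
}
```

Returning empty optionals for unknown values lets the consumer of the estimates fall back to its own defaults instead of planning around a fabricated number.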