[
https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202958#comment-17202958
]
liwei commented on HUDI-1297:
-----------------------------
[~vinoth]
I very much agree with you. Hudi needs a SQL layer; Presto can currently only read, so Spark is a good choice. The DeltaStreamer tool is useful today, but users with large amounts of data will prefer Spark Streaming or SQL. I think three things can be done:
# Ease of use for data sources: make it easy to sync binlogs (including the binlogs of all tables in a MySQL instance) into Hudi, using the Spark Dataset or SQL API (possibly some combination of MERGE, UPSERT, and DELETE).
# Ease of use for batch transforms and streaming ETL: Spark and Structured Streaming make it easy to support sources and sinks.
# Performance: support File/Partition pruning using Hudi metadata tables; Hudi's versioning also enables some cache optimizations.
But first we can upgrade Hudi to Spark 3.0.
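As a rough illustration of the first point, applying a batch of MySQL binlog changes to a Hudi table could use Spark 3's MERGE INTO syntax, combining update, delete, and insert in one statement. This is only a sketch; the table names, column names, and the {{op}} change-type column are hypothetical, and Hudi would need to implement this DML support.

{code:sql}
-- Hypothetical example: merge a batch of binlog changes into a Hudi table.
-- hudi_orders, binlog_changes, order_id, and op are made-up names.
MERGE INTO hudi_orders AS target
USING binlog_changes AS source
ON target.order_id = source.order_id
WHEN MATCHED AND source.op = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
{code}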
> [Umbrella] Revamp Spark Datasource support using Spark 3 APIs
> -------------------------------------------------------------
>
> Key: HUDI-1297
> URL: https://issues.apache.org/jira/browse/HUDI-1297
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Spark Integration
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
> Priority: Major
> Fix For: 0.7.0
>
>
> Yet to be fully scoped out
> But high level, we want to
> * Add SQL support for MERGE, DELETE etc
> * First class support for streaming reads/writes via structured streaming
> * Row based reader/writers all the way
> * Support for File/Partition pruning using Hudi metadata tables
--
This message was sent by Atlassian Jira
(v8.3.4#803005)