[
https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17202958#comment-17202958
]
liwei commented on HUDI-1297:
-----------------------------
[~vinoth]
I very much agree with you. Hudi needs a SQL layer; Presto can currently only read, so Spark is a good choice. The DeltaStreamer tool is useful today, but users with large amounts of data will prefer Spark Streaming or SQL. I think three things can be done:
# Ease of use for data sources: make it easy to sync binlogs (including the binlogs of all tables in a MySQL instance) into Hudi, using the Spark Dataset or SQL API (possibly some combination of MERGE, UPSERT, and DELETE).
# Ease of use for batch transforms and streaming ETL: Spark and Structured Streaming make it easy to support sources and sinks.
# Performance: support File/Partition pruning using Hudi metadata tables; Hudi's versioning also enables some cache optimizations.
But first we can upgrade Hudi to Spark 3.0.
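As a rough illustration of the first point, applying a batch of MySQL binlog changes to a Hudi table could use Spark 3's MERGE INTO syntax, combining update, delete, and insert in one statement. This is only a sketch; the table names, column names, and the {{op}} change-type column are hypothetical, and Hudi would need to implement this DML support.

{code:sql}
-- Hypothetical example: merge a batch of binlog changes into a Hudi table.
-- hudi_orders, binlog_changes, order_id, and op are made-up names.
MERGE INTO hudi_orders AS target
USING binlog_changes AS source
ON target.order_id = source.order_id
WHEN MATCHED AND source.op = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
{code}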
> [Umbrella] Revamp Spark Datasource support using Spark 3 APIs
> -------------------------------------------------------------
>
> Key: HUDI-1297
> URL: https://issues.apache.org/jira/browse/HUDI-1297
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Spark Integration
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
> Priority: Major
> Fix For: 0.7.0
>
>
> Yet to be fully scoped out
> But high level, we want to
> * Add SQL support for MERGE, DELETE etc
> * First class support for streaming reads/writes via structured streaming
> * Row based reader/writers all the way
> * Support for File/Partition pruning using Hudi metadata tables
--
This message was sent by Atlassian Jira
(v8.3.4#803005)