[jira] [Updated] (HUDI-3525) Introduce JsonkafkaSourceProcessor to support data preprocess before it is transformed to DataSet

Xianghu Wang (Jira) Mon, 28 Feb 2022 00:11:07 -0800


     [ https://issues.apache.org/jira/browse/HUDI-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Xianghu Wang updated HUDI-3525:
-------------------------------
    Description: 
Currently we have `Transform` to transform the source dataset into the target 
dataset before writing, but it operates on a DataSet.

In some scenarios, our Kafka data is not in the format we need, for example 
binlog JSON.

We need a way to extract the data we need from the raw records before they are 
converted into a DataSet.
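To make the proposal concrete, here is a minimal Python sketch of such a preprocessing step. It assumes a Canal-style binlog envelope in which the changed rows sit in a `data` array and the event kind in `type`. `JsonKafkaSourcePreprocessor`, `BinlogPayloadExtractor` and `process` are hypothetical names chosen for illustration, not Hudi's API. An actual implementation in Hudi would presumably be written in Java and run on the Kafka message values before DeltaStreamer builds the DataSet.

```python
import json
from typing import Iterable, Iterator


class JsonKafkaSourcePreprocessor:
    """Hypothetical hook: turns raw Kafka message values (JSON strings)
    into the JSON records that should become rows of the DataSet."""

    def process(self, messages: Iterable[str]) -> Iterator[str]:
        raise NotImplementedError


class BinlogPayloadExtractor(JsonKafkaSourcePreprocessor):
    """Unwraps an assumed Canal-style binlog envelope such as
    {"database": ..., "table": ..., "type": "INSERT", "data": [{...}, ...]}
    and emits one flat JSON record per changed row."""

    def __init__(self, keep_types=("INSERT", "UPDATE")):
        self.keep_types = set(keep_types)

    def process(self, messages: Iterable[str]) -> Iterator[str]:
        for raw in messages:
            envelope = json.loads(raw)
            # Drop DDL and any other event kinds we do not want to write.
            if envelope.get("type") not in self.keep_types:
                continue
            # Each element of "data" is one changed row; emit it on its own.
            for row in envelope.get("data") or []:
                yield json.dumps(row, sort_keys=True)


messages = [
    '{"database": "shop", "table": "orders", "type": "INSERT", '
    '"data": [{"id": "1", "amount": "9.5"}]}',
    '{"database": "shop", "table": "orders", "type": "ALTER", "data": null}',
]
print(list(BinlogPayloadExtractor().process(messages)))
# ['{"amount": "9.5", "id": "1"}']
```

Only the unwrapped rows reach the DataSet conversion, so an existing `Transform` can then work on plain table columns instead of the binlog envelope.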

> Introduce JsonkafkaSourceProcessor to support data preprocess before it is transformed to DataSet
> -------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-3525
>                 URL: https://issues.apache.org/jira/browse/HUDI-3525
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: deltastreamer
>            Reporter: Xianghu Wang
>            Assignee: Xianghu Wang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
