[ https://issues.apache.org/jira/browse/HUDI-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xianghu Wang updated HUDI-3525:
-------------------------------
    Description:
Currently we have `Transform` to transform the source into the target dataset before writing, but it operates on a Dataset. In some scenarios our Kafka data is not in the format we need, for example binlog JSON. We need a way to extract the relevant data from the original records before they are converted into a Dataset.

> Introduce JsonkafkaSourceProcessor to support data preprocess before it is transformed to DataSet
> -------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-3525
>                 URL: https://issues.apache.org/jira/browse/HUDI-3525
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: deltastreamer
>            Reporter: Xianghu Wang
>            Assignee: Xianghu Wang
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
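The preprocessing step requested in this issue (acting on raw Kafka JSON strings before they become a Dataset) might look roughly like the sketch below. Everything here is hypothetical and not a Hudi API. The interface name `RawJsonPreprocessor`, the class `BinlogUnwrapProcessor`, and the use of `List<String>` in place of a Spark RDD are all assumptions. The sketch also assumes a binlog envelope in compact JSON whose row sits under a top-level `"data"` object. A dependency-free string scan keeps it self-contained; a real implementation would parse with a JSON library.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical hook (NOT a Hudi API): receives raw Kafka message values
 * and returns the JSON strings that should be converted into a Dataset.
 */
interface RawJsonPreprocessor {
    List<String> process(List<String> rawRecords);
}

/**
 * Hypothetical processor that unwraps an assumed binlog envelope such as
 *   {"database":"db","table":"t","type":"insert","data":{"id":1}}
 * to the row object under "data". Records without a "data" object are dropped.
 */
class BinlogUnwrapProcessor implements RawJsonPreprocessor {
    // Assumes compact JSON: no whitespace between the key and the colon.
    private static final String DATA_KEY = "\"data\":";

    @Override
    public List<String> process(List<String> rawRecords) {
        List<String> out = new ArrayList<>();
        for (String record : rawRecords) {
            String row = extractObject(record, DATA_KEY);
            if (row != null) {
                out.add(row);
            }
        }
        return out;
    }

    /**
     * Returns the JSON object that follows the first occurrence of key,
     * or null if the key is missing or its value is not an object.
     * Braces inside string literals are ignored while matching.
     */
    static String extractObject(String json, String key) {
        int keyPos = json.indexOf(key);
        if (keyPos < 0) {
            return null;
        }
        int start = keyPos + key.length();
        // Skip whitespace between the colon and the value.
        while (start < json.length() && Character.isWhitespace(json.charAt(start))) {
            start++;
        }
        if (start >= json.length() || json.charAt(start) != '{') {
            return null;
        }
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = start; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return json.substring(start, i + 1);
                }
            }
        }
        return null; // unbalanced braces: treat as malformed input
    }
}
```

Fed an insert event and a heartbeat message with no `data` field, the processor returns a single string containing only the row object. That string could then go through the usual JSON-to-Dataset conversion and any `Transform` step.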