[ 
https://issues.apache.org/jira/browse/HUDI-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-8474:
---------------------------------
    Labels: pull-request-available  (was: )

> Design and Impl MDT repartitioner to assist with writing to MDT
> ---------------------------------------------------------------
>
>                 Key: HUDI-8474
>                 URL: https://issues.apache.org/jira/browse/HUDI-8474
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: metadata, writer-core
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.0.2
>
>
> We need a repartitioner for MDT that takes in HoodieData<HoodieRecords> and 
> returns one Spark task per file slice in MDT. 
> For example, FILES typically has one file slice. 
> For col stats, RLI, etc., the number depends on how the user has configured it.
> We should also sort within the partitioner, since with HFile we might 
> have to sort the keys.
>  
> Except for the partition stats index, every other index should be straightforward. 
> For partition stats record generation, we have a tracking ticket: 
> https://issues.apache.org/jira/browse/HUDI-8476 
>  
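
The repartition-then-sort idea in the description can be sketched in plain Java, without Spark or Hudi on the classpath. This is only an illustration under stated assumptions: the class name MdtRepartitionSketch, the helper fileGroupIndex, and the use of String.hashCode() to pick a bucket are all hypothetical and are not taken from the Hudi codebase. Each returned bucket stands in for the single task that would write one MDT file slice, and keys are sorted inside each bucket as an HFile writer would expect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: not Hudi's actual repartitioner.
class MdtRepartitionSketch {

  // Assumption: a record key maps to a file group by a simple hash modulo.
  // The real mapping used by Hudi may differ.
  static int fileGroupIndex(String recordKey, int numFileGroups) {
    return Math.floorMod(recordKey.hashCode(), numFileGroups);
  }

  // Groups record keys into one bucket per file group, then sorts each bucket.
  // One bucket corresponds to one task writing one file slice.
  static List<List<String>> repartitionAndSort(List<String> recordKeys, int numFileGroups) {
    List<List<String>> buckets = new ArrayList<>();
    for (int i = 0; i < numFileGroups; i++) {
      buckets.add(new ArrayList<>());
    }
    for (String key : recordKeys) {
      buckets.get(fileGroupIndex(key, numFileGroups)).add(key);
    }
    for (List<String> bucket : buckets) {
      Collections.sort(bucket); // sorted keys within each bucket
    }
    return buckets;
  }

  public static void main(String[] args) {
    // A FILES-like partition with one file group: a single sorted bucket.
    System.out.println(repartitionAndSort(Arrays.asList("c", "a", "b", "d"), 1));
    // An index configured with two file groups: two sorted buckets.
    System.out.println(repartitionAndSort(Arrays.asList("c", "a", "b", "d"), 2));
  }
}
```

In a Spark setting, the same shape would more likely be expressed with a custom partitioner plus a sort within partitions, so that partitioning and sorting happen in one shuffle.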



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
