Kong Wei created HUDI-6979:
------------------------------

             Summary: support EventTimeBasedCompactionStrategy
                 Key: HUDI-6979
                 URL: https://issues.apache.org/jira/browse/HUDI-6979
             Project: Apache Hudi
          Issue Type: New Feature
          Components: compaction
            Reporter: Kong Wei
            Assignee: Kong Wei


The current compaction strategies are based on log file size, the number of log
files, etc., so the time range of the data in the resulting RO (read-optimized)
table cannot be controlled. Hudi also has a day-based strategy
(DayBasedCompactionStrategy), but it relies on day-based partition paths, and its
time granularity is coarse.


The *EventTimeBasedCompactionStrategy* can generate event-time-friendly RO tables,
whether or not the table uses day-based partitions. For example, the strategy can
select for compaction all log files whose data has an event time before 3 am, so
that the resulting RO table contains exactly the data before 3 am. A query that
only needs data before 3 am can then read the RO table, which is much faster
because it reads base files only and skips merging log files.
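A minimal sketch of the selection step described above, written in Python for brevity. It assumes that each log file records the maximum event time of its records and that the log files in a file slice are listed in write order. LogFile, FileSlice and select_log_files are hypothetical names for illustration only. They are not Hudi classes or APIs, and the real strategy may choose files differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class LogFile:
    # Hypothetical metadata: file name plus the max event time of its records.
    name: str
    max_event_time: datetime


@dataclass(frozen=True)
class FileSlice:
    # Hypothetical file slice; log_files is assumed to be in write order.
    file_id: str
    log_files: List[LogFile]


def select_log_files(slices: List[FileSlice], cutoff: datetime) -> Dict[str, List[str]]:
    """Map each file id to the leading log files whose data is older than cutoff."""
    plan: Dict[str, List[str]] = {}
    for fs in slices:
        chosen: List[str] = []
        for lf in fs.log_files:
            if lf.max_event_time >= cutoff:
                # Stop at the first log file holding data at or after the cutoff,
                # so no earlier-written log file is skipped.
                break
            chosen.append(lf.name)
        if chosen:
            plan[fs.file_id] = chosen
    return plan


if __name__ == "__main__":
    cutoff = datetime(2024, 1, 1, 3, 0)  # "before 3 am"
    slices = [
        FileSlice("fg-1", [
            LogFile("fg-1.log.1", datetime(2024, 1, 1, 1, 30)),
            LogFile("fg-1.log.2", datetime(2024, 1, 1, 2, 59)),
            LogFile("fg-1.log.3", datetime(2024, 1, 1, 3, 15)),
        ]),
        FileSlice("fg-2", [LogFile("fg-2.log.1", datetime(2024, 1, 1, 4, 0))]),
    ]
    print(select_log_files(slices, cutoff))
    # prints {'fg-1': ['fg-1.log.1', 'fg-1.log.2']}
```

Stopping at the first log file that reaches the cutoff, instead of filtering every log file independently, is a choice made for this sketch. It avoids compacting a later-written log file while skipping an earlier one, which could apply updates to the same record out of order.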

I think this strategy can broaden the use cases for RO tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
