Kong Wei created HUDI-6979:
------------------------------
Summary: support EventTimeBasedCompactionStrategy
Key: HUDI-6979
URL: https://issues.apache.org/jira/browse/HUDI-6979
Project: Apache Hudi
Issue Type: New Feature
Components: compaction
Reporter: Kong Wei
Assignee: Kong Wei
The current compaction strategies select files based on log file size, the
number of log files, and similar metrics. As a result, the event-time range
covered by the read-optimized (RO) table they produce cannot be controlled.
Hudi also has a DayBased strategy, but it relies on day-based partition paths
and its time granularity is coarse.
The *EventTimeBasedCompactionStrategy* can generate event-time-friendly RO
tables, whether or not the table uses day-based partitions. For example, the
strategy can select for compaction all log files whose data time is before
3 am, so that the data in the generated RO table is before 3 am. If we only
want to query data before 3 am, we can query the RO table directly, which is
much faster.
With this strategy, I think we can expand the application scenarios of RO tables.
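
The selection rule in the 3 am example could be sketched roughly as follows.
This is only an illustration under stated assumptions, not the proposed
implementation and not Hudi's strategy API: the class names, the
maxEventTime field, and the idea that each log file carries its latest event
time are all hypothetical, as are the dates used.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in Hudi.
final class EventTimeCompactionSketch {

    // Stand-in for one log file awaiting compaction.
    static final class LogFileInfo {
        public final String fileId;
        // Assumption: the latest event time of the records in this log file is known.
        public final Instant maxEventTime;

        LogFileInfo(String fileId, Instant maxEventTime) {
            this.fileId = fileId;
            this.maxEventTime = maxEventTime;
        }
    }

    // Keep only log files whose data lies entirely before the cutoff,
    // so the compacted result contains no data at or after the cutoff.
    static List<LogFileInfo> selectForCompaction(List<LogFileInfo> candidates, Instant cutoff) {
        List<LogFileInfo> selected = new ArrayList<>();
        for (LogFileInfo file : candidates) {
            if (file.maxEventTime.isBefore(cutoff)) {
                selected.add(file);
            }
        }
        return selected;
    }
}
```

With a cutoff of 03:00, a log file whose latest record is at 01:30 would be
selected, while one whose latest record is at 03:00 or 04:15 would be left for
a later compaction.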
--
This message was sent by Atlassian Jira
(v8.20.10#820010)