[jira] [Created] (HUDI-9546) Performance improvements for streaming DAG write with secondary index

Lokesh Jain (Jira) Wed, 25 Jun 2025 12:05:05 -0700

Lokesh Jain created HUDI-9546:
---------------------------------

             Summary: Performance improvements for streaming DAG write with 
secondary index
                 Key: HUDI-9546
                 URL: https://issues.apache.org/jira/browse/HUDI-9546
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Lokesh Jain
             Fix For: 1.1.0



A few performance improvements on top of HUDI-9340:
1. While fetching the secondary key from a file group, we can project only the 
secondary key column instead of reading the entire record.
2. In HoodieAppendHandle, we can avoid reading the file slice twice to compute 
the secondary index changes. Instead, we can merge the new records already 
available in the handle with the previous file slice to compute the secondary 
index changes.
3. We currently use toString to get the string representation of the secondary 
key. We need to ensure this works with all data types, such as date and 
timestamp.
[https://github.com/apache/hudi/blob/e017d85d76b5a2332e96ce0b7e4b2a552f98dadc/hudi-common/src/main/java/org/apache/hudi/metadata/SecondaryIndexRecordGenerationUtils.java#L259]
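
A rough sketch of the idea in item 2, for illustration only. It assumes the
previous file slice has already been read once into a map of record key to
secondary key, and that the new records held by the handle are available as a
similar map in which a null value marks a delete. The class name, the Change
type and the map-based representation are all hypothetical and are not Hudi
APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; none of these names are Hudi APIs.
class SecondaryIndexDiffSketch {

  /** A single secondary index change for one record key. */
  static final class Change {
    final boolean isDelete;
    final String secondaryKey;
    final String recordKey;

    Change(boolean isDelete, String secondaryKey, String recordKey) {
      this.isDelete = isDelete;
      this.secondaryKey = secondaryKey;
      this.recordKey = recordKey;
    }

    @Override
    public String toString() {
      return (isDelete ? "DELETE " : "INSERT ") + secondaryKey + " -> " + recordKey;
    }
  }

  /**
   * Computes secondary index changes from a single read of the previous slice.
   *
   * @param previous record key to secondary key, as stored in the previous file slice
   * @param incoming record key to new secondary key; a null value marks a deleted record
   */
  static List<Change> diff(Map<String, String> previous, Map<String, String> incoming) {
    List<Change> changes = new ArrayList<>();
    for (Map.Entry<String, String> entry : incoming.entrySet()) {
      String recordKey = entry.getKey();
      String newKey = entry.getValue();
      String oldKey = previous.get(recordKey);
      if (Objects.equals(oldKey, newKey)) {
        continue; // secondary key unchanged, so the index needs no update
      }
      if (oldKey != null) {
        changes.add(new Change(true, oldKey, recordKey)); // drop the stale mapping
      }
      if (newKey != null) {
        changes.add(new Change(false, newKey, recordKey)); // add the new mapping
      }
    }
    return changes;
  }

  public static void main(String[] args) {
    Map<String, String> previous = new LinkedHashMap<>();
    previous.put("r1", "a");
    previous.put("r2", "b");

    Map<String, String> incoming = new LinkedHashMap<>();
    incoming.put("r2", "x");  // update: secondary key changes from b to x
    incoming.put("r3", "c");  // insert: record key not present before

    diff(previous, incoming).forEach(System.out::println);
  }
}
```

Only records whose secondary key actually changed produce index changes, so
unchanged records cost one map lookup and no second pass over the file slice.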



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
