nsivabalan opened a new pull request, #18387:
URL: https://github.com/apache/hudi/pull/18387

   ### Describe the issue this Pull Request addresses
   
   Optimizes TableSchemaResolver.getTableInternalSchemaFromCommitMetadata() to 
use short-circuit evaluation when searching for the most recent schema-updating 
instant. The previous implementation filtered the entire timeline and then 
called lastInstant(), which evaluated the filter against every instant. The new 
implementation uses getReverseOrderedInstants().filter(...).findFirst() and 
stops as soon as the first (most recent) matching instant is found.
   
   ### Summary and Changelog
   
   Summary:
     Users whose tables have long timelines will see faster internal schema 
lookups, especially when the most recent instants are non-schema-updating 
operations (CLUSTER, COMPACT, INDEX, LOG_COMPACT).
   
     Changelog:
     - Refactored 
TableSchemaResolver.getTableInternalSchemaFromCommitMetadata() to use 
getReverseOrderedInstants().filter(...).findFirst() instead of 
filter(...).lastInstant()
     - This enables short-circuit evaluation: the method stops immediately 
upon finding the first (most recent) schema-updating instant
     - Added 4 comprehensive unit tests to validate correctness and verify the 
short-circuit behavior
     - Added inline documentation explaining the optimization
   
     Technical details:
     - Before: completedInstants.filter(predicate) → creates filtered timeline 
→ lastInstant() → processes all instants
     - After: 
completedInstants.getReverseOrderedInstants().filter(predicate).findFirst() → 
stops at first match
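
The before/after shapes can be illustrated with plain Java streams. This is a minimal sketch, not the Hudi code: `ShortCircuitSketch`, `lastMatchEager`, `lastMatchLazy`, and the `canUpdateSchema` predicate are hypothetical names, and a plain `List` ordered oldest-first stands in for the completed timeline.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for the timeline lookup; these are not Hudi classes.
final class ShortCircuitSketch {

  // "Before" shape: test every instant, keep the matches, then take the last one.
  static <T> Optional<T> lastMatchEager(List<T> instants, Predicate<T> canUpdateSchema) {
    List<T> matches = instants.stream().filter(canUpdateSchema).collect(Collectors.toList());
    return matches.isEmpty() ? Optional.empty() : Optional.of(matches.get(matches.size() - 1));
  }

  // "After" shape: walk newest-first; findFirst() stops pulling elements at the first match.
  static <T> Optional<T> lastMatchLazy(List<T> instants, Predicate<T> canUpdateSchema) {
    List<T> newestFirst = new ArrayList<>(instants); // copy so the caller's list is untouched
    Collections.reverse(newestFirst);
    return newestFirst.stream().filter(canUpdateSchema).findFirst();
  }
}
```

Both methods return the same instant. They differ only in how often the predicate runs: for a four-instant timeline whose two newest entries do not match, the eager variant evaluates it four times and the lazy variant three. If the predicate has to read commit metadata, each skipped evaluation is a skipped read.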
   
   ### Impact
   
     Performance improvement with no behavioral changes:
     - Reduces the number of commit metadata reads required. This is especially 
beneficial for:
       - Tables with long timelines (hundreds or thousands of commits)
       - Scenarios where recent commits are non-schema-updating operations
   
   ### Risk Level
   
   low
   
   ### Documentation Update
   
   None required.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
