shangxinli opened a new pull request, #18393:
URL: https://github.com/apache/hudi/pull/18393

   Replace tight coupling to Athena/Muttley services with a pluggable 
CheckpointService interface.
   
   ## Problem
   PR #18127 introduced Kafka offset tracking functionality that was tightly 
coupled to internal Uber services (Athena/Muttley). This made the feature 
unusable for open-source Apache Hudi users, as these internal services are not 
available outside of Uber's infrastructure.
   
   ## Solution
   This PR refactors the checkpoint tracking code to use a pluggable 
architecture:
   
   ### New Components
   - **`CheckpointService` interface** - Defines the contract for checkpoint 
tracking services
     - `getCheckpointInfo()` method for fetching checkpoint data
     - `CheckpointRequest` and `CheckpointInfo` classes, each constructed 
through a Builder
   
   - **`NoOpCheckpointService`** - Default implementation for OSS users
     - Returns empty checkpoints (no external dependencies)
     - Allows Hudi to function without external checkpoint tracking
   
   - **`AthenaCheckpointService`** - Athena-specific implementation
     - Wraps the existing `AthenaIngestionGateway`
     - Converts Athena format to generic `CheckpointInfo`
     - Remains available for internal Uber use
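
To make the contract above concrete, here is a minimal sketch of how the interface and the no-op default might fit together. Only the names `CheckpointService`, `getCheckpointInfo()`, `CheckpointRequest`, `CheckpointInfo`, and `NoOpCheckpointService` come from this description. The wrapper class `CheckpointServiceSketch`, the `topic` and `checkpointId` fields, the partition-to-offset map, and the `empty()`/`isEmpty()` helpers are assumptions made for illustration, and `CheckpointInfo` is simplified to a static factory instead of a Builder. The actual signatures in the PR may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; not the code from this PR.
final class CheckpointServiceSketch {

  /** Request for checkpoint data. The fields are assumed for illustration. */
  static final class CheckpointRequest {
    private final String topic;
    private final long checkpointId;

    private CheckpointRequest(Builder b) {
      this.topic = b.topic;
      this.checkpointId = b.checkpointId;
    }

    String getTopic() { return topic; }
    long getCheckpointId() { return checkpointId; }

    static Builder builder() { return new Builder(); }

    static final class Builder {
      private String topic = "";
      private long checkpointId = -1L;

      Builder topic(String topic) { this.topic = topic; return this; }
      Builder checkpointId(long id) { this.checkpointId = id; return this; }
      CheckpointRequest build() { return new CheckpointRequest(this); }
    }
  }

  /** Result holding Kafka partition -> offset (an assumed shape). */
  static final class CheckpointInfo {
    private final Map<Integer, Long> offsets;

    private CheckpointInfo(Map<Integer, Long> offsets) {
      // Defensive copy so callers cannot mutate the result afterwards.
      this.offsets = Collections.unmodifiableMap(new HashMap<>(offsets));
    }

    Map<Integer, Long> getOffsets() { return offsets; }
    boolean isEmpty() { return offsets.isEmpty(); }

    static CheckpointInfo empty() { return new CheckpointInfo(Collections.emptyMap()); }
    static CheckpointInfo of(Map<Integer, Long> offsets) { return new CheckpointInfo(offsets); }
  }

  /** The pluggable contract: one method that fetches checkpoint data. */
  interface CheckpointService {
    CheckpointInfo getCheckpointInfo(CheckpointRequest request);
  }

  /** Default with no external dependencies: always returns an empty result. */
  static final class NoOpCheckpointService implements CheckpointService {
    @Override
    public CheckpointInfo getCheckpointInfo(CheckpointRequest request) {
      return CheckpointInfo.empty();
    }
  }

  public static void main(String[] args) {
    CheckpointService service = new NoOpCheckpointService();
    CheckpointRequest request =
        CheckpointRequest.builder().topic("events").checkpointId(7L).build();
    // The no-op service never contacts an external system.
    System.out.println(service.getCheckpointInfo(request).isEmpty());
  }
}
```

An Athena-backed implementation would, under the same assumptions, implement the one method and translate the gateway's response into `CheckpointInfo`.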
   
   ### Modified Components
   - **`FlinkCheckpointClient`** - Simplified from 323 lines to 70 lines
     - Now uses the `CheckpointService` interface
     - Defaults to `NoOpCheckpointService` for OSS compatibility
     - Accepts custom checkpoint service implementations via constructor
   
   - **`TestFlinkCheckpointClient`** - All tests refactored
     - Tests now use the abstraction layer
     - Mock-based testing approach
     - No dependency on Athena services
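
The constructor injection and test approach described above could look roughly like the following sketch. The class names `FlinkCheckpointClient`, `CheckpointService`, and `NoOpCheckpointService` are taken from this description. The wrapper class `CheckpointClientSketch`, the `fetchOffsets` method, and the simplified request/info types are hypothetical. A hand-written lambda stub stands in for a mocking library so the example needs nothing beyond the JDK, and the PR's actual tests may be written differently.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch only; not the code from this PR.
final class CheckpointClientSketch {

  // Minimal stand-ins for the PR's types; their shapes are assumed.
  static final class CheckpointRequest {
    final String topic;
    CheckpointRequest(String topic) { this.topic = topic; }
  }

  static final class CheckpointInfo {
    final Map<Integer, Long> offsets; // Kafka partition -> offset (assumed)
    CheckpointInfo(Map<Integer, Long> offsets) { this.offsets = offsets; }
  }

  interface CheckpointService {
    CheckpointInfo getCheckpointInfo(CheckpointRequest request);
  }

  static final class NoOpCheckpointService implements CheckpointService {
    @Override
    public CheckpointInfo getCheckpointInfo(CheckpointRequest request) {
      return new CheckpointInfo(Collections.emptyMap());
    }
  }

  /** Client that depends only on the interface, never on a concrete backend. */
  static final class FlinkCheckpointClient {
    private final CheckpointService service;

    /** Default constructor falls back to the no-op service. */
    FlinkCheckpointClient() {
      this(new NoOpCheckpointService());
    }

    /** Custom implementations are supplied through the constructor. */
    FlinkCheckpointClient(CheckpointService service) {
      this.service = Objects.requireNonNull(service, "service");
    }

    // Hypothetical method name, used here only to exercise the service.
    Map<Integer, Long> fetchOffsets(String topic) {
      return service.getCheckpointInfo(new CheckpointRequest(topic)).offsets;
    }
  }

  public static void main(String[] args) {
    // A lambda works as a stub because the interface has one abstract method.
    CheckpointService stub =
        request -> new CheckpointInfo(Collections.singletonMap(0, 42L));

    System.out.println(new FlinkCheckpointClient(stub).fetchOffsets("events"));
    System.out.println(new FlinkCheckpointClient().fetchOffsets("events"));
  }
}
```

Because the client sees only the interface, a test can substitute any stub or mock without an Athena dependency on the classpath.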
   
   ## Benefits
   ✅ **OSS Friendly** - Apache Hudi users can use the feature without internal 
Uber services
   ✅ **Extensible** - Enterprises can implement custom checkpoint services for 
their infrastructure
   ✅ **Backward Compatible** - Athena implementation remains available for 
internal use
   ✅ **Clean Separation** - Core checkpoint client is decoupled from specific 
implementations
   ✅ **Better Testability** - Interface-based design enables comprehensive mock 
testing
   
   ## Testing
   - All existing tests updated and passing
   - New tests for `NoOpCheckpointService`
   - Mock-based tests for the abstraction layer
   
   This addresses the coupling concern raised in the code review of #18127.
   
   Fixes #18127


