danny0405 opened a new pull request, #18738:
URL: https://github.com/apache/hudi/pull/18738

   ### Describe the issue this Pull Request addresses
   
   Fix the regression introduced by https://github.com/apache/hudi/pull/17994.
   Flink append-only ingestion currently requires a primary key or 
`hoodie.datasource.write.recordkey.field`, even though insert-only writes do 
not require record-key based update or deduplication semantics. This prevents 
valid append-only table definitions from being created through the Flink table 
factory and catalog paths.
   
   This PR relaxes the primary-key requirement for append mode while preserving 
the existing validation for non-append writes. When no record key is declared, 
or the record-key option is empty, Flink uses auto-generated record keys for 
append-only ingestion.
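
The rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual `HoodieTableFactory` code: the class name, method name, and parameters are hypothetical, and how append mode is detected is left out entirely.

```java
/** Hypothetical stand-in for the sink validation described above; not Hudi's actual code. */
final class RecordKeyValidationSketch {

  /**
   * Checks the record-key requirement for a sink.
   *
   * @param appendMode      whether the sink runs in append (insert-only) mode
   * @param hasPrimaryKey   whether the table schema declares a PRIMARY KEY
   * @param recordKeyOption value of hoodie.datasource.write.recordkey.field, may be null or empty
   */
  static void validateRecordKey(boolean appendMode, boolean hasPrimaryKey, String recordKeyOption) {
    if (appendMode) {
      // Append-only writes need no update or deduplication semantics,
      // so the record-key check is skipped and keys can be auto-generated.
      return;
    }
    boolean hasKeyOption = recordKeyOption != null && !recordKeyOption.trim().isEmpty();
    if (!hasPrimaryKey && !hasKeyOption) {
      // Non-append writes keep the existing requirement.
      throw new IllegalArgumentException(
          "A primary key or hoodie.datasource.write.recordkey.field is required for non-append writes");
    }
  }
}
```

In this sketch, `validateRecordKey(true, false, null)` returns normally, while `validateRecordKey(false, false, "")` throws.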
   
   There are no storage format changes and no breaking table format changes.
   
   ### Summary and Changelog
   
   Append-only Flink ingestion can now create and write Hudi tables without 
declaring record keys, while upsert and other non-append write modes still 
require a primary key or record-key option.
   
   - Updated `HoodieTableFactory` so record-key validation is skipped only for 
append mode and still enforced for non-append sinks.
   - Added nullable record-key handling in `OptionsResolver`, including 
`getRecordKeys` and `getBucketIndexKeys`, with null and empty values resolving 
to no keys.
   - Updated `RowDataKeyGens` so auto record-key generation is used when no 
record keys are resolved.
   - Allowed `TimestampBasedAvroKeyGenerator` to be constructed without a 
record-key field by reading `hoodie.datasource.write.recordkey.field` with a 
nullable default.
   - Updated Flink catalog paths (`HoodieCatalog`, `HoodieHiveCatalog`) and 
helper paths (`Pipelines`, `PrimaryKeyPruners`) to use the new record-key 
resolver behavior.
   - Added `TestHoodieTableFactory` coverage for append-only sinks without 
record keys and for non-append sinks still requiring record keys.
   - Added `TestHoodieCatalog` coverage for creating append-only catalog tables 
without primary keys.
   - Added `TestOptionsResolver` coverage for nullable and empty record/index 
key resolution.
   - Updated `TestRowDataKeyGens` coverage for keyless append writes, timestamp 
partition key generation without record keys, empty record-key auto key 
generation, and missing task/instant validation.
   - Updated append-only datasource integration tests to omit default 
record-key options in append write scenarios.
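
For readers skimming the changelog, the resolver and key-generator behavior can be approximated by the sketch below. The names `resolveKeys` and `useAutoKeyGen` are hypothetical and do not mirror the real `OptionsResolver` or `RowDataKeyGens` signatures; the comma-separated option format is an assumption made for illustration.

```java
import java.util.Arrays;

/** Hypothetical illustration of nullable key resolution; not the real OptionsResolver. */
final class KeyResolutionSketch {

  /** Resolves a key option to field names; null and empty values resolve to no keys. */
  static String[] resolveKeys(String option) {
    if (option == null || option.trim().isEmpty()) {
      return new String[0];
    }
    // Assumes a comma-separated list of field names.
    return Arrays.stream(option.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .toArray(String[]::new);
  }

  /** Auto record-key generation applies when no record keys are resolved. */
  static boolean useAutoKeyGen(String recordKeyOption) {
    return resolveKeys(recordKeyOption).length == 0;
  }
}
```

Returning an empty array instead of `null` lets callers branch on the array length without their own null checks.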
   
   ### Impact
   
   This changes Flink user-facing validation for append-only insert tables: 
users can omit both the `PRIMARY KEY` clause and 
`hoodie.datasource.write.recordkey.field` when `write.operation` is `insert`. 
Empty record-key and bucket-index-key options are resolved as no keys.
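
As an illustration of the user-facing effect, the helper below builds the kind of DDL that should now pass validation. It is a sketch under assumptions: the column names and the helper itself are invented for the example, and it assumes the Flink option key for the write operation is `write.operation`.

```java
/** Illustrative only: builds a Flink SQL DDL string for a keyless append-only Hudi table. */
final class AppendOnlyDdlSketch {

  /** Returns a CREATE TABLE statement with no PRIMARY KEY clause and no record-key option. */
  static String appendOnlyDdl(String tableName, String path) {
    return "CREATE TABLE " + tableName + " (\n"
        + "  id BIGINT,\n"          // hypothetical columns
        + "  msg STRING,\n"
        + "  ts TIMESTAMP(3)\n"
        + ") WITH (\n"
        + "  'connector' = 'hudi',\n"
        + "  'path' = '" + path + "',\n"
        + "  'write.operation' = 'insert'\n"  // assumed option key for append-only writes
        + ")";
  }
}
```

The resulting string could be passed to `TableEnvironment.executeSql` in a Flink job with the Hudi connector on the classpath.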
   
   Non-append write modes continue to require a record key. Existing tables 
that already declare record keys continue to use those keys. There is no 
storage format change, table version change, or expected performance impact.
   
   The affected area is Flink datasource ingestion, including table factory 
setup, catalog table creation, row-data key generation, and append-only test 
coverage.
   
   ### Risk Level
   
   low
   
   The change touches core Flink table setup and key-generation paths, so the 
main risk is accidentally relaxing record-key validation for non-append writes 
or changing empty option behavior unexpectedly. This is mitigated by targeted 
tests covering append-only keyless writes, non-append validation, resolver 
behavior, catalog creation, timestamp partition key generation, and auto key 
generation.
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   

