danny0405 opened a new pull request, #18738: URL: https://github.com/apache/hudi/pull/18738
### Describe the issue this Pull Request addresses

Fix the regression introduced by https://github.com/apache/hudi/pull/17994

Flink append-only ingestion currently requires a primary key or `hoodie.datasource.write.recordkey.field`, even though insert-only writes do not require record-key based update or deduplication semantics. This prevents valid append-only table definitions from being created through the Flink table factory and catalog paths.

This PR relaxes the primary-key requirement for append mode while preserving the existing validation for non-append writes. When no record key is declared, or the record-key option is empty, Flink uses auto-generated record keys for append-only ingestion. There are no storage format changes and no breaking table format changes.

### Summary and Changelog

Append-only Flink ingestion can now create and write Hudi tables without declaring record keys, while upsert and other non-append write modes still require a primary key or record-key option.

- Updated `HoodieTableFactory` so record-key validation is skipped only for append mode and still enforced for non-append sinks.
- Added nullable record-key handling in `OptionsResolver`, including `getRecordKeys` and `getBucketIndexKeys`, with null and empty values resolving to no keys.
- Updated `RowDataKeyGens` so auto record-key generation is used when no record keys are resolved.
- Allowed `TimestampBasedAvroKeyGenerator` to be constructed without a record-key field by reading `hoodie.datasource.write.recordkey.field` with a nullable default.
- Updated Flink catalog paths (`HoodieCatalog`, `HoodieHiveCatalog`) and helper paths (`Pipelines`, `PrimaryKeyPruners`) to use the new record-key resolver behavior.
- Added `TestHoodieTableFactory` coverage for append-only sinks without record keys and for non-append sinks still requiring record keys.
- Added `TestHoodieCatalog` coverage for creating append-only catalog tables without primary keys.
- Added `TestOptionsResolver` coverage for nullable and empty record/index key resolution.
- Updated `TestRowDataKeyGens` coverage for keyless append writes, timestamp partition key generation without record keys, empty record-key auto key generation, and missing task/instant validation.
- Updated append-only datasource integration tests to omit default record-key options in append write scenarios.

### Impact

This changes Flink user-facing validation for append-only insert tables: users can omit both PRIMARY KEY syntax and `hoodie.datasource.write.recordkey.field` when `operation=insert`. Empty record-key and bucket-index-key options are resolved as no keys. Non-append write modes continue to require a record key. Existing tables that already declare record keys continue to use those keys.

There is no storage format change, table version change, or expected performance impact. The affected area is Flink datasource ingestion, including table factory setup, catalog table creation, row-data key generation, and append-only test coverage.

### Risk Level

low

The change touches core Flink table setup and key-generation paths, so the main risk is accidentally relaxing record-key validation for non-append writes or changing empty option behavior unexpectedly. This is mitigated by targeted tests covering append-only keyless writes, non-append validation, resolver behavior, catalog creation, timestamp partition key generation, and auto key generation.

### Documentation Update

none

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable
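
### Illustrative sketch of the relaxed validation

For readers who want the rule from the Summary and Changelog in code form, here is a minimal, self-contained sketch. It is not the code in this PR: the class `KeylessAppendSketch`, its method names, and the plain `operation` map key are hypothetical, and treating `operation=insert` as the whole definition of append mode is a simplifying assumption. Only the option name `hoodie.datasource.write.recordkey.field` and the described behavior (null or empty resolves to no keys, validation skipped only for append mode) come from the description above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical illustration only; these are not the Hudi classes touched by the PR. */
final class KeylessAppendSketch {
    // Option name taken from the PR description.
    static final String RECORD_KEY_FIELD = "hoodie.datasource.write.recordkey.field";
    // Assumption: a simplified option key standing in for the write operation setting.
    static final String OPERATION = "operation";

    /** Null, empty, or blank option values resolve to "no keys" (an empty list). */
    static List<String> resolveRecordKeys(Map<String, String> options) {
        String raw = options.get(RECORD_KEY_FIELD);
        if (raw == null || raw.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.stream(raw.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    /** Assumption: append mode is reduced here to operation=insert. */
    static boolean isAppendMode(Map<String, String> options) {
        return "insert".equalsIgnoreCase(options.get(OPERATION));
    }

    /** Skips the record-key check only for append mode; other modes still need a key. */
    static void validate(Map<String, String> options, List<String> primaryKeyColumns) {
        if (isAppendMode(options)) {
            return;
        }
        if (primaryKeyColumns.isEmpty() && resolveRecordKeys(options).isEmpty()) {
            throw new IllegalArgumentException(
                    "Primary key or " + RECORD_KEY_FIELD + " is required for non-append writes");
        }
    }

    /** Auto-generated keys apply when neither a primary key nor the option yields keys. */
    static boolean usesAutoGeneratedKeys(Map<String, String> options, List<String> primaryKeyColumns) {
        return primaryKeyColumns.isEmpty() && resolveRecordKeys(options).isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> appendOnly = Map.of(OPERATION, "insert");
        validate(appendOnly, Collections.emptyList()); // passes: append mode needs no key
        System.out.println("auto keys for keyless append: "
                + usesAutoGeneratedKeys(appendOnly, Collections.emptyList()));
    }
}
```

In this sketch an upsert configuration with no primary key and no record-key option is rejected by `validate`, while the same configuration with `operation=insert` passes and falls back to auto-generated keys.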
