[
https://issues.apache.org/jira/browse/FLINK-38689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Khaled Hammouda updated FLINK-38689:
------------------------------------
Description:
h2. Problem
When protobuf messages contain fields like:
* Enum field `status` + string field `status_value`
* Repeated field `tags` + string field `tags_list` and/or `tags_count`
The protoc compiler renames accessor methods in such cases by suffixing methods
with field numbers to avoid conflicts in generated code (e.g.,
`getStatus1Value()` instead of `getStatusValue()`, `getTags4List()` instead of
`getTagsList()`, etc.). Flink's dynamic code generation assumes standard naming
and thus fails at runtime to call the right methods.
Users would typically work around this issue by renaming their proto fields to
avoid those conflicts in generated code, but renaming a field is sometimes an
expensive choice because it is a breaking change.
h2. Proposed Solution
Implement `PbFieldConflictResolver` that:
* Detects accessor name conflicts by analyzing message descriptors
* Applies field number suffixes matching protoc behavior
* Caches resolved mappings for performance
* Integrates with serialization/deserialization codegen
I will submit a pull request to propose this solution.
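The detection and suffixing steps listed above can be sketched as follows. This is a simplified illustration, not the code from the pull request: the class `AccessorNameSketch`, its `Field` type, and the methods `camel`, `clashes`, and `resolve` are hypothetical names, and a real implementation would read protobuf message descriptors instead. The sketch assumes only the two conflict patterns named in this issue (repeated field vs. `*_list`/`*_count`, singular enum field vs. `*_value`) and assumes that both fields in a conflicting pair receive the field number suffix; the exact rules may differ between protoc versions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AccessorNameSketch {
    /** Hypothetical, simplified stand-in for a protobuf field descriptor. */
    public static final class Field {
        final String name;
        final int number;
        final boolean repeated;
        final boolean isEnum;

        public Field(String name, int number, boolean repeated, boolean isEnum) {
            this.name = name;
            this.number = number;
            this.repeated = repeated;
            this.isEnum = isEnum;
        }
    }

    /** Converts snake_case to UpperCamelCase, e.g. "status_value" becomes "StatusValue". */
    public static String camel(String snake) {
        StringBuilder sb = new StringBuilder();
        for (String part : snake.split("_")) {
            if (part.isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return sb.toString();
    }

    /** True if an extra accessor generated for 'a' collides with the plain getter of 'b'. */
    public static boolean clashes(Field a, Field b) {
        String an = camel(a.name);
        String bn = camel(b.name);
        // Repeated field "tags" yields getTagsList()/getTagsCount(), which collide
        // with the getter of a singular field named "tags_list" or "tags_count".
        if (a.repeated && !b.repeated
                && (bn.equals(an + "List") || bn.equals(an + "Count"))) {
            return true;
        }
        // Singular enum field "status" yields getStatusValue(), which collides
        // with the getter of a singular field named "status_value" (assumption).
        return a.isEnum && !a.repeated && !b.repeated && bn.equals(an + "Value");
    }

    /** Maps each field name to its accessor base name, suffixed with the field number on conflict. */
    public static Map<String, String> resolve(List<Field> fields) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Field f : fields) {
            boolean conflicted = false;
            for (Field g : fields) {
                if (f != g && (clashes(f, g) || clashes(g, f))) {
                    conflicted = true;
                    break;
                }
            }
            String base = camel(f.name);
            resolved.put(f.name, conflicted ? base + f.number : base);
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<Field> fields = List.of(
                new Field("status", 1, false, true),
                new Field("status_value", 2, false, false),
                new Field("id", 3, false, false),
                new Field("tags", 4, true, false),
                new Field("tags_list", 5, false, false));
        resolve(fields).forEach((name, base) -> System.out.println(name + " -> " + base));
    }
}
```

With these inputs, the base name for `status` is `Status1` and for `tags` is `Tags4`, so the code generator would emit calls such as `getStatus1Value()` and `getTags4List()`, while the unaffected field `id` keeps the standard `getId()`.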
was:
When protobuf messages contain fields like:
* Enum field `status` + string field `status_value`
* Repeated field `tags` + string field `tags_list` and/or `tags_count`
The protoc compiler renames accessor methods in such cases by suffixing methods
with field numbers to avoid conflicts in generated code (e.g.,
`getStatus1Value()` instead of `getStatusValue()`, `getTags4List()` instead of
`getTagsList()`, etc.). Flink's dynamic code generation assumes standard naming
and thus fails at runtime to call the right methods.
Users typically would work around this issue by renaming their proto fields to
avoid those conflicts in generated code, but sometimes renaming field is an
expensive choice due to being a breaking change.
### Solution
Implement `PbFieldConflictResolver` that:
- Detects accessor name conflicts by analyzing message descriptors
- Applies field number suffixes matching protoc behavior
- Caches resolved mappings for performance
- Integrates with serialization/deserialization codegen
> Support Protobuf 4.x field name conflict resolution in dynamic codegen
> ----------------------------------------------------------------------
>
> Key: FLINK-38689
> URL: https://issues.apache.org/jira/browse/FLINK-38689
> Project: Flink
> Issue Type: Improvement
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Affects Versions: 2.1.0, 2.1.1
> Reporter: Khaled Hammouda
> Priority: Major
> Labels: pull-request-available
>