[ 
https://issues.apache.org/jira/browse/FLINK-38689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khaled Hammouda updated FLINK-38689:
------------------------------------
    Description: 
h2. Problem

When protobuf messages contain fields like:
* Enum field `status` + string field `status_value`
* Repeated field `tags` + string field `tags_list` and/or `tags_count`

In such cases, the protoc compiler avoids conflicts in the generated code by 
suffixing accessor method names with field numbers (e.g., 
`getStatus1Value()` instead of `getStatusValue()`, `getTags4List()` instead of 
`getTagsList()`, etc.). Flink's dynamic code generation assumes the standard 
names and therefore fails at runtime to call the right methods.

Users typically work around this issue by renaming their proto fields to 
avoid those conflicts in the generated code, but renaming a field is sometimes 
an expensive choice because it is a breaking change.

h2. Proposed Solution

Implement `PbFieldConflictResolver` that:
* Detects accessor name conflicts by analyzing message descriptors
* Applies field number suffixes matching protoc behavior
* Caches resolved mappings for performance
* Integrates with serialization/deserialization codegen
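
The steps above can be illustrated with a minimal, self-contained sketch. It is not the code from the pull request and does not claim to reproduce protoc's exact rules: the class `AccessorNameSketch`, the `Field` and `Kind` stand-ins for protobuf descriptors, and the cache keyed by message name are all hypothetical. The sketch also assumes that both fields of a conflicting pair receive the field number suffix, whereas the description above only shows the suffixed names for the enum and repeated fields.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: not Flink's PbFieldConflictResolver, not protoc's algorithm.
final class AccessorNameSketch {

    enum Kind { SINGULAR, ENUM, REPEATED }

    // Minimal stand-in for a protobuf field descriptor (name, number, kind).
    static final class Field {
        final String name;
        final int number;
        final Kind kind;

        Field(String name, int number, Kind kind) {
            this.name = name;
            this.number = number;
            this.kind = kind;
        }
    }

    // Cache of resolved names per message, so each message is analyzed once.
    private static final Map<String, Map<String, String>> CACHE = new ConcurrentHashMap<>();

    // Converts snake_case to UpperCamelCase, e.g. "status_value" -> "StatusValue".
    static String capitalize(String protoName) {
        StringBuilder sb = new StringBuilder();
        for (String part : protoName.split("_")) {
            if (part.isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return sb.toString();
    }

    // Accessor suffixes named in the issue: "Value" for enums, "List"/"Count" for repeated.
    static List<String> extraSuffixes(Kind kind) {
        switch (kind) {
            case ENUM:
                return List.of("Value");
            case REPEATED:
                return List.of("List", "Count");
            default:
                return List.of();
        }
    }

    // Returns proto field name -> capitalized accessor base name (suffixed on conflict).
    static Map<String, String> resolve(String messageName, List<Field> fields) {
        return CACHE.computeIfAbsent(messageName, key -> compute(fields));
    }

    private static Map<String, String> compute(List<Field> fields) {
        Set<String> conflicted = new HashSet<>();
        for (Field a : fields) {
            for (Field b : fields) {
                if (a == b) {
                    continue;
                }
                for (String suffix : extraSuffixes(a.kind)) {
                    // e.g. enum "status" + "Value" collides with field "status_value".
                    if ((capitalize(a.name) + suffix).equals(capitalize(b.name))) {
                        conflicted.add(a.name);
                        conflicted.add(b.name); // assumption: both sides get the suffix
                    }
                }
            }
        }
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Field f : fields) {
            String base = capitalize(f.name);
            resolved.put(f.name, conflicted.contains(f.name) ? base + f.number : base);
        }
        return Collections.unmodifiableMap(resolved);
    }

    public static void main(String[] args) {
        List<Field> fields = List.of(
                new Field("status", 1, Kind.ENUM),
                new Field("status_value", 2, Kind.SINGULAR),
                new Field("tags", 4, Kind.REPEATED),
                new Field("tags_list", 5, Kind.SINGULAR));
        Map<String, String> names = resolve("ExampleMessage", fields);
        System.out.println("get" + names.get("status") + "Value"); // getStatus1Value
        System.out.println("get" + names.get("tags") + "List");    // getTags4List
    }
}
```

In this sketch the codegen would look up the resolved base name for a field and then append the usual accessor suffix, instead of deriving the method name from the proto field name alone.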

I will submit a pull request to propose this solution.

  was:
When protobuf messages contain fields like:
* Enum field `status` + string field `status_value`
* Repeated field `tags` + string field `tags_list` and/or `tags_count`

The protoc compiler renames accessor methods in such cases by suffixing methods 
with field numbers to avoid conflicts in generated code (e.g., 
`getStatus1Value()` instead of `getStatusValue()`, `getTags4List()` instead of 
`getTagsList()`, etc.). Flink's dynamic code generation assumes standard naming 
and thus fails at runtime to call the right methods.

Users typically would work around this issue by renaming their proto fields to 
avoid those conflicts in generated code, but sometimes renaming a field is an 
expensive choice due to being a breaking change.

  ### Solution
  Implement `PbFieldConflictResolver` that:
  - Detects accessor name conflicts by analyzing message descriptors
  - Applies field number suffixes matching protoc behavior
  - Caches resolved mappings for performance
  - Integrates with serialization/deserialization codegen


> Support Protobuf 4.x field name conflict resolution in dynamic codegen
> ----------------------------------------------------------------------
>
>                 Key: FLINK-38689
>                 URL: https://issues.apache.org/jira/browse/FLINK-38689
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Khaled Hammouda
>            Priority: Major
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
