Thanks for starting the discussion,

1. I'm +1 for this.
2. We already support this; see [1].
3. I'm not sure about this one. Could you give more examples beyond cases 1 and 2?
4 & 5. I think we have already covered this with the option 'protobuf.read-default-values' [2]. Is this what you want?
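To make the question on 4 & 5 concrete, here is a simplified model of the choice that 'protobuf.read-default-values' [2] is meant to control: whether a field that is absent from the message surfaces as its proto default or as NULL. It is plain Java with no Flink dependency; `ReadDefaultValuesSketch` and `resolveField` are hypothetical names, and the real behavior also depends on proto2 vs proto3 field presence, so please check [2] for the exact semantics.

```java
import java.util.Optional;

/**
 * Hypothetical, simplified model of how an absent protobuf field could be
 * surfaced as a Flink row value depending on a "read default values" flag.
 * This is NOT Flink's implementation; all names are illustrative only.
 */
public class ReadDefaultValuesSketch {

    /**
     * @param wireValue         the field value if it was present in the message, else empty
     * @param protoDefault      the default value declared for the field in the .proto file
     * @param readDefaultValues models the 'protobuf.read-default-values' option
     * @return the value to emit into the row (null when absent and the flag is false)
     */
    static <T> T resolveField(Optional<T> wireValue, T protoDefault, boolean readDefaultValues) {
        if (wireValue.isPresent()) {
            // The field was set, so the real value is always used.
            return wireValue.get();
        }
        // The field was absent: either fall back to the proto default or emit SQL NULL.
        return readDefaultValues ? protoDefault : null;
    }

    public static void main(String[] args) {
        Optional<String> absent = Optional.empty();
        System.out.println(resolveField(absent, "", true));               // prints an empty line (proto default "")
        System.out.println(resolveField(absent, "", false));              // prints "null"
        System.out.println(resolveField(Optional.of("abc"), "", false));  // prints "abc"
    }
}
```

Requests 4 and 5 in this model amount to asking for the `null` branch to be reachable for message types and for primitive fields, respectively.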
[1] https://issues.apache.org/jira/browse/FLINK-30093
[2] https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/formats/protobuf/#protobuf-read-default-values

Adam Richardson <as3ri...@stripe.com.invalid> wrote on Wed, Jun 28, 2023 at 10:16:
> Hi there,
>
> My company is in the process of rebuilding some of our batch Spark-based
> ETL pipelines in Flink. We use protobuf to define our schemas. One major
> challenge is that Flink protobuf deserialization has some semantic
> differences from the ScalaPB encoders we use in our Spark systems. This
> poses a serious barrier to adoption, as moving any given dataset from Spark
> to Flink could break all downstream consumers. I have a long list of
> feature requests in this area:
>
> 1. Support for mapping protobuf optional wrapper types (StringValue,
>    Int32Value, and friends) to nullable primitive types rather than RowTypes
> 2. Support for mapping the protobuf Timestamp type to a real timestamp
>    rather than a RowType
> 3. A way of defining custom mappings from specific proto types to custom
>    Flink types (the previous two feature requests could be implemented on
>    top of this lower-level feature)
> 4. Support for nullability semantics for message types (in the status
>    quo, an unset message is treated as equivalent to a message with default
>    values for all fields, which is a confusing user experience)
> 5. Support for nullability semantics for primitive types (in many of
>    our services, the default value for a field of primitive type is treated
>    as equivalent to unset or null, so it would be good to offer this as a
>    capability in the data warehouse)
>
> Would Flink accept patches for any or all of these feature requests? We're
> contemplating forking flink-protobuf internally, but it would be better if
> we could just upstream the relevant changes. (To my mind, 1, 2, and 4 are
> broadly applicable features that are definitely worthy of upstream support.
> 3 and 5 may be somewhat more specific to our use case.)
>
> Thanks,
> Adam Richardson

--
Best,
Benchao Li