Hi there,

My company is in the process of rebuilding some of our batch Spark-based ETL pipelines in Flink. We use protobuf to define our schemas. One major challenge is that Flink's protobuf deserialization differs semantically from the ScalaPB encoders we use in our Spark systems. This is a serious barrier to adoption, because moving any given dataset from Spark to Flink can break all of its downstream consumers.

To make the differences concrete, here's a small example (the schema is hypothetical, and the "today" mapping is my reading of current flink-protobuf behavior):
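    syntax = "proto3";

    import "google/protobuf/wrappers.proto";
    import "google/protobuf/timestamp.proto";

    message UserEvent {
      google.protobuf.StringValue user_id = 1;
      google.protobuf.Timestamp event_time = 2;
      int64 session_length_ms = 3;  // our services treat 0 as unset
    }

Today flink-protobuf treats the well-known types as ordinary messages, so the decoded row type is roughly:

    ROW<user_id ROW<value STRING>,
        event_time ROW<seconds BIGINT, nanos INT>,
        session_length_ms BIGINT>

while the shape our downstream consumers depend on (and that feature requests 1 and 2 below ask for) is roughly:

    ROW<user_id STRING,           -- nullable; NULL when the wrapper is unset
        event_time TIMESTAMP(9),  -- a real timestamp type
        session_length_ms BIGINT>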
I have a long list of feature requests in this area:

1. Support for mapping the protobuf optional wrapper types (StringValue, Int32Value, and friends) to nullable primitive types rather than RowTypes.

2. Support for mapping the protobuf Timestamp type to a real timestamp type rather than a RowType.

3. A way to define custom mappings from specific proto types to custom Flink types. (The previous two feature requests could be implemented on top of this lower-level feature.)

4. Nullability semantics for message types. In the status quo, an unset message is treated as equivalent to a message with default values for all of its fields, which is a confusing user experience. (See the P.S. below for a concrete illustration.)

5. Nullability semantics for primitive types. In many of our services, the default value of a primitive field is treated as equivalent to unset/null, so it would be useful if the data warehouse could honor that convention.

Would Flink accept patches for any or all of these feature requests? We're contemplating forking flink-protobuf internally, but it would be better if we could just upstream the relevant changes. (To my mind, 1, 2, and 4 are broadly applicable features that are definitely worthy of upstream support; 3 and 5 may be more specific to our use case.)

Thanks,
Adam Richardson
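P.S. A minimal sketch of item 4, using the hypothetical UserEvent schema above. The protobuf-java API calls are standard; only the schema and class names are made up:

    import com.google.protobuf.StringValue;

    public class WrapperSemanticsDemo {
      public static void main(String[] args) {
        // UserEvent is the class protoc generates from the schema above.
        UserEvent unset = UserEvent.newBuilder().build();
        UserEvent explicitDefault = UserEvent.newBuilder()
            .setUserId(StringValue.getDefaultInstance())
            .build();

        // The two messages differ on the wire and via the generated hasser...
        System.out.println(unset.hasUserId());            // false
        System.out.println(explicitDefault.hasUserId());  // true

        // ...but today flink-protobuf decodes user_id to the same non-null
        // ROW('') in both cases. We'd like the unset case to decode to NULL.
      }
    }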