Thanks for starting the discussion,

1. I'm +1 for this.
2. We have already supported this in [1]
3. I'm not sure about this, could you give more examples except the cases
1&2?
4&5. I think we also have considered this with the option
'protobuf.read-default-values' [2], is this what you want?

[1] https://issues.apache.org/jira/browse/FLINK-30093
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/table/formats/protobuf/#protobuf-read-default-values



Adam Richardson <as3ri...@stripe.com.invalid> 于2023年6月28日周三 10:16写道:

> Hi there,
>
> My company is in the process of rebuilding some of our batch Spark-based
> ETL pipelines in Flink. We use protobuf to define our schemas. One major
> challenge is that Flink protobuf deserialization has some semantic
> differences with the ScalaPB encoders we use in our Spark systems. This
> poses a serious barrier for adoption as moving any given dataset from Spark
> to Flink will potentially break all downstream consumers. I have a long
> list of feature requests in this area:
>
>    1. Support for mapping protobuf optional wrapper types (StringValue,
>    IntValue, and friends) to nullable primitive types rather than RowTypes
>    2. Support for mapping the protobuf Timestamp type to a real timestamp
>    rather than RowType
>    3. A way of defining custom mappings from specific proto types to custom
>    Flink types (the previous two feature requests could be implemented on
> top
>    of this lower-level feature)
>    4. Support for nullability semantics for message types (in the status
>    quo, an unset message is treated as equivalent to a message with default
>    values for all fields, which is a confusing user experience)
>    5. Support for nullability semantics for primitives types (in many of
>    our services, the default value for a field of primitive type is
> treated as
>    being equivalent to unset or null, so it would be good to offer this as
> a
>    capability in the data warehouse)
>
> Would Flink accept patches for any or all of these feature requests? We're
> contemplating forking flink-protobuf internally, but it would be better if
> we could just upstream the relevant changes. (To my mind, 1, 2, and 4 are
> broadly applicable features that are definitely worthy of upstream support.
> 3 and 5 may be somewhat more specific to our use case.)
>
> Thanks,
> Adam Richardson
>


-- 

Best,
Benchao Li

Reply via email to