> - Why does a Flink `CREATE TABLE` from Protobuf require the entire table 
> column structure to be defined in SQL again? Shouldn't fields be inferred 
> automatically from the provided protobuf class?

I agree that automatic schema inference would be a good feature. The reason
the Protobuf format does not have that capability yet is that schema
inference is a general feature, not specific to the Protobuf format, and so
it was not included in the format's first version.
It was indeed discussed in the original issue [1], and I think it would make
a good feature request.

[1] https://issues.apache.org/jira/browse/FLINK-18202
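
For context, this is roughly what the DDL looks like today: every column has
to be written out by hand even though the .proto file already defines the
schema. (A minimal sketch in Scala; the topic, the columns and the message
class are placeholders.)

  import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

  val tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode())

  // The column list mirrors com.example.Order by hand; 'tags' is a repeated field.
  tEnv.executeSql(
    """
      |CREATE TABLE orders (
      |  order_id STRING,
      |  amount   BIGINT,
      |  tags     ARRAY<STRING>
      |) WITH (
      |  'connector' = 'kafka',
      |  'topic' = 'orders',
      |  'properties.bootstrap.servers' = 'localhost:9092',
      |  'format' = 'protobuf',
      |  'protobuf.message-class-name' = 'com.example.Order'
      |)
      |""".stripMargin)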



Clemens Valiente <clemens.valie...@grab.com> wrote on Mon, Aug 28, 2023 at 11:41:
>
>
> There's friction when using Scala/Java protobuf classes and trying to convert
> a DataStream[ProtobufObject] into a Flink Table.
> Scenario:
> Input is a DataStream[ProtobufObject] from a Kafka topic whose records we
> read and deserialise into Protobuf objects (Scala case classes, or
> alternatively Java classes) using scalapb (https://scalapb.github.io/).
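>
> Roughly, the ingestion side looks like this (a sketch only; MyEvent stands in
> for a scalapb-generated case class, and the topic/broker names are made up):
>
>   import org.apache.flink.api.common.eventtime.WatermarkStrategy
>   import org.apache.flink.api.common.serialization.AbstractDeserializationSchema
>   import org.apache.flink.connector.kafka.source.KafkaSource
>   import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
>   import org.apache.flink.streaming.api.scala._
>
>   // scalapb generates parseFrom on the companion object of MyEvent
>   class MyEventDeserializer extends AbstractDeserializationSchema[MyEvent] {
>     override def deserialize(message: Array[Byte]): MyEvent = MyEvent.parseFrom(message)
>   }
>
>   val env = StreamExecutionEnvironment.getExecutionEnvironment
>
>   val source = KafkaSource.builder[MyEvent]()
>     .setBootstrapServers("localhost:9092")
>     .setTopics("my-topic")
>     .setGroupId("sketch")
>     .setStartingOffsets(OffsetsInitializer.latest())
>     .setValueOnlyDeserializer(new MyEventDeserializer)
>     .build()
>
>   val events: DataStream[MyEvent] =
>     env.fromSource(source, WatermarkStrategy.noWatermarks[MyEvent](), "kafka")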
>
> Goal: Given a topic name and a protobuf class name, we would like to
> automatically generate a Flink Table for it.
>
> Problems:
>
> - The Java Protobuf classes are not POJOs and are therefore not recognised:
> they show up as a single RAW column when converted via
> streamTableEnv.fromDataStream() (illustrated in the sketch after this list).
> - Scala protobuf is better; the only issue is with repeated fields. They are
> represented as Seq in Scala, which does not map to a Flink table type and
> shows up as RAW again (only java.util.List types show up as proper arrays).
> - scalapb allows customising the collection type, but the standard ones have
> the same issue:
> https://scalapb.github.io/docs/customizations/#custom-collection-types
> I tried to implement a new collection type that satisfies both the collection
> type requirements of scalapb and those of java.util.List, but ultimately
> failed to do so because of signature clashes.
> - The Flink Table API has Protobuf support
> (https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/protobuf/)
> but it requires translating the entire Protobuf structure to SQL types by
> hand, which is not feasible.
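>
> A quick illustration of the first two points (env and events as in the sketch
> above; class names are still placeholders):
>
>   import org.apache.flink.table.api.bridge.scala.StreamTableEnvironment
>
>   val tEnv = StreamTableEnvironment.create(env)
>
>   // With the Java-generated protobuf class the whole record shows up as a
>   // single RAW column; with the scalapb case class the scalar fields come
>   // out fine, but every repeated field (a Scala Seq) is still RAW.
>   tEnv.fromDataStream(events).printSchema()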
>
> Questions:
> - Are there plans to support Scala Seq for the Flink SQL ARRAY type? Would it
> be straightforward for me to implement a custom TypeInformation(?) to help
> the Flink Table API convert it correctly?
>
> - Why are the Java protobuf classes not recognised as POJOs? Is it possible
> to add support for them?
>
> - Why does a Flink `CREATE TABLE` from Protobuf require the entire table 
> column structure to be defined in SQL again? Shouldn't fields be inferred 
> automatically from the provided protobuf class?
> - Are there other ways of solving this challenge that maybe someone has 
> already successfully used?
>
> So far my workaround is to implement a custom .map() step to convert the pb
> object into something readable by the Flink Table API. But that has to be
> done manually for each individual topic and pb class, which does not scale.
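>
> Roughly like this (a sketch only; the field names are invented, and events /
> tEnv are as in the sketches above):
>
>   import org.apache.flink.api.common.typeinfo.{TypeInformation, Types}
>   import org.apache.flink.types.Row
>
>   // Hand-written per topic / protobuf class: name every column and map every
>   // field explicitly, turning the Scala Seq into an array so it maps to ARRAY.
>   val rowTypeInfo: TypeInformation[Row] = Types.ROW_NAMED(
>     Array("order_id", "amount", "tags"),
>     Types.STRING, Types.LONG, Types.OBJECT_ARRAY(Types.STRING))
>
>   val rows = events.map { e: MyEvent =>
>     Row.of(e.orderId, java.lang.Long.valueOf(e.amount), e.tags.toArray)
>   }(rowTypeInfo)
>
>   val table = tEnv.fromDataStream(rows)  // now one proper column per field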
>
> I would be very glad for any insights on any of the questions above; I have
> been hitting my head against this, at length, several times over the past
> year(s) :(
>
> Thanks a lot
>
> Clemens
>



-- 

Best,
Benchao Li
