Hi Oscar,

Couldn’t you have both the Kafka and File sources return an Either<POJO from 
CSV File, Protobuf from Kafka>, and then (after the HybridSource) use a 
MapFunction to convert to the unified/correct type?
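
A minimal sketch of that idea, using plain-Java stand-ins so it compiles without Flink on the classpath. In the actual job you would use org.apache.flink.types.Either as the common element type emitted by both sources' deserializers, and implement org.apache.flink.api.common.functions.MapFunction instead of the plain unify() method below. CsvRecord, ProtoEvent, and UnifiedEvent are hypothetical names.

```java
// Stand-in for Flink's Either type, reduced to what the sketch needs.
final class Either<L, R> {
    private final L left;
    private final R right;

    private Either(L left, R right) {
        this.left = left;
        this.right = right;
    }

    static <L, R> Either<L, R> left(L value)  { return new Either<>(value, null); }
    static <L, R> Either<L, R> right(R value) { return new Either<>(null, value); }

    boolean isLeft() { return left != null; }
    L left()  { return left; }
    R right() { return right; }
}

// Record produced by the CSV (file) source.
record CsvRecord(String id, long amount) {}
// Stand-in for the generated protobuf class produced by the Kafka source.
record ProtoEvent(String id, long amount) {}
// The single type the downstream job actually works with.
record UnifiedEvent(String id, long amount, String origin) {}

public class UnifyDemo {
    // Equivalent of the MapFunction placed right after the HybridSource:
    // both branches collapse into the same UnifiedEvent type.
    static UnifiedEvent unify(Either<CsvRecord, ProtoEvent> element) {
        return element.isLeft()
            ? new UnifiedEvent(element.left().id(), element.left().amount(), "csv")
            : new UnifiedEvent(element.right().id(), element.right().amount(), "kafka");
    }

    public static void main(String[] args) {
        System.out.println(unify(Either.left(new CsvRecord("a", 1))).origin());
        System.out.println(unify(Either.right(new ProtoEvent("b", 2))).origin());
    }
}
```

The file source's deserialization schema would emit Left values and the Kafka source's would emit Right values, so the HybridSource sees one element type throughout.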

— Ken


> On Jul 4, 2023, at 12:13 PM, Oscar Perez via user <user@flink.apache.org> 
> wrote:
> 
> Hei,
> 1) We populate state based on this CSV data and do business logic based on 
> this state and events coming from other unrelated streams.
> 2) We are using a low-level ProcessFunction in order to process this future 
> hybrid source.
> 
> Regardless of the aforementioned points, please note that the main challenge 
> is to combine in a HybridSource a CSV file and a Kafka topic that return 
> different datatypes, so I don't know how my answers relate to the original 
> problem, to be honest.
> Regards,
> Oscar
> 
> On Tue, 4 Jul 2023 at 20:53, Alexander Fedulov 
> <alexander.fedu...@gmail.com> wrote:
> @Oscar
> 1. How do you plan to use that CSV data? Is it needed for lookup from the 
> "main" stream?
> 2. Which API are you using? DataStream/SQL/Table or low level ProcessFunction?
> 
> Best,
> Alex
> 
> 
> On Tue, 4 Jul 2023 at 11:14, Oscar Perez via user <user@flink.apache.org> 
> wrote:
> OK, but is it? As I said, both sources have different data types. In the 
> example here:
> 
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/
> 
> There, both sources return Strings, but in our case one source would return 
> a protobuf event while the other would return a POJO. How can we make the 
> two sources share the same datatype so that we can successfully use 
> HybridSource?
> 
> Regards,
> Oscar
> 
> On Tue, 4 Jul 2023 at 12:04, Alexey Novakov <ale...@ververica.com> wrote:
> Hi Oscar,
> 
> You could use connected streams and put your file into a special Kafka topic 
> before starting such a job: 
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/operators/overview/#connect
> But this may require more work, and the event ordering in the connected 
> streams (which is shuffled) is probably not what you are looking for.
> 
> I think HybridSource is the right solution.
> 
> Best regards,
> Alexey
> 
> On Mon, Jul 3, 2023 at 3:44 PM Oscar Perez via user <user@flink.apache.org> 
> wrote:
> Hei,
> We want to bootstrap some data from a CSV file before reading from a Kafka 
> topic that has a retention period of 7 days.
> 
> We believe the best tool for that would be the HybridSource, but the problem 
> we are facing is that the two data sources are of a different nature. The 
> KafkaSource returns a protobuf event, while the CSV source returns a POJO 
> with just 3 fields.
> 
> We could hack the KafkaSource implementation and do the mapping from 
> protobuf to the CSV POJO in the value deserializer, but that seems rather 
> hackish. Is there a more elegant way to unify the datatypes from both 
> sources using HybridSource?
> 
> thanks
> Oscar

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch


