Was it a conscious decision that HybridSource only accepts Sources, and does
not allow mapping functions to be applied to them before combining them?
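For reference, that does appear to be how the API is shaped: HybridSource is generic in a single element type, and its builder only accepts Source instances that already produce that type, with no per-source mapping hook before the switch-over. A toy, Flink-free model of that type constraint (the classes below are illustrative stand-ins, not Flink's real ones):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of why a hybrid source forces one element type: the builder is
// generic in T, and every child source must already produce T. There is no
// place to attach a per-source map before combining.
public class HybridModel {
    interface Source<T> { List<T> read(); }

    static class HybridBuilder<T> {
        private final List<Source<T>> sources = new ArrayList<>();
        // Only a Source<T> is accepted; a Source of any other type
        // will not compile against this signature.
        HybridBuilder<T> addSource(Source<T> s) { sources.add(s); return this; }
        List<T> build() {
            List<T> out = new ArrayList<>();
            for (Source<T> s : sources) out.addAll(s.read());
            return out;
        }
    }

    public static void main(String[] args) {
        // Both sources had to agree on String up front.
        Source<String> csv = () -> List.of("row1", "row2");
        Source<String> kafka = () -> List.of("event1");
        List<String> all = new HybridBuilder<String>()
                .addSource(csv).addSource(kafka).build();
        System.out.println(all); // [row1, row2, event1]
    }
}
```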

On Tue, Jul 4, 2023, 23:53 Ken Krugler <kkrugler_li...@transpac.com> wrote:

> Hi Oscar,
>
> Couldn’t you have both the Kafka and File sources return an Either<POJO
> from CSV File, Protobuf from Kafka>, and then (after the HybridSource) use
> a MapFunction to convert to the unified/correct type?
>
> — Ken
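Ken's Either approach can be sketched without any Flink dependencies as follows. The types CsvRow, ProtoEvent, and Unified are hypothetical stand-ins for the real CSV POJO, protobuf event, and unified record; in an actual job the wrapper could be Flink's own org.apache.flink.types.Either, and the final step would be a MapFunction applied to the stream coming out of the HybridSource.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Both sources emit a common wrapper type; one map step after the hybrid
// source unwraps whichever side is present into the unified record.
public class EitherUnify {
    public static class CsvRow { final String id; final long amount;
        CsvRow(String id, long amount) { this.id = id; this.amount = amount; } }
    public static class ProtoEvent { final String id; final long amountCents;
        ProtoEvent(String id, long amountCents) { this.id = id; this.amountCents = amountCents; } }
    public static class Unified { public final String id; public final long amount;
        Unified(String id, long amount) { this.id = id; this.amount = amount; }
        @Override public String toString() { return id + ":" + amount; } }

    // Minimal Either: exactly one side is non-null.
    public static class Either {
        final CsvRow left; final ProtoEvent right;
        Either(CsvRow l, ProtoEvent r) { this.left = l; this.right = r; }
        static Either left(CsvRow l) { return new Either(l, null); }
        static Either right(ProtoEvent r) { return new Either(null, r); }
    }

    // The single map step after the hybrid source.
    public static Unified unify(Either e) {
        return e.left != null
            ? new Unified(e.left.id, e.left.amount)
            : new Unified(e.right.id, e.right.amountCents / 100);
    }

    public static void main(String[] args) {
        List<Either> merged = Arrays.asList(
            Either.left(new CsvRow("a", 5)),
            Either.right(new ProtoEvent("b", 700)));
        System.out.println(merged.stream().map(EitherUnify::unify)
            .collect(Collectors.toList())); // [a:5, b:7]
    }
}
```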
>
>
> On Jul 4, 2023, at 12:13 PM, Oscar Perez via user <user@flink.apache.org>
> wrote:
>
> Hei,
> 1) We populate state based on this CSV data and do business logic based on
> this state and events coming from other unrelated streams.
> 2) We are using a low-level ProcessFunction in order to process this
> future hybrid source.
>
> Regardless of the aforementioned points, please note that the main
> challenge is to combine, in a HybridSource, a CSV file and a Kafka topic
> that return different data types, so I don't know how my answers relate to
> the original problem, to be honest.
>
> Regards,
> Oscar
>
> On Tue, 4 Jul 2023 at 20:53, Alexander Fedulov <
> alexander.fedu...@gmail.com> wrote:
>
>> @Oscar
>> 1. How do you plan to use that CSV data? Is it needed for lookup from the
>> "main" stream?
>> 2. Which API are you using? DataStream/SQL/Table or low level
>> ProcessFunction?
>>
>> Best,
>> Alex
>>
>>
>> On Tue, 4 Jul 2023 at 11:14, Oscar Perez via user <user@flink.apache.org>
>> wrote:
>>
>>> OK, but is it? As I said, the two sources have different data types. In
>>> the example here:
>>>
>>>
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/
>>>
>>> Both sources there return String, but in our case one source would
>>> return a protobuf event while the other would return a POJO. How can we
>>> make the two sources share the same data type so that we can
>>> successfully use HybridSource?
>>>
>>> Regards,
>>> Oscar
>>>
>>> On Tue, 4 Jul 2023 at 12:04, Alexey Novakov <ale...@ververica.com>
>>> wrote:
>>>
>>>> Hi Oscar,
>>>>
>>>> You could use connected streams and put your file into a special Kafka
>>>> topic before starting such a job:
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/operators/overview/#connect
>>>> But this may require more work, and the event ordering in the connected
>>>> streams (which is shuffled) is probably not what you are looking for.
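The connected-streams idea can be modeled without Flink as follows: each input keeps its own element type, and each side gets its own mapping function into the shared output type. In a real job this would be DataStream.connect() followed by a CoMapFunction; the types and inputs below are purely illustrative.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Flink-free model of connect() + CoMapFunction: two inputs with different
// element types are combined, each side mapped by its own function into one
// shared output type. In a real connected stream the relative ordering
// across the two inputs is not guaranteed.
public class ConnectSketch {
    public static <A, B, O> List<O> connectAndMap(
            List<A> left, List<B> right,
            Function<A, O> mapLeft, Function<B, O> mapRight) {
        return Stream.concat(left.stream().map(mapLeft),
                             right.stream().map(mapRight))
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical inputs: CSV rows as strings, Kafka events as longs.
        List<String> csv = List.of("a", "b");
        List<Long> events = List.of(7L);
        System.out.println(connectAndMap(csv, events,
                s -> "csv:" + s, n -> "evt:" + n)); // [csv:a, csv:b, evt:7]
    }
}
```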
>>>>
>>>> I think HybridSource is the right solution.
>>>>
>>>> Best regards,
>>>> Alexey
>>>>
>>>> On Mon, Jul 3, 2023 at 3:44 PM Oscar Perez via user <
>>>> user@flink.apache.org> wrote:
>>>>
>>>>> Hei, We want to bootstrap some data from a CSV file before reading
>>>>> from a kafka topic that has a retention period of 7 days.
>>>>>
>>>>> We believe the best tool for that would be the HybridSource, but the
>>>>> problem we are facing is that the two data sources are of a different
>>>>> nature: the KafkaSource returns a protobuf event, while the CSV source
>>>>> returns a POJO with just three fields.
>>>>>
>>>>> We could hack the KafkaSource implementation and do the mapping from
>>>>> protobuf to the CSV POJO in the value deserializer, but that seems
>>>>> rather hackish. Is there a more elegant way to unify the data types
>>>>> from both sources when using HybridSource?
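For what it's worth, supplying a custom DeserializationSchema to KafkaSource is the standard way to choose its output type, so converting to the POJO inside the deserializer need not be a hack. A Flink-free sketch of the idea, where parseProto is a hypothetical stand-in for the generated protobuf parser (e.g. a MyEvent.parseFrom call):

```java
import java.nio.charset.StandardCharsets;

// Sketch of mapping inside the value deserializer: the Kafka-side
// deserializer emits the CSV-side POJO directly, so both sources agree on
// one element type and the hybrid source composes cleanly.
public class UnifyingDeserializer {
    public static class CsvPojo {
        public final String id; public final long amount;
        CsvPojo(String id, long amount) { this.id = id; this.amount = amount; }
    }
    static class ProtoEvent { final String id; final long amount;
        ProtoEvent(String id, long amount) { this.id = id; this.amount = amount; } }

    // Stand-in for protobuf parsing; here the wire format is simply
    // "id,amount" as UTF-8 bytes so the sketch stays self-contained.
    static ProtoEvent parseProto(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split(",");
        return new ProtoEvent(parts[0], Long.parseLong(parts[1]));
    }

    // What the DeserializationSchema's deserialize(byte[]) would do:
    // parse the wire format, then convert to the shared POJO.
    public static CsvPojo deserialize(byte[] bytes) {
        ProtoEvent evt = parseProto(bytes);
        return new CsvPojo(evt.id, evt.amount);
    }

    public static void main(String[] args) {
        CsvPojo p = deserialize("abc,42".getBytes(StandardCharsets.UTF_8));
        System.out.println(p.id + "=" + p.amount); // abc=42
    }
}
```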
>>>>>
>>>>> thanks
>>>>> Oscar
>>>>>
>>>>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> Custom big data solutions
> Flink, Pinot, Solr, Elasticsearch
>
>
>
>
