Certainly, that is what I thought as well... The output of DataStream2 could be in the thousands, and there are state updates. Reading this topic from the other job, job1, is fine. However, assuming that we maintain this state in a collection, and update the state in this collection by reading from the topic, will it be replicated across the cluster within job1?
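To make the question concrete, here is a minimal sketch of the broadcast + CoFlatMapFunction approach Fabian described, assuming job1 consumes the update topic as a second stream. Flink does not replicate a plain HashMap on its own; with broadcast(), every parallel subtask of the CoFlatMap receives all updates and maintains its own local copy. The stream names, key/value types, and enrichment logic below are illustrative, not details from the actual jobs.

import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

public class BroadcastCacheExample {

    public static DataStream<String> enrich(
            DataStream<String> eventStream,                        // "DataStream1": high-volume events
            DataStream<Tuple2<String, String>> cacheUpdateStream)  // cache updates read from the topic
    {
        return eventStream
                .connect(cacheUpdateStream.broadcast())  // every parallel subtask receives all updates
                .flatMap(new CoFlatMapFunction<String, Tuple2<String, String>, String>() {

                    // Each parallel instance of this operator holds its own copy of the cache.
                    // Flink does not synchronize this map across the cluster; the broadcast
                    // stream is what keeps all copies up to date.
                    private final Map<String, String> cache = new HashMap<>();

                    @Override
                    public void flatMap1(String event, Collector<String> out) {
                        // look up the locally held cache for each incoming event
                        String cached = cache.get(event);
                        if (cached != null) {
                            out.collect(event + "," + cached);
                        }
                    }

                    @Override
                    public void flatMap2(Tuple2<String, String> update, Collector<String> out) {
                        // apply a cache update; nothing is emitted for updates
                        cache.put(update.f0, update.f1);
                    }
                });
    }
}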
On Wed, Sep 7, 2016 at 6:33 PM, Fabian Hueske <fhue...@gmail.com> wrote:

> Is writing DataStream2 to a Kafka topic and reading it from the other job
> an option?
>
> 2016-09-07 19:07 GMT+02:00 Chakravarthy varaga <chakravarth...@gmail.com>:
>
>> Hi Fabian,
>>
>> Thanks for your response. These DataStreams
>> (Job1-DataStream1 & Job2-DataStream2) are from different Flink applications
>> running within the same cluster.
>> DataStream2 (from Job2) applies transformations and updates a 'cache'
>> on which Job1 needs to work.
>> Our intention is to not use an external key/value store, as we are
>> trying to localize the cache within the cluster.
>> Is there a way?
>>
>> Best Regards
>> CVP
>>
>> On Wed, Sep 7, 2016 at 5:00 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Flink does not provide shared state.
>>> However, you can broadcast a stream to a CoFlatMapFunction, such that each
>>> operator has its own local copy of the state.
>>>
>>> If that does not work for you because the state is too large, and if it
>>> is possible to partition the state (and both streams), you can also use
>>> keyBy instead of broadcast.
>>>
>>> Finally, you can use an external system like a key/value store or an
>>> in-memory store like Apache Ignite to hold your distributed collection.
>>>
>>> Best, Fabian
>>>
>>> 2016-09-07 17:49 GMT+02:00 Chakravarthy varaga <chakravarth...@gmail.com>:
>>>
>>>> Hi Team,
>>>>
>>>> Can someone help me here? Appreciate any response!
>>>>
>>>> Best Regards
>>>> Varaga
>>>>
>>>> On Mon, Sep 5, 2016 at 4:51 PM, Chakravarthy varaga <
>>>> chakravarth...@gmail.com> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I'm working on a Flink streaming application. The data is injected
>>>>> through Kafka connectors. The payload volume is roughly 100K/sec, and the
>>>>> event payload is a string. Let's call this DataStream1.
>>>>> The application also uses another DataStream, call it DataStream2,
>>>>> which consumes events off a Kafka topic. The elements of DataStream2
>>>>> go through a transformation that finally updates a HashMap (a Java
>>>>> util collection). The Flink application should share this HashMap
>>>>> across the Flink cluster so that the DataStream1 application can
>>>>> check the state of the values in this collection. Is there a way to do
>>>>> this in Flink?
>>>>>
>>>>> I don't see any shared collection used within the cluster.
>>>>>
>>>>> Best Regards
>>>>> CVP
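For completeness, this is roughly how the Kafka bridge Fabian suggested could look: job2 publishes its cache updates to a topic, and job1 consumes that topic as a second source which it can then broadcast as above. The topic name, broker address, consumer group, and the use of plain strings for the updates are all assumptions, not details from the actual jobs.

import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class CacheUpdateBridge {

    // Job2 side: publish the transformed cache updates to a topic.
    public static void publishUpdates(DataStream<String> cacheUpdates) {
        cacheUpdates.addSink(new FlinkKafkaProducer09<>(
                "localhost:9092",      // Kafka broker list (assumption)
                "cache-updates",       // topic name (assumption)
                new SimpleStringSchema()));
    }

    // Job1 side: consume the same topic; the resulting stream feeds the broadcast/CoFlatMap.
    public static DataStream<String> readUpdates(StreamExecutionEnvironment env) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "job1-cache-updates");

        return env.addSource(new FlinkKafkaConsumer09<>(
                "cache-updates", new SimpleStringSchema(), props));
    }
}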