Yes, there will not be a problem with duplicates. What I'm trying to
achieve is this: if there is a large static state, it would be ideal to keep
it just once per node rather than once per slot. The state is quite large,
around 100GB of mostly static data, and it is not needed at the per-slot
level. It can live at the per-instance level, where each slot reads from
this shared memory.

Thanks

On Wed, Oct 9, 2019 at 12:13 AM Congxian Qiu <qcx978132...@gmail.com> wrote:

> Hi,
>
> Once you use Redis, why do you need to care about eliminating duplicated
> data? If you write with the same key, Redis will handle the deduplication.
>
> Best,
> Congxian
>
>
> On Wed, Oct 2, 2019 at 5:30 PM Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi,
>>
>> State is always associated with a single task in Flink.
>> The state of a task cannot be accessed by other tasks of the same
>> operator or tasks of other operators.
>> This is true for every type of state, including broadcast state.
>>
>> Best, Fabian
>>
>>
>> On Tue, Oct 1, 2019 at 8:22 AM Navneeth Krishnan <
>> reachnavnee...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I can use Redis, but I'm still having a hard time figuring out how I can
>>> eliminate duplicate data. Today, on 1.4 without broadcast state, I'm using
>>> a cache to lazy-load the data. I thought broadcast state would be similar
>>> to Kafka Streams, where I have read access to the state across the
>>> pipeline. That would indeed solve a lot of problems. Is there some way I
>>> can do the same with Flink?
>>>
>>> Thanks!
>>>
>>> On Mon, Sep 30, 2019 at 10:36 PM Congxian Qiu <qcx978132...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Could you use a cache system such as HBase or Redis to store this
>>>> data, and query the cache when needed?
>>>>
>>>> Best,
>>>> Congxian
>>>>
>>>>
>>>> On Tue, Oct 1, 2019 at 10:15 AM Navneeth Krishnan <reachnavnee...@gmail.com> wrote:
>>>>
>>>>> Thanks Oytun. The problem with doing that is that the same data will
>>>>> have to be stored multiple times, wasting memory. In my case there will
>>>>> be around a million entries, which need to be used by at least two
>>>>> operators for now.
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Sep 30, 2019 at 5:42 PM Oytun Tez <oy...@motaword.com> wrote:
>>>>>
>>>>>> This is how we currently use broadcast state. Our states are
>>>>>> reusable (code-wise): every operator that wants to consume one keeps
>>>>>> the same state descriptor and fills its own local copy of the state
>>>>>> in processBroadcastElement.
>>>>>>
>>>>>> I am open to suggestions. Is this a hard drawback of dataflow
>>>>>> programming, or of the Flink framework?
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Oytun Tez
>>>>>>
>>>>>> *M O T A W O R D*
>>>>>> The World's Fastest Human Translation Platform.
>>>>>> oy...@motaword.com — www.motaword.com
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 30, 2019 at 8:40 PM Oytun Tez <oy...@motaword.com> wrote:
>>>>>>
>>>>>>> You can reuse the broadcast state (along with its descriptor) that
>>>>>>> comes into your KeyedBroadcastProcessFunction in another operator
>>>>>>> downstream. That basically duplicates the broadcast state in every
>>>>>>> operator where you want to use it.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Sep 30, 2019 at 8:29 PM Navneeth Krishnan <
>>>>>>> reachnavnee...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> Is it possible to access a broadcast state across the pipeline? For
>>>>>>>> example, say I have a KeyedBroadcastProcessFunction that adds the
>>>>>>>> incoming data to state, and a downstream operator that needs the same
>>>>>>>> state as well. Would I be able to read the broadcast state through a
>>>>>>>> read-only view? I know this is possible in Kafka Streams.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>
