Thanks Till, I will definitely check it out. Just to make sure that I
understood you correctly: you are suggesting that the list I want to
broadcast be sent via the control stream and then kept in the relevant
operator state, correct? And updates (CRUD) to that list would also be
performed via the control stream, correct?
BR
Avi
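
For readers of the archive, the idea being confirmed above can be sketched
in plain Java. This is only a model of the semantics, not Flink code: the
names Op, Control, Subtask, onControl, onElement and broadcast are all
hypothetical. In Flink itself the two handlers would correspond to
processBroadcastElement and processElement of a BroadcastProcessFunction,
and the per-subtask set would live in broadcast state.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the broadcast state pattern (assumption: not Flink API).
class BroadcastStateSketch {

    /** Kind of change carried by a control message (hypothetical). */
    enum Op { ADD, REMOVE }

    /** One control-stream message: a single change to the shared list. */
    static final class Control {
        final Op op;
        final String line;

        Control(Op op, String line) {
            this.op = op;
            this.line = line;
        }
    }

    /** One parallel subtask, holding its own copy of the broadcast list. */
    static final class Subtask {
        private final Set<String> lines = new HashSet<>();

        // Role of processBroadcastElement: the only place the state changes.
        void onControl(Control c) {
            if (c.op == Op.ADD) {
                lines.add(c.line);
            } else {
                lines.remove(c.line);
            }
        }

        // Role of processElement: a read-only lookup against the local copy.
        boolean onElement(String value) {
            return lines.contains(value);
        }
    }

    /** Delivers a control message to every subtask, as a broadcast would. */
    static void broadcast(List<Subtask> subtasks, Control c) {
        for (Subtask s : subtasks) {
            s.onControl(c);
        }
    }

    public static void main(String[] args) {
        List<Subtask> subtasks = List.of(new Subtask(), new Subtask());

        broadcast(subtasks, new Control(Op.ADD, "foo"));
        System.out.println(subtasks.get(1).onElement("foo"));

        broadcast(subtasks, new Control(Op.REMOVE, "foo"));
        System.out.println(subtasks.get(1).onElement("foo"));
    }
}
```

Because every subtask receives the same control messages, each local copy
stays in step with the others, and the list can change without a job
restart.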

On Wed, Jan 2, 2019 at 4:28 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Avi,
>
> You could use Flink's broadcast state pattern [1]. You would need to use
> the DataStream API, but it allows you to have two streams (an input stream
> and a control stream), where the control stream is broadcast to all
> subtasks. So by ingesting messages into the control stream you can send
> model updates to all subtasks.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
>
> Cheers,
> Till
>
> On Tue, Jan 1, 2019 at 6:49 PM miki haiat <miko5...@gmail.com> wrote:
>
>> I'm trying to understand your use case.
>> What is the source of the data? FS, Kafka, something else?
>>
>>
>> On Tue, Jan 1, 2019 at 6:29 PM Avi Levi <avi.l...@bluevoyant.com> wrote:
>>
>>> Hi,
>>> I have a list (a couple of thousand text lines) that I need to use in my
>>> map function. I read this article about broadcast variables
>>> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/#broadcast-variables>
>>> or using the distributed cache
>>> <https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/#distributed-cache>
>>> However, I need to update this list from time to time, and if I
>>> understood correctly that is not possible with broadcast variables or the
>>> cache without restarting the job. Is there an idiomatic way to achieve
>>> this? A DB seems like overkill for that, and I want to keep I/O and
>>> network calls to a minimum.
>>>
>>> Cheers
>>> Avi
>>>
>>>
