Hi Senthil,
         I think you are right that you cannot update closure variables 
directly and expect them to show up at the workers.

         If the variable values are read from S3 files, I think currently you 
will need to define a source explicitly that re-reads the latest value of the file. 
Whether to use a BroadcastStream depends on how you want to access the 
set of strings: if you want to broadcast the same strings to all the tasks, then 
a broadcast stream is the solution; if you want to distribute the set of 
strings in some other way, you could also use the more generic connected streams, like: 
 streamA.connect(streamB.keyBy()).process(xx). [1]
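
         In case it helps, a rough sketch of the broadcast approach might look 
like the following. The names here (PeriodicS3SetSource, readSetFromS3, the 
descriptor name, the 24-hour refresh interval and the socketTextStream input) 
are only placeholders for illustration, not something from your actual job:

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.util.Collector;

import java.util.HashSet;
import java.util.Set;

public class BroadcastFilterExample {

    // Descriptor for the broadcast state that holds the latest Set<String>.
    private static final MapStateDescriptor<String, Set<String>> FILTER_SET_DESCRIPTOR =
            new MapStateDescriptor<>(
                    "filterSet",
                    BasicTypeInfo.STRING_TYPE_INFO,
                    TypeInformation.of(new TypeHint<Set<String>>() {}));

    // A source that periodically re-reads the set and emits it downstream.
    // readSetFromS3() is a placeholder for your existing S3 loading logic.
    public static class PeriodicS3SetSource implements SourceFunction<Set<String>> {
        private volatile boolean running = true;
        private final long refreshIntervalMs;

        public PeriodicS3SetSource(long refreshIntervalMs) {
            this.refreshIntervalMs = refreshIntervalMs;
        }

        @Override
        public void run(SourceContext<Set<String>> ctx) throws Exception {
            while (running) {
                Set<String> latest = readSetFromS3();  // your S3 read goes here
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(latest);
                }
                Thread.sleep(refreshIntervalMs);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        private Set<String> readSetFromS3() {
            return new HashSet<>();  // placeholder
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder for your main event stream.
        DataStream<String> events = env.socketTextStream("localhost", 9999);

        // Broadcast the periodically refreshed set to all parallel instances.
        BroadcastStream<Set<String>> controlStream =
                env.addSource(new PeriodicS3SetSource(24 * 60 * 60 * 1000L))
                   .broadcast(FILTER_SET_DESCRIPTOR);

        // Filter the main stream against the most recently broadcast set.
        events.connect(controlStream)
              .process(new BroadcastProcessFunction<String, Set<String>, String>() {
                  @Override
                  public void processElement(String value, ReadOnlyContext ctx,
                                             Collector<String> out) throws Exception {
                      Set<String> filterSet =
                              ctx.getBroadcastState(FILTER_SET_DESCRIPTOR).get("current");
                      if (filterSet != null && filterSet.contains(value)) {
                          out.collect(value);
                      }
                  }

                  @Override
                  public void processBroadcastElement(Set<String> value, Context ctx,
                                                      Collector<String> out) throws Exception {
                      // Replace the stored set whenever the source emits a new one.
                      ctx.getBroadcastState(FILTER_SET_DESCRIPTOR).put("current", value);
                  }
              })
              .print();

        env.execute("Broadcast filter example");
    }
}

         The broadcast state is replaced in processBroadcastElement whenever the 
source emits a new set, so every parallel instance of the filter sees the 
refreshed strings without restarting the job.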

    Best,
     Yun

         [1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/#datastream-transformations



------------------------------------------------------------------
From:Senthil Kumar <senthi...@vmware.com>
Send Time:2020 Apr. 27 (Mon.) 21:51
To:user@flink.apache.org <user@flink.apache.org>
Subject:Updating Closure Variables

Hello Flink Community!

We have a flink streaming application with a particular use case where a 
closure variable Set<String> is used in a filter function.

Currently, the variable is set at startup time.

It’s populated from an S3 location, where several files exist (we consume the 
one with the most recent timestamp).

Is it possible to periodically update (say once every 24 hours) this closure 
variable?

My initial research indicates that we cannot update closure variables and 
expect them to show up at the workers.

There seems to be something called BroadcastStream in Flink. 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html

Is that the right approach? I would like some kind of a confirmation before I 
go deeper into it.

cheers
Kumar
