Hi Jamie,

Thanks a lot for the response. Appreciate your help.
Regards,
Govind

On Mon, Sep 26, 2016 at 3:26 AM, Jamie Grier <ja...@data-artisans.com> wrote:

> Hi Govindarajan,
>
> Typically the way people do this is to create a stream of configuration
> changes and consume it like any other stream. For the specific case of
> filtering, for example, you may have a data stream and a stream of
> filters that you want to run the data through. The typical approach in
> the Flink API would then be:
>
>     val dataStream = env.addSource(dataSource).keyBy("userId")
>     val filterStream = env.addSource(filterSource).keyBy("userId")
>     val connectedStream = dataStream
>       .connect(filterStream)
>       .flatMap(yourFilterFunction)
>
> You would maintain your filters as state in your filter function. Notice
> that in this example both streams are keyed the same way.
>
> If it is not possible to distribute the configuration by key (it really
> depends on your use case), you can instead "broadcast" that stream so
> that each instance of yourFilterFunction sees the same configuration
> messages and will end up building the same state. For example:
>
>     val dataStream = env.addSource(dataSource).keyBy("userId")
>     val filterStream = env.addSource(filterSource).broadcast()
>     val connectedStream = dataStream
>       .connect(filterStream)
>       .flatMap(yourFilterFunction)
>
> I hope that helps.
>
> -Jamie
>
>
> On Mon, Sep 26, 2016 at 4:34 AM, Govindarajan Srinivasaraghavan <
> govindragh...@gmail.com> wrote:
>
> > Hi,
> >
> > My requirement is to stream millions of records a day, and the
> > processing has a heavy dependency on external configuration parameters.
> > For example, a user can change the required settings at any time in the
> > web application, and after the change is made, the streaming has to
> > continue with the new application config parameters. These are
> > app-level configurations, and we also have some dynamic exclude
> > parameters that each record has to be passed through and filtered on.
> >
> > I see that Flink doesn't have global state that is shared across all
> > task managers and subtasks. Having a centralized cache is an option,
> > but for each parameter I would have to read it from the cache, which
> > would increase latency. Please advise on a better approach to handle
> > these kinds of scenarios and how other applications are handling them.
> > Thanks.
>
> --
> Jamie Grier
> data Artisans, Director of Applications Engineering
> @jamiegrier <https://twitter.com/jamiegrier>
> ja...@data-artisans.com
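
A minimal sketch of what yourFilterFunction in the first (keyed) example
above might look like. The Event and FilterRule types here are hypothetical
stand-ins for the real record and filter types, and the state descriptor
uses the two-argument constructor from more recent Flink releases (older
versions also required a default-value argument):

    import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
    import org.apache.flink.configuration.Configuration
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
    import org.apache.flink.util.Collector

    // Hypothetical record and filter types, for illustration only.
    case class Event(userId: String, payload: String)
    case class FilterRule(userId: String, excluded: Set[String])

    class UserFilterFunction
        extends RichCoFlatMapFunction[Event, FilterRule, Event] {

      // Keyed state: one FilterRule per userId. Because both inputs are
      // keyed by userId, Flink scopes this state to the current key.
      private var rule: ValueState[FilterRule] = _

      override def open(parameters: Configuration): Unit = {
        rule = getRuntimeContext.getState(
          new ValueStateDescriptor("filter-rule", classOf[FilterRule]))
      }

      // Input 1: data elements. Drop anything the current rule excludes.
      override def flatMap1(event: Event, out: Collector[Event]): Unit = {
        val current = rule.value()
        if (current == null || !current.excluded.contains(event.payload)) {
          out.collect(event)
        }
      }

      // Input 2: filter updates. Replace the rule stored for this key.
      override def flatMap2(update: FilterRule, out: Collector[Event]): Unit = {
        rule.update(update)
      }
    }

This would plug into the first snippet as
dataStream.connect(filterStream).flatMap(new UserFilterFunction). Because
the rule lives in Flink's keyed state, it is checkpointed with the job and
survives restarts, with no per-record lookup against an external cache.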
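
For the broadcast variant, a corresponding sketch, reusing the hypothetical
Event type from the sketch above. Every parallel instance receives every
configuration message, so a plain instance field converges to the same
value everywhere. A plain field is not checkpointed, though, so a
production job would also need to persist it; Flink 1.5+ adds a dedicated
broadcast state API for exactly this pattern.

    import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction
    import org.apache.flink.util.Collector

    // Hypothetical app-level configuration type, again for illustration.
    case class AppConfig(excluded: Set[String])

    class BroadcastConfigFilter
        extends CoFlatMapFunction[Event, AppConfig, Event] {

      // Every instance sees every AppConfig message, so all instances
      // converge on the same configuration. This plain field is not part
      // of a checkpoint; see the note above.
      private var config: AppConfig = AppConfig(Set.empty)

      // Input 1: data elements, filtered against the current config.
      override def flatMap1(event: Event, out: Collector[Event]): Unit = {
        if (!config.excluded.contains(event.payload)) {
          out.collect(event)
        }
      }

      // Input 2: broadcast config updates. Replace the local copy.
      override def flatMap2(newConfig: AppConfig, out: Collector[Event]): Unit = {
        config = newConfig
      }
    }

Both flatMap1 and flatMap2 are invoked from the same task thread, so the
field needs no extra synchronization.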