I think what you're describing could be handled in KStreams by a "global" KTable. This functionality is currently being discussed/voted on in a KIP discussion:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67633649

The list of interests would be a global KTable (shared globally across all streams instances), and each instance would filter/categorize its share of the content stream based on that table via a join.
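
To make the idea a bit more concrete, here is a rough sketch in plain Java of the semantics I have in mind. It deliberately does not use the Kafka Streams API, since the global table interface is still under discussion in the KIP. All names here (GlobalTableSketch, FilterInstance, Content, restore) are made up for illustration. Each instance keeps a full local copy of the interests table, rebuilt by replaying the interests changelog, and joins its own shard of the content stream against that copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the "global table" idea; NOT the Kafka Streams API.
public class GlobalTableSketch {

    // One record from the large content stream, keyed by its area.
    static final class Content {
        final String area;
        final String body;

        Content(String area, String body) {
            this.area = area;
            this.body = body;
        }
    }

    // One filtering instance: sees only a shard of the content stream,
    // but holds a complete copy of the interests table.
    static final class FilterInstance {
        private final Map<String, String> interests = new HashMap<>();

        // Apply one interests changelog record; a null label acts as a delete.
        void apply(String area, String label) {
            if (label == null) {
                interests.remove(area);
            } else {
                interests.put(area, label);
            }
        }

        // Inner join against the local table: records whose area has no
        // entry in the table are dropped, the rest are tagged with the label.
        List<String> filter(List<Content> shard) {
            List<String> out = new ArrayList<>();
            for (Content c : shard) {
                String label = interests.get(c.area);
                if (label != null) {
                    out.add(label + ":" + c.body);
                }
            }
            return out;
        }
    }

    // A new instance recreates its state by replaying the whole changelog.
    // Each changelog record is {area, label}; label may be null.
    static FilterInstance restore(List<String[]> changelog) {
        FilterInstance instance = new FilterInstance();
        for (String[] rec : changelog) {
            instance.apply(rec[0], rec[1]);
        }
        return instance;
    }

    public static void main(String[] args) {
        List<String[]> changelog = new ArrayList<>();
        changelog.add(new String[] {"sports", "S"});
        changelog.add(new String[] {"news", "N"});

        // Two instances, each restored from the same interests changelog.
        FilterInstance first = restore(changelog);
        FilterInstance second = restore(changelog);

        // Each instance processes a different shard of the content stream.
        System.out.println(first.filter(Arrays.asList(
                new Content("sports", "goal"), new Content("weather", "rain"))));
        System.out.println(second.filter(Arrays.asList(
                new Content("news", "headline"))));
    }
}
```

In the real thing, the interests topic would play the role of the changelog, and updates to it would reach every instance rather than being split by partition.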

-Ewen

On Fri, Dec 30, 2016 at 3:02 PM, Matt King <kyrri...@gmail.com> wrote:

> I'd like to have the following:
>
> One large stream of content coming through a topic, with Kafka Stream
> filtering to identify records of interest. I can see how this would be
> sharded to allow scale out to handle a large stream of content.
>
> I would like to have a 2nd, smaller, topic to define the areas of interest
> that would be used for the filtering. This topic should be available to
> all the stream processing filters. When a new filter comes up it should be
> able to recreate its state. As area of interest definitions change these
> changes should also go out to all the filtering applications.
>
> Can this be done directly with Kafka Streams? I can get the primary stream
> working with a static set of interests and the filtering works fine. But
> adding in a second input stream I'm having trouble. There doesn't seem to
> be a way to have the same topic/partition go to all the applications?
> Alternatively I could imagine broadcasting the interests to multiple
> partitions but don't see how that is done.
>
> Perhaps the area-of-interest topic should be done using a plain old Kafka
> producer, sending it to all partitions?
>
> Am I making sense?
>
> Happy New Year
>
> Matt