Hello Josh,

As of now Kafka Streams does not yet support session windows as in the Dataflow model, though we do have near-term plans to support them.
For now you can still work around it with the Processor API, by calling the "KStream.transform()" function, which still returns a stream object. In your customized "Transformer" implementation, you can attach a state store of your own and access it in the "transform" function, and only return results when, for example, a session has ended.

As a concrete example, Confluent has some internal tools that already use Kafka Streams for online operations, where a session-windowed processor is needed as well. We use the "transform" function in the Streams DSL (i.e. "KStreamBuilder") as in the following sketch:

--------------

builder.addStateStore(/* new RocksDBKeyValueStoreSupplier(..) */, "store-name");

stream1 = builder.stream("source-topic");

stream1.transform(myTransformerSupplier, "store-name");

--------------

then in the Transformer returned by myTransformerSupplier:

public void init(ProcessorContext context) {
    this.kvStore = context.getStateStore("store-name");
    // now you can access this store in the transform() function
}

--------------

Hope this helps.

Guozhang

On Tue, Mar 22, 2016 at 11:51 AM, josh gruenberg <jos...@gmail.com> wrote:

> Hello there,
>
> I've been experimenting with the Kafka Streams preview, and I'm excited
> about its features and capabilities! My team is enthusiastic about the
> lightweight operational profile, and the support for local state is very
> compelling.
>
> However, I'm having trouble designing a solution with KStreams to satisfy
> a particular use-case: we want to "sessionize" a stream of events, by
> gathering together inputs that share a common identifier and occur
> without a configurable interruption (gap) in event-time.
>
> This is achievable with other streaming frameworks (e.g., using
> Beam/Dataflow's "Session" windows, or Spark Streaming's mapWithState
> with its "timeout" capability), but I don't see how to approach it with
> the current Kafka Streams API.
>
> I've investigated using the aggregateWithKey function, but this doesn't
> appear to support data-driven windowing. I've also considered using a
> custom Processor to perform the aggregation, but don't see how to take
> an output-stream from a Processor and continue to work with it. This
> area of the system is undocumented, so I'm not sure how to proceed.
>
> Am I missing something? Do you have any suggestions?
>
> -josh

-- 
-- Guozhang
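As a postscript: the per-key, gap-based session bookkeeping that such a Transformer would keep in its state store can be sketched framework-free. This is a minimal sketch with hypothetical names (SessionTracker, Session, process); in a real Transformer the HashMap would be the attached KeyValueStore, and you would additionally scan for expired sessions in punctuate() so that keys which simply go quiet still get emitted.

--------------

import java.util.*;

public class SessionTracker {

    // One open session per key: accumulated events plus last-seen event time.
    static final class Session {
        final List<String> events = new ArrayList<>();
        long lastEventTime;
    }

    private final long gapMs;                            // inactivity gap that closes a session
    private final Map<String, Session> store = new HashMap<>(); // stands in for the state store

    public SessionTracker(long gapMs) {
        this.gapMs = gapMs;
    }

    // Feed one event. If the gap since the key's last event is exceeded,
    // the old session is closed and its events are returned; the new event
    // then starts a fresh session. Otherwise the event is appended and
    // nothing is emitted yet.
    public Optional<List<String>> process(String key, String event, long eventTimeMs) {
        Session session = store.get(key);
        Optional<List<String>> closed = Optional.empty();
        if (session != null && eventTimeMs - session.lastEventTime > gapMs) {
            closed = Optional.of(session.events);  // session expired: emit it
            session = null;
        }
        if (session == null) {
            session = new Session();
            store.put(key, session);
        }
        session.events.add(event);
        session.lastEventTime = eventTimeMs;
        return closed;
    }
}

--------------

With a 1000 ms gap, events at t=0 and t=500 for the same key land in one session, and an event at t=2000 closes that session (emitting its two events) while starting a new one.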