Re: Kafka stream join scenarios

2016-05-27 Thread Guozhang Wang
The timestamp is not only used for windowing specs but also for flow control (i.e., it is used as a "message chooser" among multiple input topic partitions); see this section for details: http://docs.confluent.io/3.0.0/streams/architecture.html#flow-control-with-timestamps Guozhang On Fri, …
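The "message chooser" behavior described above can be sketched as a k-way merge: among several buffered input partitions, always take the record whose partition head carries the smallest timestamp. This is a minimal illustration of the idea, not Kafka Streams' actual internal code; the data structures and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class FlowControlSketch {
    // Drain per-partition FIFO queues of timestamps by always choosing the
    // partition whose head record has the smallest timestamp. Records within
    // a partition stay in order; only the heads are compared.
    static List<Long> drainInTimestampOrder(List<Deque<Long>> partitions) {
        List<Long> out = new ArrayList<>();
        while (true) {
            Deque<Long> best = null;
            for (Deque<Long> p : partitions) {
                if (!p.isEmpty() && (best == null || p.peekFirst() < best.peekFirst())) {
                    best = p;
                }
            }
            if (best == null) break; // all partitions drained
            out.add(best.pollFirst());
        }
        return out;
    }

    public static void main(String[] args) {
        Deque<Long> p0 = new ArrayDeque<>(Arrays.asList(1L, 5L, 9L));
        Deque<Long> p1 = new ArrayDeque<>(Arrays.asList(2L, 3L, 8L));
        // Merged consumption order across the two partitions:
        System.out.println(drainInTimestampOrder(Arrays.asList(p0, p1))); // [1, 2, 3, 5, 8, 9]
    }
}
```

This is also why the timestamps you assign via a TimestampExtractor matter beyond windowing: they directly steer which input is consumed next.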

Re: Kafka stream join scenarios

2016-05-27 Thread Srikanth
Guozhang, Timestamp extraction seems more like a stream-level API. I guess it's a better fit as a global option when using WallclockTimestampExtractor or ConsumerRecordTimestampExtractor. W.r.t. your statement -- "I think setting timestamps for this KTable to make sure its values is smaller than t…"

Re: Kafka stream join scenarios

2016-05-25 Thread Guozhang Wang
A processor is guaranteed to be executed on the same thread at any given time; its process() and punctuate() will always be triggered on a single thread. Currently TimestampExtractor is set globally, but you can definitely define different logic depending on the topic name (which is included…
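The per-topic dispatch Guozhang describes can be sketched like this. The real interface is org.apache.kafka.streams.processor.TimestampExtractor, whose extract() receives a ConsumerRecord carrying the topic name; here, plain parameters stand in for the record's fields, and the topic name "events" is a hypothetical example, so this compiles without the Kafka jars.

```java
// Sketch only: a globally configured extractor that still applies different
// logic per topic. Parameter names and the "events" topic are illustrative.
public class TopicAwareExtractorSketch {
    static long extract(String topic, long embeddedTs, long wallClockTs) {
        if (topic.equals("events")) {
            return embeddedTs;  // event stream: trust the timestamp in the payload
        }
        return wallClockTs;     // other topics: behave like WallclockTimestampExtractor
    }

    public static void main(String[] args) {
        System.out.println(extract("events", 100L, 999L));    // 100
        System.out.println(extract("dim-table", 100L, 999L)); // 999
    }
}
```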

Re: Kafka stream join scenarios

2016-05-23 Thread Srikanth
Guozhang, I guess you are referring to a scenario where noOfThreads < totalNoOfTasks. We could have a KTable task and a KStream task running on the same thread, and sleep would be counterproductive? On this note, will a Processor always run on the same thread? Are process() and punctuate() guaranteed to…

Re: Kafka stream join scenarios

2016-05-23 Thread Guozhang Wang
Srikanth, Note that the same thread may be used for fetching both the "semi-static" KTable stream as well as the continuous KStream stream, so sleep-on-no-match may not work. I think setting timestamps for this KTable to make sure its values are smaller than the KStream stream's will work, and there…
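Guozhang's suggestion can be sketched as a timestamp extractor that pins the KTable topic's timestamps below anything the KStream can carry, so the timestamp-based flow control drains the table before stream records are processed. The topic name "dim-table" and the method shape are hypothetical stand-ins, not the real TimestampExtractor signature.

```java
// Sketch: force the KTable's source topic to look "older" than the stream,
// so timestamp-based flow control consumes the table side first.
public class TableFirstSketch {
    static long extract(String topic, long recordTimestamp) {
        if (topic.equals("dim-table")) {  // hypothetical KTable source topic
            return 0L;                    // always smaller than any real stream timestamp
        }
        return recordTimestamp;           // stream records keep their own timestamps
    }

    public static void main(String[] args) {
        System.out.println(extract("dim-table", 1464000000000L)); // 0
        System.out.println(extract("events", 1464000000000L));    // 1464000000000
    }
}
```

A caveat follows from the flow-control semantics: once the table side is exhausted, the stream proceeds regardless, so this orders startup consumption but does not stall the stream indefinitely.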

Re: Kafka stream join scenarios

2016-05-23 Thread Srikanth
Thanks Guozhang & Matthias! For 1), it is going to be a common ask, so a DSL API will be good. For 2), the source for the KTable currently is a file. Think of it as a dump from a DB table. We are thinking of ways to stream updates from this table, but for now it's a new file every day or so. I plan…

Re: Kafka stream join scenarios

2016-05-23 Thread Matthias J. Sax
Hi Srikanth, as Guozhang mentioned, the problem is the definition of the time when your table is read for joining with the stream. Using transform() you would basically read a changelog-stream within your custom Transformer class and apply it via KStream.transform() to your regular stream (i.e., …

Re: Kafka stream join scenarios

2016-05-23 Thread Guozhang Wang
Hi Srikanth, How do you define whether the "KTable is read completely" to get to the "current content"? Since, as you said, the table is not purely static but still has maybe-low-traffic update streams, I guess "catch up to current content" still depends on some timestamp? BTW, about 1), we are …

Re: Kafka stream join scenarios

2016-05-23 Thread Srikanth
Matthias, For (2), how do you achieve this using transform()? Thanks, Srikanth On Sat, May 21, 2016 at 9:10 AM, Matthias J. Sax wrote: > Hi Srikanth, > > 1) there is no support on DSL level, but if you use Processor API you > can do "anything" you like. So yes, a map-like transform() that gets…

Re: Kafka stream join scenarios

2016-05-21 Thread Matthias J. Sax
Hi Srikanth, 1) there is no support on the DSL level, but if you use the Processor API you can do "anything" you like. So yes, a map-like transform() that gets initialized with the "broadcast-side" of the join should work. 2) Right now, there is no way to stall a stream -- a custom TimestampExtractor will…
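The broadcast-join idea in 1) can be sketched as follows: the small dimension table is loaded into an in-memory map when the transformer is initialized, and every stream record is enriched by lookup. This class is a stand-in for the logic a custom Transformer (KStream.transform()) would carry; the key/value types, the "UNKNOWN" fallback, and all names here are illustrative assumptions, not the real Processor API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a map-side "broadcast join": the dimension table lives in each
// instance's memory ("broadcast side"), loaded once at initialization.
public class BroadcastJoinSketch {
    private final Map<String, String> dimTable;

    BroadcastJoinSketch(Map<String, String> dimTable) {
        this.dimTable = dimTable; // in a real Transformer, loaded in init()
    }

    // Map-like transform: enrich the fact record via in-memory lookup.
    String transform(String key, String value) {
        String dim = dimTable.getOrDefault(key, "UNKNOWN");
        return value + "," + dim;
    }

    public static void main(String[] args) {
        Map<String, String> dims = new HashMap<>();
        dims.put("u1", "US");
        dims.put("u2", "DE");
        BroadcastJoinSketch join = new BroadcastJoinSketch(dims);
        System.out.println(join.transform("u1", "click")); // click,US
        System.out.println(join.transform("u3", "view"));  // view,UNKNOWN
    }
}
```

Because every instance holds the full table, no repartitioning of the stream is needed; the trade-off is that the table must fit in memory on each instance.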

Kafka stream join scenarios

2016-05-20 Thread Srikanth
Hello, I'm writing a workflow using Kafka Streams where an incoming stream needs to be denormalized and joined with a few dimension tables. It will be written back to another Kafka topic. Fairly typical, I believe. 1) Can I do a broadcast join if my dimension table is small enough to be held in each…