Matthias, For (2), how do you achieve this using transform()?
Thanks, Srikanth On Sat, May 21, 2016 at 9:10 AM, Matthias J. Sax <matth...@confluent.io> wrote: > Hi Srikanth, > > 1) there is no support on DSL level, but if you use Processor API you > can do "anything" you like. So yes, a map-like transform() that gets > initialized with the "broadcast-side" of the join should work. > > 2) Right now, there is no way to stall a stream -- a custom > TimestampExtractor will not do the tricker either, because Kafka Streams > allows for out-of-order/late arriving data -- thus, it processes what is > available without "waiting" for late data... > > Of course, you could build a custom solution via transfrom() again. > > > -Matthias > > On 05/21/2016 05:24 AM, Srikanth wrote: > > Hello, > > > > I'm writing a workflow using kafka streams where an incoming stream needs > > to be denormalized and joined with a few dimension table. > > It will be written back to another kafka topic. Fairly typical I believe. > > > > 1) Can I do broadcast join if my dimension table is small enough to be > held > > in each processor's local state? > > Not doing this will force KStream to be shuffled with an internal topic. > > If not should I write a transformer that reads the table in init() and > > cache's it. I can then do a map like transformation in transform() with > > cache lookup instead of a join. > > > > > > 2) When joining a KStream withe KTable for bigger tables, how do I stall > > reading KStream until KTable is completely materialized? > > I know completely read is a very loose term in stream processing :-) The > > KTable is going to be fairly static and needs the "current content" to be > > read completely before it can be used for joins. > > The only option for synchronizing streams I see is TimestampExtractor. I > > can't figure out how to use that because > > > > i) The join between KStream and KTable is time agnostic and doesn't > > really fit into a window operation. > > ii) TimestampExtractor is set as a stream config at global level. > > Timestamp extraction logic on the other hand will be specific to each > > stream. > > How does one write a generic extractor? > > > > Thanks, > > Srikanth > > > >