Matthias,

For (2), how do you achieve this using transform()?

Thanks,
Srikanth

On Sat, May 21, 2016 at 9:10 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> Hi Srikanth,
>
> 1) there is no support on DSL level, but if you use Processor API you
> can do "anything" you like. So yes, a map-like transform() that gets
> initialized with the "broadcast-side" of the join should work.
>
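[A minimal, Kafka-free sketch of the map-like transform() idea above: one method stands in for init() loading the "broadcast-side" dimension table, another stands in for transform() doing a local lookup instead of a co-partitioned join. All class and method names here are hypothetical; in a real topology, loading would read the small table's topic with a plain consumer inside Transformer#init().]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "broadcast join" as a map-like transform:
// each processor instance holds the whole (small) dimension table locally,
// so the stream side never needs to be shuffled through an internal topic.
public class BroadcastJoinSketch {
    private final Map<String, String> dimTable = new HashMap<>();

    // Stand-in for Transformer#init(ProcessorContext): in a real topology
    // this would read the dimension topic to its end and cache the entries.
    public void load(String key, String value) {
        dimTable.put(key, value);
    }

    // Stand-in for Transformer#transform(K, V): enrich each stream record
    // via a local cache lookup instead of a join.
    public String transform(String key, String value) {
        return value + "," + dimTable.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) {
        BroadcastJoinSketch t = new BroadcastJoinSketch();
        t.load("user1", "US");
        System.out.println(t.transform("user1", "click"));
    }
}
```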
> 2) Right now, there is no way to stall a stream -- a custom
> TimestampExtractor will not do the trick either, because Kafka Streams
> allows for out-of-order/late-arriving data -- thus, it processes what is
> available without "waiting" for late data...
>
> Of course, you could build a custom solution via transform() again.
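[A minimal sketch, outside Kafka, of what such a custom transform()-based solution could look like: stream records are buffered until the table side signals it has been read to its "current content", then released. In Kafka Streams the buffer would live in a state store, and readiness could be detected by comparing the table consumer's position against its end offsets; everything here is a hypothetical illustration, not an existing API.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: hold stream records back until the dimension
// table is considered fully loaded, then flush the backlog.
public class StallingTransform {
    private final List<String> buffer = new ArrayList<>();
    private boolean tableReady = false;

    // Called once the table's current content has been read, e.g. when a
    // consumer reaches the table topic's end offsets. Returns the backlog.
    public List<String> markTableReady() {
        tableReady = true;
        List<String> released = new ArrayList<>(buffer);
        buffer.clear();
        return released;
    }

    // Stand-in for transform(): buffer while loading, pass through after.
    public List<String> transform(String record) {
        List<String> out = new ArrayList<>();
        if (!tableReady) {
            buffer.add(record);   // stall: emit nothing yet
        } else {
            out.add(record);      // table ready: emit immediately
        }
        return out;
    }
}
```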
>
>
> -Matthias
>
> On 05/21/2016 05:24 AM, Srikanth wrote:
> > Hello,
> >
> > I'm writing a workflow using Kafka Streams where an incoming stream needs
> > to be denormalized and joined with a few dimension tables.
> > It will be written back to another Kafka topic. Fairly typical, I believe.
> >
> > 1) Can I do a broadcast join if my dimension table is small enough to be
> > held in each processor's local state?
> > Not doing this will force the KStream to be shuffled through an internal
> > topic.
> > If not, should I write a transformer that reads the table in init() and
> > caches it? I can then do a map-like transformation in transform() with a
> > cache lookup instead of a join.
> >
> >
> > 2) When joining a KStream with a KTable for bigger tables, how do I stall
> > reading the KStream until the KTable is completely materialized?
> > I know "completely read" is a very loose term in stream processing :-) The
> > KTable is going to be fairly static and needs the "current content" to be
> > read completely before it can be used for joins.
> > The only option for synchronizing streams I see is TimestampExtractor. I
> > can't figure out how to use that because
> >
> >   i) The join between KStream and KTable is time agnostic and doesn't
> > really fit into a window operation.
> >   ii) TimestampExtractor is set as a stream config at the global level.
> > Timestamp-extraction logic, on the other hand, will be specific to each
> > stream.
> >       How does one write a generic extractor?
> >
> > Thanks,
> > Srikanth
> >
>
>
