Re: Kafka Streams - groupByKey and Count, null result on join

2017-08-31 Thread Guozhang Wang
Ian, You can try with the `toString` function (assuming you're on the older version) of KafkaStreams to print the constructed topology and check if multiple repartition topics is created. >From your code snippet, it is a bit hard to tell, since I do not know if repeatedInputStream is already from

Re: Kafka Streams - groupByKey and Count, null result on join

2017-08-31 Thread Ian Duffy
Hi Guozhang, Looking at this again this again I'm a little confused, I'm not using any maps and as far as I know, the selectKey is already causing a repartition: "base-topic-KSTREAM-KEY-SELECT-03-repartition". Is using the same KTable in multiple different leftJoins going to be an issue?

Re: Kafka Streams - groupByKey and Count, null result on join

2017-08-29 Thread Guozhang Wang
Ian, if your issue is indeed due to KAFKA-4601, currently the best way would be what I mentioned in that ticket, i.e. manually call `through` to do the repartition, and then from the repartition topic do the aggregation first followed by the join. It will enforce that for each incoming record, it w

Re: Kafka Streams - groupByKey and Count, null result on join

2017-08-29 Thread Ian Duffy
Thanks for the information Guozhang. Any recommendations for handling or working around this? It's making tests very flakey. On 24 August 2017 at 23:48, Guozhang Wang wrote: > Hi Ian, > > I suspect it has something to do with your specified topology, in which it > triggers the join first, then

Re: Kafka Streams - groupByKey and Count, null result on join

2017-08-24 Thread Guozhang Wang
Hi Ian, I suspect it has something to do with your specified topology, in which it triggers the join first, then the aggregation updates. For example, take a look at this ticket: https://issues.apache.org/jira/browse/KAFKA-4601 As from its printed topology, due to the repartition topic the join

Kafka Streams - groupByKey and Count, null result on join

2017-08-24 Thread Ian Duffy
Hi All, I'm building a streams applications where I wish to take action on the input when a certain frequency of the input has been seen. At the moment the application roughly goes: frequency table = Input Stream -> groupByKey -> Count input stream with counts = leftJoin frequency table and inp