How do you know that the result should be 900,000 messages? Did you
verify that the keys match and that the timestamps are correct?
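
For example, a quick way to check is to read each input topic with a plain
consumer and print every record's key and timestamp (a minimal sketch; the
bootstrap server, topic name, and String deserializers are assumptions, not
your actual setup):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class TimestampCheck {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "timestamp-check");
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("left-topic")); // hypothetical name
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                        // records join only if keys match and timestamps fall
                        // within the 1-second window of each other
                        System.out.printf("key=%s ts=%d type=%s%n",
                                rec.key(), rec.timestamp(), rec.timestampType());
                    }
                }
            }
        }
    }

Note that if the topics are configured with LogAppendTime, the record
timestamps reflect broker ingestion time rather than event time, which can
also shift join results.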

Did you try removing the grace period or setting a higher value? Maybe there
is an issue with out-of-order data?
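
For example (a minimal sketch; the stream variables, the value joiner, and
the String serdes are assumptions, not your actual code):

    import java.time.Duration;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.Joined;
    import org.apache.kafka.streams.kstream.KStream;

    // widen the grace period so late/out-of-order records can still join
    KStream<String, String> joined = leftStream.join(
            rightStream,
            (leftValue, rightValue) -> leftValue + "/" + rightValue,
            JoinWindows.of(Duration.ofMillis(1000))
                    .grace(Duration.ofMinutes(10)),
            Joined.with(Serdes.String(), Serdes.String(), Serdes.String()));

Or drop the .grace(...) call entirely -- if I remember correctly, the
default grace in this API version is derived from the 24-hour window
retention, which is much more forgiving for out-of-order data.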

-Matthias

On 6/14/19 5:05 AM, giselle.vandon...@klarrio.com wrote:
> I have two streams of data flowing into a Kafka cluster. I want to process 
> this data with Kafka Streams. The stream producers are started at some time t.
> 
> I start up the Kafka Streams job 5 minutes later and start reading from 
> earliest from both topics (about 900,000 messages already on each topic). The 
> job parses the data and joins the two streams. On the intermediate topics, I 
> see that all the earlier 2x900,000 events are flowing through until the join. 
> However, only 250,000 are output from the join, instead of 900,000. 
> 
> After processing the burst, the code works perfectly on new incoming events.
> 
> The grace period of the join is set to 50 ms, but setting it to 5 or 10 
> minutes makes no difference. My other settings:
> 
>     auto.offset.reset = earliest
>     commit.interval.ms = 1000
>     batch.size = 204800
>     max.poll.records = 500000
> 
> The join is on the following window:
> 
> JoinWindows.of(Duration.ofMillis(1000))
>       .grace(Duration.ofMillis(50))
> 
> As far as I understand, these joins should be done on event time. What could 
> cause these results to be discarded?
> 
