Thanks for the details (and sorry, I forgot that you had already
shared the output).

Might be a dumb question, but what is the count for missing windows in
your second implementation?

If there is no data for a window, Streams should not emit a window with
count zero; it should emit nothing at all.

Thus, looking at your output, I am wondering how it could contain a
line like:

> 2017-04-27T04:53:00 0

I am also wondering why your output only contains a single value per
window. Since Streams emits multiple updates per window as the count
increases, you should actually see multiple records per window.

Is your code like this:

stream.filter(...).groupByKey().count(TimeWindows.of(60000)).to(...);

Or do you have something more complex?
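
For reference, a minimal, complete version of that shape of topology
(just a sketch, assuming the 0.10.x DSL with string keys and values;
the application id, topic names, and store name below are made up)
would look roughly like this:

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

public class WindowedCountSketch {

    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-count-sketch"); // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // hypothetical
        props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        final KStreamBuilder builder = new KStreamBuilder();

        final KStream<String, String> input = builder.stream("input-topic");     // hypothetical

        // Tumbling 1-minute windows; the filter predicate is a placeholder.
        final KTable<Windowed<String>, Long> counts = input
                .filter((key, value) -> value != null)
                .groupByKey()
                .count(TimeWindows.of(60000L), "counts-store");                  // hypothetical store name

        // Every update to a window's count is forwarded downstream, so the
        // output topic normally contains multiple records per window
        // (count 1, 2, 3, ...). A window that received no input is never
        // emitted at all.
        counts.toStream()
              .map((windowedKey, count) -> KeyValue.pair(
                      windowedKey.key() + "@" + windowedKey.window().start(), count))
              .to(Serdes.String(), Serdes.Long(), "counts-topic");               // hypothetical

        new KafkaStreams(builder, new StreamsConfig(props)).start();
    }
}

The map() before to() just flattens the windowed key into a plain
string so the output can be written with the standard serdes.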


-Matthias


On 4/27/17 9:16 PM, Mahendra Kariya wrote:
>> Can you somehow verify your output?
> 
> 
> Do you mean the Kafka Streams output? In the Kafka Streams output, we do
> see some missing values. I have attached the Kafka Streams output (for a
> few hours) in the very first email of this thread for reference.
> 
> Let me also summarise what we have done so far.
> 
> We took a dump of the raw data present in the source topic. We wrote a
> script to read this data and do the exact same aggregations that we do
> using Kafka Streams. And then we compared the output from Kafka Streams and
> our script.
> 
> The difference that we observed in the two outputs is that there were a few
> rows (corresponding to some time windows) missing in the Streams output.
> For the time windows for which the data was present, the aggregated numbers
> matched exactly.
> 
> This means either all the records for a particular time window are being
> skipped, or none of them are. Now, this is highly unlikely to happen.
> Maybe there is a bug somewhere in the RocksDB state stores? That is just
> speculation, though; we are not sure. There could even be a bug in the
> reported metric.
> 
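
For completeness, a re-aggregation of that kind can be sketched in a
few lines of plain Java (the dump layout assumed here, one record per
line as "epochMillis<TAB>value", and the class name are hypothetical;
the original script may differ):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Instant;
import java.util.Map;
import java.util.TreeMap;

// Reads a dump of the source topic and counts records per 1-minute
// window, mirroring the Streams windowed count. The printed per-window
// counts can then be diffed against the Streams output.
public class OfflineWindowedCount {

    private static final long WINDOW_SIZE_MS = 60000L;

    public static void main(final String[] args) throws IOException {
        final Map<Long, Long> countsPerWindow = new TreeMap<>();

        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            String line;
            while ((line = reader.readLine()) != null) {
                final long timestamp = Long.parseLong(line.split("\t", 2)[0]);
                final long windowStart = timestamp - (timestamp % WINDOW_SIZE_MS);
                countsPerWindow.merge(windowStart, 1L, Long::sum);
            }
        }

        // Print "windowStart count", one line per non-empty window.
        countsPerWindow.forEach((windowStart, count) ->
                System.out.println(Instant.ofEpochMilli(windowStart) + " " + count));
    }
}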
