Hi Matthias,

What we did was read the data from sink topic and print it to console. And
here's the raw data from that topic (the counts are randomized). As we can
see, the data is certainly missing for some time windows. For instance,
after 1493693760, the next timestamp for which the data is present
is 1493694300. That's around 9 minutes of data missing.

And this is just one instance. There are a lot of such instances in this
file.



On Sun, Apr 30, 2017 at 11:23 AM, Mahendra Kariya <
mahendra.kar...@go-jek.com> wrote:

> Thanks for the update Matthias! And sorry for the delayed response.
>
> The reason we use .aggregate() is because we want to count the number of
> unique values for a particular field in the message. So, we just add that
> particular field's value in the HashSet and then take the size of the
> HashSet.
>
> On our side, we are also investigating and it looks like there might be a
> bug somewhere in our codebase. If that's the case, then it's quite possible
> that there is no bug in Kafka Streams, except the metric one.
>
> We will revert after confirming.
>
>
>
>
> On Sun, Apr 30, 2017 at 10:39 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> Just a follow up (we identified a bug in the "skipped records" metric).
>> The reported value is not correct.
>>
>>
>> On 4/28/17 9:12 PM, Matthias J. Sax wrote:
>> > Ok. That makes sense.
>> >
>> > Question: why do you use .aggregate() instead of .count() ?
>> >
>> > Also, can you share the code of you AggregatorFunction()? Did you change
>> > any default setting of StreamsConfig?
>> >
>> > I have still no idea what could go wrong. Maybe you can run with log
>> > level TRACE? Maybe we can get some insight from those.
>> >
>> >
>> > -Matthias
>> >
>> > On 4/27/17 11:41 PM, Mahendra Kariya wrote:
>> >> Oh good point!
>> >>
>> >> The reason why there is only one row corresponding to each time window
>> is
>> >> because it only contains the latest value for the time window. So what
>> we
>> >> did was we just dumped the data present in the sink topic to a db
>> using an
>> >> upsert query. The primary key of the table was time window. The file
>> that I
>> >> attached is actually the data present in the DB. And we know that
>> there is
>> >> no bug in our db dump code because we have been using it for a long
>> time in
>> >> production without any issues.
>> >>
>> >> The reason the count is zero for some time windows is because I
>> subtracted
>> >> a random number the actual values and rounded it off to zero; for
>> privacy
>> >> reason. The actual data doesn't have any zero values. I should have
>> >> mentioned this earlier. My bad!
>> >>
>> >> The stream topology code looks something like this.
>> >>
>> >> stream
>> >>     .filter()
>> >>     .map((key, value) -> new KeyValue<>(transform(key), value)
>> >>     .groupByKey()
>> >>     .aggregate(HashSet::new, AggregatorFunction(),
>> >> TimeWindows.of(60000).until(3600000))
>> >>     .mapValues(HashSet::size)
>> >>     .toStream()
>> >>     .map((key, value) -> convertToProtobufObject(key, value))
>> >>     .to()
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Fri, Apr 28, 2017 at 1:13 PM, Matthias J. Sax <
>> matth...@confluent.io>
>> >> wrote:
>> >>
>> >>> Thanks for the details (sorry that I forgot that you did share the
>> >>> output already).
>> >>>
>> >>> Might be a dumb question, but what is the count for missing windows in
>> >>> your seconds implementation?
>> >>>
>> >>> If there is no data for a window, it should not emit a window with
>> count
>> >>> zero, but nothing.
>> >>>
>> >>> Thus, looking at your output, I am wondering how it could contain line
>> >>> like:
>> >>>
>> >>>> 2017-04-27T04:53:00 0
>> >>>
>> >>> I am also wondering why your output only contains a single value per
>> >>> window. As Streams outputs multiple updates per window while the count
>> >>> is increasing, you should actually see multiple records per window.
>> >>>
>> >>> Your code is like this:
>> >>>
>> >>> stream.filter().groupByKey().count(TimeWindow.of(60000)).to();
>> >>>
>> >>> Or do you have something more complex?
>> >>>
>> >>>
>> >>> -Matthias
>> >>>
>> >>>
>> >>> On 4/27/17 9:16 PM, Mahendra Kariya wrote:
>> >>>>> Can you somehow verify your output?
>> >>>>
>> >>>>
>> >>>> Do you mean the Kafka streams output? In the Kafka Streams output,
>> we do
>> >>>> see some missing values. I have attached the Kafka Streams output
>> (for a
>> >>>> few hours) in the very first email of this thread for reference.
>> >>>>
>> >>>> Let me also summarise what we have done so far.
>> >>>>
>> >>>> We took a dump of the raw data present in the source topic. We wrote
>> a
>> >>>> script to read this data and do the exact same aggregations that we
>> do
>> >>>> using Kafka Streams. And then we compared the output from Kafka
>> Streams
>> >>> and
>> >>>> our script.
>> >>>>
>> >>>> The difference that we observed in the two outputs is that there
>> were a
>> >>> few
>> >>>> rows (corresponding to some time windows) missing in the Streams
>> output.
>> >>>> For the time windows for which the data was present, the aggregated
>> >>> numbers
>> >>>> matched exactly.
>> >>>>
>> >>>> This means, either all the records for a particular time window are
>> being
>> >>>> skipped, or none. Now this is highly unlikely to happen. Maybe there
>> is a
>> >>>> bug somewhere in the rocksdb state stores? Just a speculation, not
>> sure
>> >>>> though. And there could even be a bug in the reported metric.
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>>
>>
>
1493693760 55
1493693760 58
1493693760 127
1493693760 269
1493693760 94
1493693760 96
1493693760 24
1493693760 219
1493693760 272
1493693760 42
1493693760 225
1493693760 151
1493693760 85
1493694300 100
1493694300 154
1493694300 142
1493694300 174
1493694300 106
1493694300 93
1493694300 104
1493694300 52
1493694300 255
1493694300 80
1493694300 225
1493694300 37
1493694300 6
1493695140 90
1493695140 208
1493695140 199
1493695140 214
1493695140 177
1493695140 199
1493695140 160
1493695140 28
1493695140 136
1493695140 72
1493695140 131
1493695140 179
1493695140 26
1493695680 40
1493695680 130
1493695680 126
1493695680 25
1493695680 179
1493695680 161
1493695680 69
1493695680 132
1493695680 100
1493695680 52
1493695680 14
1493695680 145
1493695680 50
1493695980 151
1493695980 94
1493695980 38
1493695980 12
1493695980 121
1493695980 103
1493695980 4
1493695980 219
1493695980 211
1493695980 8
1493695980 205
1493695980 99
1493695980 55
1493696640 115
1493696640 43
1493696640 12
1493696640 174
1493696640 158
1493696640 2
1493696640 124
1493696640 113
1493696640 39
1493696640 178
1493696640 7
1493696640 13
1493696640 3
1493696700 13
1493696700 127
1493696700 19
1493696700 138
1493696700 125
1493696700 139
1493696700 20
1493696700 20
1493696700 77
1493696700 191
1493696700 221
1493696700 19
1493696700 53
1493696940 153
1493696940 120
1493696940 206
1493696940 72
1493696940 70
1493696940 65
1493696940 233
1493696940 238
1493696940 209
1493696940 134
1493696940 96
1493696940 155
1493696940 95
1493697180 169
1493697180 146
1493697180 51
1493697180 199
1493697180 7
1493697180 22
1493697180 92
1493697180 188
1493697180 36
1493697180 174
1493697180 132
1493697180 118
1493697180 30
1493697240 34
1493697240 184
1493697240 19
1493697240 61
1493697240 131
1493697240 180
1493697240 101
1493697240 168
1493697240 64
1493697240 54
1493697240 84
1493697300 171
1493697300 18
1493697300 158
1493697300 20
1493697300 61
1493697300 65
1493697300 154
1493697300 173
1493697300 158
1493697300 162
1493697300 70
1493697300 45
1493697300 89
1493697360 46
1493697360 104
1493697360 24
1493697360 58
1493697360 119
1493697360 197
1493697360 128
1493697360 16
1493697360 145
1493697360 222
1493697360 10
1493697360 15
1493697360 32
1493697720 3
1493697720 84
1493697720 35
1493697720 139
1493697720 133
1493697720 96
1493697720 210
1493697720 222
1493697720 58
1493697720 179
1493697720 113
1493697720 174
1493697720 15
1493697840 20
1493697840 10
1493697840 137
1493697840 110
1493697840 183
1493697840 90
1493697840 235
1493697840 46
1493697840 127
1493697840 24
1493697840 204
1493697840 241
1493697840 169
1493698080 118
1493698080 70
1493698080 94
1493698080 184
1493698080 196
1493698080 130
1493698080 190
1493698080 189
1493698080 79
1493698080 101
1493698080 20
1493698080 259
1493698080 33
1493698200 60
1493698200 85
1493698200 73
1493698200 105
1493698200 185
1493698200 189
1493698200 35
1493698200 57
1493698200 125
1493698200 221
1493698200 21
1493698200 56
1493698200 28
1493698380 111
1493698380 205
1493698380 200
1493698380 162
1493698380 198
1493698380 28
1493698380 206
1493698380 115
1493698380 26
1493698380 36
1493698380 74
1493698380 39
1493698380 27
1493699340 176
1493699340 20
1493699340 27
1493699340 76
1493699340 243
1493699340 244
1493699340 240
1493699340 19
1493699340 241
1493699340 271
1493699340 170
1493699340 10
1493699340 12
1493699400 76
1493699400 45
1493699400 212
1493699400 10
1493699400 118
1493699400 158
1493699400 202
1493699400 9
1493699400 162
1493699400 106
1493699400 131
1493699400 23
1493699460 95
1493699460 96
1493699460 155
1493699460 101
1493699460 55
1493699460 34
1493699460 207
1493699460 155
1493699460 11
1493699460 8
1493699460 147
1493699460 237
1493699460 59
1493700120 43
1493700120 173
1493700120 150
1493700120 55
1493700120 94
1493700120 68
1493700120 121
1493700120 85
1493700120 75
1493700120 168
1493700120 30
1493700120 218
1493700120 176
1493700360 29
1493700360 67
1493700360 250
1493700360 91
1493700360 259
1493700360 80
1493700360 161
1493700360 221
1493700360 7
1493700360 95
1493700360 90
1493700360 219
1493700360 178
1493700480 9
1493700480 118
1493700480 101
1493700480 26
1493700480 85
1493700480 220
1493700480 123
1493700480 207
1493700480 138
1493700480 126
1493700480 262
1493700480 83
1493700480 245
1493702100 48
1493702100 87
1493702100 186
1493702100 9
1493702100 64
1493702100 222
1493702100 61
1493702100 172
1493702100 129
1493702100 55
1493702100 143
1493702100 42
1493702100 52
1493702400 86
1493702400 51
1493702400 34
1493702400 14
1493702400 160
1493702400 209
1493702400 37
1493702400 103
1493702400 54
1493702400 107
1493702400 154
1493702400 160
1493702400 17

Reply via email to