Hey everyone,

We are running Kafka Streams 2.1.0 (Scala). The specific use case I am talking about can be simplified into the following stream:

*given:* inputStream of type [String, BigDecimal] (for example: ("1234", 120.67))

inputStream.map(createUniqueKey).groupByKey.reduce(_ + _).toStream.to("daily_counts")
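For context, a minimal, self-contained sketch of that topology with the Kafka Streams Scala DSL looks roughly like this. The input topic name, the BigDecimal serde and the body of createUniqueKey below are placeholders for illustration, not our actual code:

import java.time.LocalDate

import org.apache.kafka.common.serialization.Serde
import org.apache.kafka.streams.scala.{Serdes, StreamsBuilder}
import org.apache.kafka.streams.scala.ImplicitConversions._
import org.apache.kafka.streams.scala.Serdes._
import org.apache.kafka.streams.scala.kstream.KStream

object DailyCounts {

  // Quick stand-in serde for scala.math.BigDecimal (string-encoded);
  // the real application uses its own serde.
  implicit val bigDecimalSerde: Serde[BigDecimal] = Serdes.fromFn[BigDecimal](
    (v: BigDecimal) => v.toString.getBytes("UTF-8"),
    (bytes: Array[Byte]) => Option(bytes).map(b => BigDecimal(new String(b, "UTF-8")))
  )

  // Hypothetical key builder, only meant to show the
  // "prefix + day-of-year + id" shape of the keys.
  private def createUniqueKey(id: String, value: BigDecimal): (String, BigDecimal) =
    (s"some_key_prefix_${LocalDate.now.getDayOfYear}_$id", value)

  def buildTopology(builder: StreamsBuilder): Unit = {
    // "input_topic" is a placeholder name
    val inputStream: KStream[String, BigDecimal] =
      builder.stream[String, BigDecimal]("input_topic")

    inputStream
      .map(createUniqueKey)   // re-key each record to the per-day unique key
      .groupByKey             // repartition by the new key
      .reduce(_ + _)          // running sum per key, backed by a RocksDB state store
      .toStream
      .to("daily_counts")     // compacted output topic
  }
}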
So, at the end of the stream, we get a compacted topic named "daily_counts" that should hold all the "counts" for specific keys. When consuming from the topic we will get something like:

"unique_key_for_1234" -> 110.67
"unique_key_for_1234" -> 120.67
"unique_key_for_1234" -> 130.67
"unique_key_for_1234" -> 140.67
"unique_key_for_1234" -> 150.67

where the value we actually use is the last one, 150.67.

99% of the time things work as expected, but I have noticed something strange that keeps happening to us every couple of days. When consuming from the topic we get:

CreateTime:1543291897088 some_key_prefix_day_of_year_id 429.07
CreateTime:1543291913519 some_key_prefix_day_of_year_id 429.55
CreateTime:1543291937106 some_key_prefix_day_of_year_id 430.01
CreateTime:1543291936980 some_key_prefix_day_of_year_id 430.49
CreateTime:1543291952062 some_key_prefix_day_of_year_id 430.85
CreateTime:1543291968703 some_key_prefix_day_of_year_id 431.25
CreateTime:1543291987571 some_key_prefix_day_of_year_id 431.71
CreateTime:1543292022037 some_key_prefix_day_of_year_id 432.17
CreateTime:1543292029040 some_key_prefix_day_of_year_id 432.64
CreateTime:1543292032400 some_key_prefix_day_of_year_id 433.12  // In my time zone: Tuesday, November 27, 2018 6:13:52.400 AM GMT+02:00
CreateTime:1543292072814 some_key_prefix_day_of_year_id 0.47    // In my time zone: Tuesday, November 27, 2018 6:14:32.814 AM GMT+02:00
CreateTime:1543292081922 some_key_prefix_day_of_year_id 0.83
CreateTime:1543292132950 some_key_prefix_day_of_year_id 2.61
CreateTime:1543292169435 some_key_prefix_day_of_year_id 2.97
CreateTime:1543292182448 some_key_prefix_day_of_year_id 3.43
CreateTime:1543292201472 some_key_prefix_day_of_year_id 3.88
CreateTime:1543292205065 some_key_prefix_day_of_year_id 4.36
CreateTime:1543292216853 some_key_prefix_day_of_year_id 4.82
CreateTime:1543292242955 some_key_prefix_day_of_year_id 5.4

Meaning, somehow we lost our aggregated data. I am looking for an explanation. Things I have checked and found out:

1. No ERROR to be found on the Kafka cluster.
2. No ERROR on the application side around this time (a couple of minutes earlier we got "too many open files" from RocksDB, but on a different store).
3. It seems to happen ONLY for "high rate" keys.
4. When it happens, it happens on several keys at the same time.
5. Around the same time, we had a spot-instance replacement.
6. I checked the semantics of the reduce function and saw that the reducer is only skipped on the first invocation for a key, i.e. when there is no previous value in the store the incoming value is stored as-is.

Maybe we have some misconfiguration that can cause this? Any ideas would be welcome.
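For completeness, here is a small stand-alone consumer sketch (not part of our application) that tails "daily_counts" and prints a line whenever the value for a key drops, which is exactly the symptom above. It assumes the values are encoded as plain UTF-8 decimal strings and uses a placeholder bootstrap server, so adjust both to match your setup:

import java.time.Duration
import java.util.{Collections, Properties}

import scala.collection.JavaConverters._
import scala.collection.mutable

import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
import org.apache.kafka.common.serialization.StringDeserializer

object DailyCountsDropDetector extends App {

  val props = new Properties()
  props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092") // placeholder broker
  props.put(ConsumerConfig.GROUP_ID_CONFIG, "daily-counts-drop-detector")
  props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  // Key and value read as UTF-8 strings -- adjust to the real serdes.
  val consumer =
    new KafkaConsumer[String, String](props, new StringDeserializer, new StringDeserializer)
  consumer.subscribe(Collections.singletonList("daily_counts"))

  // Last value seen per key, so a sudden drop can be detected.
  val lastSeen = mutable.Map.empty[String, BigDecimal]

  while (true) {
    for (record <- consumer.poll(Duration.ofSeconds(1)).asScala) {
      val value = BigDecimal(record.value())
      lastSeen.get(record.key()).filter(_ > value).foreach { prev =>
        println(s"Aggregate dropped for key=${record.key()}: $prev -> $value " +
          s"(offset=${record.offset()}, createTime=${record.timestamp()})")
      }
      lastSeen.update(record.key(), value)
    }
  }
}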
Thanks,
Nitay

--
Nitay Kufert
Backend Developer, ironSource