Hello there,

We're processing IoT device data; each device sends several metrics.
When we upgrade our Streams app, we set a brand-new 'application.id' so that we reprocess a bit of past data, warming up the state stores and aggregations to make sure all outputs are valid. Downstream is designed for "at least once", so this bit of reprocessing is not a problem.

When this restart + reprocessing occurs, we observe a burst of "Skipping record for expired window" / "Skipping record for expired segment" warnings in the logs. My understanding so far is this:

- Part of our topology is keyed by deviceId.
- During the reprocessing, some tasks move faster than others on their partitions, so there is a substantial difference between the stream-times of the various tasks.
- At some point in the topology, we re-key the data by (deviceId, metric) for the "group by metric" aggregations.
- This shuffles the data: deviceId1 was in partition 1 with eventTime1, deviceId2 was in partition 2 with eventTime2, and now, by the magic of hashing the (deviceId, metric) key, they are pushed together into the same partition X. If eventTime2 is far ahead of eventTime1, stream-time in partition X jumps ahead, all the older windows expire at once, and deviceId1's records get dropped as late.

Is this analysis correct?

If so, what's the proper way to avoid this? Manually do a .repartition() with a custom partitioner after each .selectKey((deviceId, metric)), and before going through the aggregations (see the sketch in the P.S. below)? Any other advice?

Thanks for your insights,
Mathieu
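
P.S. To make the last question concrete, here is a rough, untested sketch of the manual repartition I have in mind: partition the re-keyed stream on deviceId only, so a device's records stay in the same partition before and after the re-key. The class names (DeviceMetricKey, MetricValue, DeviceIdPartitioner), the topic name, the window sizes and the serde configuration are all placeholders, not our real code.

import java.nio.charset.StandardCharsets;
import java.time.Duration;

import org.apache.kafka.common.utils.Utils;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Repartitioned;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.processor.StreamPartitioner;

public class RekeySketch {

    // Placeholder types, for illustration only.
    record DeviceMetricKey(String deviceId, String metric) {}
    record MetricValue(String metric, double value) {}

    // Partition on deviceId only, so a device's records land in the same
    // partition before and after the re-key, instead of being spread by the
    // default hash of the full (deviceId, metric) key.
    static class DeviceIdPartitioner implements StreamPartitioner<DeviceMetricKey, MetricValue> {
        @Override
        public Integer partition(String topic, DeviceMetricKey key, MetricValue value, int numPartitions) {
            byte[] deviceBytes = key.deviceId().getBytes(StandardCharsets.UTF_8);
            return Utils.toPositive(Utils.murmur2(deviceBytes)) % numPartitions;
        }
    }

    static void buildTopology(StreamsBuilder builder) {
        // Topic name is illustrative; serdes are assumed to come from the
        // default serdes configured in StreamsConfig.
        KStream<String, MetricValue> byDevice = builder.stream("device-events");

        byDevice
            // re-key by (deviceId, metric) for the per-metric aggregations
            .selectKey((deviceId, value) -> new DeviceMetricKey(deviceId, value.metric()))
            // explicit repartition with the deviceId-only partitioner,
            // instead of relying on the default partitioning of the new key
            .repartition(Repartitioned.<DeviceMetricKey, MetricValue>as("by-device-metric")
                .withStreamPartitioner(new DeviceIdPartitioner()))
            .groupByKey()
            // 5-minute windows with 1-minute grace, purely as an example
            .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1)))
            .count();
    }
}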