Hi community,

I have a job that counts events every 2 minutes, using a tumbling window in
processing time. However, it occasionally produces extra DUPLICATED records. For
instance, for timestamp 1531368480000 below, it emits a normal result
(cnt=1641161), followed by a few more records with very small counts
(2, 3, etc.).

Can anyone shed some light on the possible reason, or how to fix it?

Below is the sample output.
-----------------------------------------------------------
{"timestamp":1531368240000,"cnt":1537821,"userId":"user01"}
{"timestamp":1531368360000,"cnt":1521464,"userId":"user01"}
{"timestamp":1531368480000,"cnt":1641161,"userId":"user01"}
{"timestamp":1531368480000,"cnt":2,"userId":"user01"}
{"timestamp":1531368480000,"cnt":3,"userId":"user01"}
{"timestamp":1531368480000,"cnt":3,"userId":"user01"}
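To make the symptom concrete, here is a small Python sketch (not part of the job) that reads the sample records above and lists every (timestamp, userId) key emitted more than once. If each window produces a single final result per user, every key should appear exactly once. The helper name `duplicated_keys` is hypothetical.

```python
import json
from collections import Counter

# The sample sink records from the output above, one JSON object per line.
SAMPLE = """\
{"timestamp":1531368240000,"cnt":1537821,"userId":"user01"}
{"timestamp":1531368360000,"cnt":1521464,"userId":"user01"}
{"timestamp":1531368480000,"cnt":1641161,"userId":"user01"}
{"timestamp":1531368480000,"cnt":2,"userId":"user01"}
{"timestamp":1531368480000,"cnt":3,"userId":"user01"}
{"timestamp":1531368480000,"cnt":3,"userId":"user01"}
"""


def duplicated_keys(lines):
    """Return {(timestamp, userId): record_count} for keys seen more than once."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        counts[(rec["timestamp"], rec["userId"])] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for (ts, user), n in sorted(duplicated_keys(SAMPLE.splitlines()).items()):
        print(f"window {ts}, user {user}: {n} records")
```

On the sample above this flags only window 1531368480000 for user01, with 4 records.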

And here is the job SQL:
-----------------------------------------------------------
INSERT INTO sink
SELECT
  TUMBLE_START(rowtime, INTERVAL '2' MINUTE) AS `timestamp`,
  COUNT(vehicleId) AS cnt,
  userId
FROM source
GROUP BY
  TUMBLE(rowtime, INTERVAL '2' MINUTE),
  userId
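As a rough illustration of how the window key in this query is derived, the Python sketch below aligns an epoch-millisecond timestamp to the 2-minute boundary at or before it, which is what TUMBLE_START reports. It assumes no window offset and non-negative timestamps; the function names are hypothetical. Under those assumptions, all four records with timestamp 1531368480000 refer to the same window [1531368480000, 1531368600000).

```python
# INTERVAL '2' MINUTE expressed in milliseconds.
WINDOW_SIZE_MS = 2 * 60 * 1000


def tumble_start(ts_ms: int, size_ms: int = WINDOW_SIZE_MS) -> int:
    """Start of the tumbling window containing ts_ms (no offset assumed)."""
    return ts_ms - ts_ms % size_ms


def tumble_end(ts_ms: int, size_ms: int = WINDOW_SIZE_MS) -> int:
    """Exclusive end of the tumbling window containing ts_ms."""
    return tumble_start(ts_ms, size_ms) + size_ms


if __name__ == "__main__":
    # Any timestamp inside the window maps to the same start.
    for ts in (1531368480000, 1531368480000 + 59_999, 1531368599999):
        print(ts, "->", tumble_start(ts), tumble_end(ts))
```

Each of the three timestamps printed maps to start 1531368480000 and end 1531368600000.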

Thanks,
Youjun Yuan
