Hi Yuan,

this indeed sounds weird. The SQL API uses regular DataStream API windows underneath, so this problem would have come up much earlier if it were a bug in the implementation. Is this behavior reproducible on your local machine?

One thing that comes to my mind is that the "userId"s might not be 100% identical (i.e., not equal according to their hashCode/equals methods), because otherwise they would be grouped properly.
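
Just to illustrate what I mean: if the key is not a plain String but some custom type, it needs a consistent equals()/hashCode() implementation, roughly like the following (the class and field names are only an example, not your actual schema):

import java.util.Objects;

// Example key class: without consistent equals()/hashCode(),
// logically identical keys can end up in different groups and you
// would see several "duplicate" results for the same window.
public class UserKey {
    private final String userId;

    public UserKey(String userId) {
        this.userId = userId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof UserKey)) {
            return false;
        }
        return Objects.equals(userId, ((UserKey) o).userId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId);
    }
}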

Regards,
Timo

On 12.07.18 at 09:35, Yuan,Youjun wrote:

Hi community,

I have a job that counts the number of events every 2 minutes, using a tumbling window in processing time. However, it occasionally produces extra, duplicated records. For instance, for timestamp 1531368480000 below, it emits a normal result (cnt=1641161), followed by a few more records with very small counts (2, 3, etc.).

Can anyone shed some light on the possible reason, or how to fix it?

Below is the sample output.

-----------------------------------------------------------

{"timestamp":1531368240000,"cnt":1537821,"userId":"user01"}

{"timestamp":1531368360000,"cnt":1521464,"userId":"user01"}

{"timestamp":*1531368480000*,"cnt":1641161,"userId":"user01"}

{"timestamp":*1531368480000*,"cnt":2,"userId":"user01"}

{"timestamp":*1531368480000*,"cnt":3,"userId":"user01"}

{"timestamp":*1531368480000*,"cnt":3,"userId":"user01"}

And here is the job SQL:

-----------------------------------------------------------

INSERT INTO sink
SELECT
    TUMBLE_START(rowtime, INTERVAL '2' MINUTE) AS `timestamp`,
    COUNT(vehicleId) AS cnt,
    userId
FROM source
GROUP BY
    TUMBLE(rowtime, INTERVAL '2' MINUTE),
    userId
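
For reference, a stripped-down local job of the same shape can be wired up roughly as follows (this is only a simplified sketch with placeholder names and a processing-time attribute, not the actual production setup, which reads from an external source and sink):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TumbleCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // Toy input of (vehicleId, userId); the real job reads from an external source.
        DataStream<Tuple2<String, String>> input = env.fromElements(
                Tuple2.of("v1", "user01"),
                Tuple2.of("v2", "user01"));

        // Register the stream and append a processing-time attribute;
        // the production query uses an attribute called "rowtime" instead.
        tEnv.registerDataStream("source", input, "vehicleId, userId, proctime.proctime");

        Table result = tEnv.sqlQuery(
                "SELECT TUMBLE_START(proctime, INTERVAL '2' MINUTE) AS `timestamp`, " +
                "COUNT(vehicleId) AS cnt, userId " +
                "FROM source " +
                "GROUP BY TUMBLE(proctime, INTERVAL '2' MINUTE), userId");

        // Note: with a tiny finite input and processing-time windows, the window may
        // not fire before the job finishes; a continuous source is needed for a
        // realistic test.
        tEnv.toAppendStream(result, Row.class).print();
        env.execute("tumble-count-sketch");
    }
}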

Thanks,

Youjun Yuan

