Okay, so I created a simple stream (similar to the original stream), where
I just write the timestamps of each evaluated window to S3.
The session gap is 30 minutes, and this is one of the sessions:
(first-event, last-event, num-events)

11:23-11:23 11 events
11:25-11:26 51 events
11:28-11:29 74 events
11:31-11:31 13 events

Again, this is one session. How can we explain this? Why does Flink create
4 distinct windows within 8 minutes? I'm really lost here, I'd appreciate
some help.

On Tue, Jun 16, 2020 at 2:17 PM Ori Popowski <ori....@gmail.com> wrote:

> Hi, thanks for answering.
>
> > I guess you consume from Kafka from the earliest offset, so you consume
> historical data and Flink is catching-up.
> Yes, it's what's happening. But Kafka is partitioned on sessionId, so skew
> between partitions cannot explain it.
> I think the only way it can happen is when when suddenly there's one event
> with very late timestamp
>
> > Just to verify, if you do keyBy sessionId, do you check the gaps of
> events from the same session?
> Good point. sessionId is unique in this case, and even if it's not - every
> single session suffers from this problem of early triggering so it's very
> unlikely that all millions sessions within that hour had duplicates.
>
> I'm suspecting that the fact I have two ProcessWindowFunctions one after
> the other somehow causes this.
> I deployed a version with one window function which just prints the
> timestamps to S3 (to find out if I have event-time jumps) and suddenly it
> doesn't trigger early (I'm running for 10 minutes and not a single event
> has arrived to the sink)
>
> On Tue, Jun 16, 2020 at 12:01 PM Rafi Aroch <rafi.ar...@gmail.com> wrote:
>
>> Hi Ori,
>>
>> I guess you consume from Kafka from the earliest offset, so you consume
>> historical data and Flink is catching-up.
>>
>> Regarding: *My event-time timestamps also do not have big gaps*
>>
>> Just to verify, if you do keyBy sessionId, do you check the gaps of
>> events from the same session?
>>
>> Rafi
>>
>>
>> On Tue, Jun 16, 2020 at 9:36 AM Ori Popowski <ori....@gmail.com> wrote:
>>
>>> So why is it happening? I have no clue at the moment.
>>> My event-time timestamps also do not have big gaps between them that
>>> would explain the window triggering.
>>>
>>>
>>> On Mon, Jun 15, 2020 at 9:21 PM Robert Metzger <rmetz...@apache.org>
>>> wrote:
>>>
>>>> If you are using event time in Flink, it is disconnected from the real
>>>> world wall clock time.
>>>> You can process historical data in a streaming program as if it was
>>>> real-time data (potentially reading through (event time) years of data in a
>>>> few (wall clock) minutes)
>>>>
>>>> On Mon, Jun 15, 2020 at 4:58 PM Yichao Yang <1048262...@qq.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I think it maybe you use the event time, and the timestamp between
>>>>> your event data is bigger than 30minutes, maybe you can check the source
>>>>> data timestamp.
>>>>>
>>>>> Best,
>>>>> Yichao Yang
>>>>>
>>>>> ------------------------------
>>>>> 发自我的iPhone
>>>>>
>>>>>
>>>>> ------------------ Original ------------------
>>>>> *From:* Ori Popowski <ori....@gmail.com>
>>>>> *Date:* Mon,Jun 15,2020 10:50 PM
>>>>> *To:* user <user@flink.apache.org>
>>>>> *Subject:* Re: EventTimeSessionWindow firing too soon
>>>>>
>>>>>

Reply via email to