But if we use event time, if a failure happens potentially those events
can't be delivered in their windo they will be dropped if they come after
the lateness and watermark settings no?


On Fri, 26 Nov 2021 at 02:35, Schwalbe Matthias <matthias.schwa...@viseca.ch>
wrote:

> Hi John,
>
>
>
> Going with processing time is perfectly sound if the results meet your
> requirements and you can easily live with events misplaced into the wrong
> time window.
>
> This is also quite a bit cheaper resource-wise.
>
> However you might want to keep in mind situations when things break down
> (network interrupt, datacenter flooded etc. 😊). With processing time
> events count into the time window when processed, with event time they
> count into the time window when originally created a the source … even if
> processed much later …
>
>
>
> Thias
>
>
>
>
>
>
>
> *From:* John Smith <java.dev....@gmail.com>
> *Sent:* Freitag, 26. November 2021 02:55
> *To:* Schwalbe Matthias <matthias.schwa...@viseca.ch>
> *Cc:* Caizhi Weng <tsreape...@gmail.com>; user <user@flink.apache.org>
> *Subject:* Re: Windows and data loss.
>
>
>
> Well what I'm thinking for 100% accuracy no data loss just to base the
> count on processing time. So whatever arrives in that window is counted. If
> I get some events of the "current" window late and they go into another
> window it's ok.
>
> My pipeline is like so....
>
> browser(user)----->REST API------>log file------>Filebeat------>Kafka (18
> partitions)----->flink----->destination
>
> Filebeat inserts into Kafka it's kindof a big bucket of "logs" which I use
> flink to filter the specific app and do the counts. The logs are round
> robin into the topic/partitions. Where I FORSEE a delay is Filebeat can't
> push fast enough into Kafka AND/OR the flink consumer has not read all
> events for that window from all partitions.
>
>
>
> On Thu, 25 Nov 2021 at 11:28, Schwalbe Matthias <
> matthias.schwa...@viseca.ch> wrote:
>
> Hi John,
>
>
>
> … just a short hint:
>
> With datastream API you can
>
>    - hand-craft a trigger that decides when an how often emit
>    intermediate, punctual and late window results, and when to evict the
>    window and stop processing late events
>    - in order to process late event you also need to specify for how long
>    you will extend the window processing (or is that done in the trigger … I
>    don’t remember right know)
>    - overall window state grows, if you extend window processing to after
>    it is finished …
>
>
>
> Hope this helps 😊
>
>
>
> Thias
>
>
>
> *From:* Caizhi Weng <tsreape...@gmail.com>
> *Sent:* Donnerstag, 25. November 2021 02:56
> *To:* John Smith <java.dev....@gmail.com>
> *Cc:* user <user@flink.apache.org>
> *Subject:* Re: Windows and data loss.
>
>
>
> Hi!
>
>
>
> Are you using the datastream API or the table / SQL API? I don't know if
> datastream API has this functionality, but in table / SQL API we have the
> following configurations [1].
>
>    - table.exec.emit.late-fire.enabled: Emit window results for late
>    records;
>    - table.exec.emit.late-fire.delay: How often shall we emit results for
>    late records (for example, once per 10 minutes or for every record).
>
>
>
> [1]
> https://github.com/apache/flink/blob/601ef3b3bce040264daa3aedcb9d98ead8303485/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L214
>
>
>
> John Smith <java.dev....@gmail.com> 于2021年11月25日周四 上午12:45写道:
>
> Hi I understand that when using windows and having set the watermarks and
> lateness configs. That if an event comes late it is lost and we can
> output it to side output.
>
> But wondering is there a way to do it without the loss?
>
> I'm guessing an "all" window with a custom trigger that just fires X
> period and whatever is on that bucket is in that bucket?
>
> Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und
> beinhaltet unter Umständen vertrauliche Mitteilungen. Da die
> Vertraulichkeit von e-Mail-Nachrichten nicht gewährleistet werden kann,
> übernehmen wir keine Haftung für die Gewährung der Vertraulichkeit und
> Unversehrtheit dieser Mitteilung. Bei irrtümlicher Zustellung bitten wir
> Sie um Benachrichtigung per e-Mail und um Löschung dieser Nachricht sowie
> eventueller Anhänge. Jegliche unberechtigte Verwendung oder Verbreitung
> dieser Informationen ist streng verboten.
>
> This message is intended only for the named recipient and may contain
> confidential or privileged information. As the confidentiality of email
> communication cannot be guaranteed, we do not accept any responsibility for
> the confidentiality and the intactness of this message. If you have
> received it in error, please advise the sender by return e-mail and delete
> this message and any attachments. Any unauthorised use or dissemination of
> this information is strictly prohibited.
>
> Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und
> beinhaltet unter Umständen vertrauliche Mitteilungen. Da die
> Vertraulichkeit von e-Mail-Nachrichten nicht gewährleistet werden kann,
> übernehmen wir keine Haftung für die Gewährung der Vertraulichkeit und
> Unversehrtheit dieser Mitteilung. Bei irrtümlicher Zustellung bitten wir
> Sie um Benachrichtigung per e-Mail und um Löschung dieser Nachricht sowie
> eventueller Anhänge. Jegliche unberechtigte Verwendung oder Verbreitung
> dieser Informationen ist streng verboten.
>
> This message is intended only for the named recipient and may contain
> confidential or privileged information. As the confidentiality of email
> communication cannot be guaranteed, we do not accept any responsibility for
> the confidentiality and the intactness of this message. If you have
> received it in error, please advise the sender by return e-mail and delete
> this message and any attachments. Any unauthorised use or dissemination of
> this information is strictly prohibited.
>

Reply via email to