Well what I'm thinking for 100% accuracy no data loss just to base the count on processing time. So whatever arrives in that window is counted. If I get some events of the "current" window late and they go into another window it's ok.
My pipeline is like so.... browser(user)----->REST API------>log file------>Filebeat------>Kafka (18 partitions)----->flink----->destination Filebeat inserts into Kafka it's kindof a big bucket of "logs" which I use flink to filter the specific app and do the counts. The logs are round robin into the topic/partitions. Where I FORSEE a delay is Filebeat can't push fast enough into Kafka AND/OR the flink consumer has not read all events for that window from all partitions. On Thu, 25 Nov 2021 at 11:28, Schwalbe Matthias <matthias.schwa...@viseca.ch> wrote: > Hi John, > > > > … just a short hint: > > With datastream API you can > > - hand-craft a trigger that decides when an how often emit > intermediate, punctual and late window results, and when to evict the > window and stop processing late events > - in order to process late event you also need to specify for how long > you will extend the window processing (or is that done in the trigger … I > don’t remember right know) > - overall window state grows, if you extend window processing to after > it is finished … > > > > Hope this helps 😊 > > > > Thias > > > > *From:* Caizhi Weng <tsreape...@gmail.com> > *Sent:* Donnerstag, 25. November 2021 02:56 > *To:* John Smith <java.dev....@gmail.com> > *Cc:* user <user@flink.apache.org> > *Subject:* Re: Windows and data loss. > > > > Hi! > > > > Are you using the datastream API or the table / SQL API? I don't know if > datastream API has this functionality, but in table / SQL API we have the > following configurations [1]. > > - table.exec.emit.late-fire.enabled: Emit window results for late > records; > - table.exec.emit.late-fire.delay: How often shall we emit results for > late records (for example, once per 10 minutes or for every record). > > > > [1] > https://github.com/apache/flink/blob/601ef3b3bce040264daa3aedcb9d98ead8303485/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L214 > > > > John Smith <java.dev....@gmail.com> 于2021年11月25日周四 上午12:45写道: > > Hi I understand that when using windows and having set the watermarks and > lateness configs. That if an event comes late it is lost and we can > output it to side output. > > But wondering is there a way to do it without the loss? > > I'm guessing an "all" window with a custom trigger that just fires X > period and whatever is on that bucket is in that bucket? > > Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und > beinhaltet unter Umständen vertrauliche Mitteilungen. Da die > Vertraulichkeit von e-Mail-Nachrichten nicht gewährleistet werden kann, > übernehmen wir keine Haftung für die Gewährung der Vertraulichkeit und > Unversehrtheit dieser Mitteilung. Bei irrtümlicher Zustellung bitten wir > Sie um Benachrichtigung per e-Mail und um Löschung dieser Nachricht sowie > eventueller Anhänge. Jegliche unberechtigte Verwendung oder Verbreitung > dieser Informationen ist streng verboten. > > This message is intended only for the named recipient and may contain > confidential or privileged information. As the confidentiality of email > communication cannot be guaranteed, we do not accept any responsibility for > the confidentiality and the intactness of this message. If you have > received it in error, please advise the sender by return e-mail and delete > this message and any attachments. Any unauthorised use or dissemination of > this information is strictly prohibited. >