Well, what I'm thinking is that for 100% accuracy with no data loss, I'd
just base the count on processing time, so whatever arrives in a window is
counted in that window. If some events of the "current" window arrive late
and go into another window, that's OK.
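
To make that concrete, here is a minimal sketch in plain Java. It is not
Flink code: the class ProcessingTimeCounter and its methods are made up for
illustration, and timestamps are assumed to be non-negative milliseconds. It
buckets each event by the time it arrives rather than the time it was
created, so a late event is counted in a later window instead of being
dropped. In a real job I'd expect Flink's TumblingProcessingTimeWindows
assigner to take care of this bucketing.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration (not Flink code) of counting events by arrival time. */
final class ProcessingTimeCounter {
    private final long windowSizeMs;
    // window start (ms) -> number of events that arrived during that window
    private final TreeMap<Long, Long> counts = new TreeMap<>();

    ProcessingTimeCounter(long windowSizeMs) {
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("window size must be positive");
        }
        this.windowSizeMs = windowSizeMs;
    }

    /** Start of the tumbling window that contains a non-negative timestamp. */
    long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    /** Counts the event in the window of its arrival; the event time is ignored. */
    void onEvent(long eventTimeMs, long arrivalTimeMs) {
        counts.merge(windowStart(arrivalTimeMs), 1L, Long::sum);
    }

    Map<Long, Long> counts() {
        return counts;
    }

    public static void main(String[] args) {
        ProcessingTimeCounter counter = new ProcessingTimeCounter(60_000L);
        counter.onEvent(10_000L, 12_000L); // arrives in window [0, 60000)
        counter.onEvent(59_000L, 61_000L); // created in minute 1, arrives in minute 2
        counter.onEvent(70_000L, 71_000L); // arrives in window [60000, 120000)
        System.out.println(counter.counts()); // prints {0=1, 60000=2}
    }
}
```

The second event belongs to the first minute by creation time but is counted
in the second minute. The per-window numbers shift, but the total still
matches the number of events received, which is the trade-off I'm fine with.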

My pipeline looks like this:

browser (user) --> REST API --> log file --> Filebeat
    --> Kafka (18 partitions) --> Flink --> destination

Filebeat inserts into Kafka; the topic is kind of a big bucket of "logs",
and I use Flink to filter for the specific app and do the counts. The logs
are distributed round-robin across the topic's partitions. Where I foresee
a delay is when Filebeat can't push fast enough into Kafka and/or the Flink
consumer has not yet read all events for that window from all partitions.

On Thu, 25 Nov 2021 at 11:28, Schwalbe Matthias <matthias.schwa...@viseca.ch>
wrote:

> Hi John,
>
>
>
> … just a short hint:
>
> With the DataStream API you can
>
>    - hand-craft a trigger that decides when and how often to emit
>    intermediate, punctual and late window results, and when to evict the
>    window and stop processing late events
>    - in order to process late events you also need to specify for how long
>    you will extend the window processing (or is that done in the trigger … I
>    don’t remember right now)
>    - overall window state grows if you extend window processing beyond the
>    point where the window is finished …
>
>
>
> Hope this helps 😊
>
>
>
> Thias
>
>
>
> *From:* Caizhi Weng <tsreape...@gmail.com>
> *Sent:* Thursday, 25 November 2021 02:56
> *To:* John Smith <java.dev....@gmail.com>
> *Cc:* user <user@flink.apache.org>
> *Subject:* Re: Windows and data loss.
>
>
>
> Hi!
>
>
>
> Are you using the datastream API or the table / SQL API? I don't know if
> datastream API has this functionality, but in table / SQL API we have the
> following configurations [1].
>
>    - table.exec.emit.late-fire.enabled: Emit window results for late
>    records;
>    - table.exec.emit.late-fire.delay: How often shall we emit results for
>    late records (for example, once per 10 minutes or for every record).
>
>
>
> [1]
> https://github.com/apache/flink/blob/601ef3b3bce040264daa3aedcb9d98ead8303485/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/utils/WindowEmitStrategy.scala#L214
>
>
>
> On Thu, 25 Nov 2021 at 12:45 AM, John Smith <java.dev....@gmail.com> wrote:
>
> Hi, I understand that when using windows with watermarks and lateness
> configured, an event that arrives too late is dropped from the window,
> and we can send it to a side output.
>
> But I'm wondering: is there a way to do it without the loss?
>
> I'm guessing an "all" window with a custom trigger that just fires every X
> period, so that whatever is in that bucket at that point is what gets counted?
>
