One of our Apache Beam jobs running on the FlinkRunner is showing odd
behaviour with checkpoint size. The state backend is file based. The job
receives traffic once a day for about an hour and is then idle until it
receives more data.
The checkpoint grows steadily in size as we process more data. However, the
size of the checkpoint does not decrease significantly once data has stopped
being consumed for that day.
We thought it could be a bottleneck in the database sink; however, the same
behaviour is present if we remove the sink and simply dump the data.
The behaviour resembles a stepped graph, e.g.:
* checkpoint = 120KB (starting size checkpoint)
* checkpoint = 409MB (starts receiving data)
* checkpoint = 850MB (processing the backlog data)
* checkpoint = 503MB (finished processing data)
* checkpoint = 1.2GB (begins processing new data and backlog)
* checkpoint = 700MB (finished processing data)
* checkpoint = 700MB (new starting size for checkpoint)
* ...
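To make the pattern easier to compare, here is a rough sketch that pulls the
idle "floor" out of the figures above and shows how much it rises from one
cycle to the next. The phase labels, the helper name idle_floors and the
decimal unit conversion (120KB as 0.12MB, 1.2GB as 1200MB) are assumptions
made for illustration; nothing here comes from Flink or Beam itself.

```python
# Checkpoint sizes from the list above, converted to MB.
# Assumption: decimal units, so 120KB ~ 0.12MB and 1.2GB ~ 1200MB.
observations = [
    ("start", 0.12),
    ("receiving", 409),
    ("backlog", 850),
    ("finished", 503),
    ("receiving", 1200),
    ("finished", 700),
]


def idle_floors(obs):
    """Return the checkpoint sizes observed while the job is idle."""
    return [size for phase, size in obs if phase in ("start", "finished")]


floors = idle_floors(observations)

# Growth of the idle floor between consecutive cycles, in MB.
growth = [round(later - earlier, 2) for earlier, later in zip(floors, floors[1:])]

print("idle floors (MB):", floors)
print("floor growth per cycle (MB):", growth)
```

The point is only that the idle size after each cycle is higher than the one
before it, rather than returning to the starting size.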
Has anyone seen this behaviour before? Is this a known issue with Flink
checkpointing when using Apache Beam?
Thanks,
Steve
Stephen Hesketh | Client Analytics Technology