Hi Roman,
Thank you for the reply.
Yes, I am aware that backpressure can be the result of many factors,
and yes, this is an oversimplification of something very complex, so
please bear with me. Let's assume this has been taken into account
and has lowered the threshold at which the backpressure status
permanently shows HIGH.
Example: the system is running along perfectly fine under normal
conditions, accessing external sources and processing an average of
100,000 messages/sec. Let's assume the maximum capacity is around
130,000 messages/sec before backpressure starts propagating back up
the stream. Utilization is therefore roughly 0.77 (100K/130K). Great,
but at present we don't know that 130,000 is the limit.
For this example, or for any job, is there a way of estimating this
maximum capacity (and hence the utilization) from the current
throughput, without pushing the system to its limit? Possibly by
measuring (as you say) the saturation of certain buffers? I am looking
into this now, but I am not too familiar with Flink internals. It
doesn't have to be extremely precise; any hints would be greatly
appreciated.
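For instance, I had something along the following lines in mind:
polling the per-subtask buffer pool usage over the REST API and
treating the vertex whose pools fill up first as the bottleneck. This
is just a rough sketch; the REST address and job id are placeholders,
and I am assuming buffers.inPoolUsage / buffers.outPoolUsage (from the
metrics page you linked) are the right metrics to watch.

# Rough sketch (Python 3 + requests), assuming Flink 1.10's REST API.
# FLINK_REST and JOB_ID are placeholders for my setup.
import requests

FLINK_REST = "http://localhost:8081"   # JobManager REST endpoint
JOB_ID = "<job-id>"                    # placeholder

METRICS = "buffers.inPoolUsage,buffers.outPoolUsage"

job = requests.get(f"{FLINK_REST}/jobs/{JOB_ID}").json()

for vertex in job["vertices"]:
    worst_in, worst_out = 0.0, 0.0
    for idx in range(vertex["parallelism"]):
        url = (f"{FLINK_REST}/jobs/{JOB_ID}/vertices/{vertex['id']}"
               f"/subtasks/{idx}/metrics")
        resp = requests.get(url, params={"get": METRICS}).json()
        values = {m["id"]: float(m["value"]) for m in resp if "value" in m}
        worst_in = max(worst_in, values.get("buffers.inPoolUsage", 0.0))
        worst_out = max(worst_out, values.get("buffers.outPoolUsage", 0.0))
    # As I understand it: outPoolUsage near 1.0 means the output buffers
    # are full, i.e. downstream is back-pressuring this vertex;
    # inPoolUsage near 1.0 means this vertex itself cannot keep up.
    print(f"{vertex['name']}: inPoolUsage={worst_in:.2f} "
          f"outPoolUsage={worst_out:.2f}")

My assumption would be that values well below 1.0 mean there is still
headroom, and the first vertex whose pool usage climbs towards 1.0 as
load increases is the slowest part of the pipeline. Please correct me
if those are the wrong metrics to look at.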
Regards,
M.
On 25.02.20 13:34, Khachatryan Roman wrote:
Hi Morgan,
Regarding backpressure, it can be caused by a number of factors, e.g.
writing to an external system or slow input partitions.
However, if you know that a particular resource is a bottleneck then
it makes sense to monitor its saturation.
It can be done by using Flink metrics. Please see the documentation
for more details:
https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html
Regards,
Roman
On Tue, Feb 25, 2020 at 12:33 PM Morgan Geldenhuys
<morgan.geldenh...@tu-berlin.de
<mailto:morgan.geldenh...@tu-berlin.de>> wrote:
Hello community,
I am fairly new to Flink and have a question concerning
utilization. I
was hoping someone could help.
Backpressure is essentially the point at which utilization has
reached 100% for a particular streaming pipeline, meaning that the
application cannot "keep up" with the messages coming into the
system.
I was wondering, assuming a fairly stable input throughput, is there
a way of determining the average utilization as a percentage? Can we
determine from metrics alone how much more capacity each operator has
before backpressure kicks in, e.g. 60% of capacity? Knowing that the
maximum throughput of the DSP application is dictated by the slowest
part of the pipeline, we would need to identify the slowest operator
and then average horizontally.
The only method I can see of determining the point at which the
system can no longer keep up is to scale the input throughput slowly
until the backpressure HIGH warning is shown, at which point the
number of messages/sec is known.
Yes, I know this is a gross oversimplification and that there are
many, many factors to take into account when dealing with
backpressure, but it would be nice to have a general indicator; a
rough estimate is fine.
Thank you in advance.
Regards,
M.