Re: Tame Flink UI?

Chesnay Schepler Wed, 23 Nov 2016 08:02:14 -0800

Hello,

So there are 2 separate issues here:


1. The response when requesting the list of available metrics is pretty
   big.
2. The request for the values of these metrics is also pretty big, and
   the response even larger.

For now I will modify the WebUI to only request the values of selected metrics, which will typically be a rather small number. This should solve issue #2. (Provided no one has the funny idea of selecting all metrics!)
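
A rough sketch of how much smaller such a request would be, in Python for brevity. The `get` query parameter, the placeholder metric names (everything except `numRecordsIn`) and the selection itself are assumptions for illustration; this is not the actual WebUI code:

```python
from urllib.parse import urlencode

# Hypothetical metric names of the form "<subtask index>.<metric name>"
# for 90 subtasks with 6 task IO metrics each. Only "numRecordsIn" is a
# name taken from this thread; the rest are placeholders.
IO_METRICS = ["numRecordsIn"] + [f"placeholderMetric{k}" for k in range(1, 6)]
available = [f"{i}.{name}" for i in range(90) for name in IO_METRICS]


def metrics_query(names):
    """Build a query string asking for the given metrics (assumed 'get' parameter)."""
    return urlencode({"get": ",".join(names)})


all_query = metrics_query(available)            # what the UI asks for today
selected_query = metrics_query(available[:2])   # only what the user selected

# The first query grows with parallelism, the second only with the selection.
print(len(all_query), len(selected_query))
```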

To fix #1 we will have to go with a more compact representation, I guess; this however will require a bit more work, since the backend has to detect the valid ranges (i.e. the subtasks for which we do in fact have the metric).
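
One possible shape for such a compact representation, sketched in Python. It assumes metric names look like "<subtask index>.<metric name>"; the function names and the output format are made up for illustration and nothing like this exists in the backend yet:

```python
from collections import defaultdict


def to_ranges(indices):
    """Collapse integers into sorted, inclusive (start, end) ranges."""
    ranges = []
    for i in sorted(set(indices)):
        if ranges and i == ranges[-1][1] + 1:
            # Extends the current range by one.
            ranges[-1] = (ranges[-1][0], i)
        else:
            # Gap found: start a new range.
            ranges.append((i, i))
    return ranges


def compact(metric_names):
    """Group '<subtask>.<name>' entries by name and list the valid subtask ranges."""
    by_name = defaultdict(list)
    for full_name in metric_names:
        subtask, name = full_name.split(".", 1)
        by_name[name].append(int(subtask))
    return {name: to_ranges(idx) for name, idx in by_name.items()}


# 90 subtasks, where subtask 42 has not reported the metric.
names = [f"{i}.numRecordsIn" for i in range(90) if i != 42]
print(compact(names))  # {'numRecordsIn': [(0, 41), (43, 89)]}
```

The list of available metrics then scales with the number of distinct metric names and gaps rather than with the parallelism.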

Note that #1 should not be such a big problem since the list of metrics is not updated regularly, as far as I know. If one doesn't interact with the metrics tab, no request (or maybe one at startup) should be sent.


Regards,
Chesnay

On 16.11.2016 20:38, Cliff Resnick wrote:

Ufuk,

The above occurs for me simply by selecting a running job from the job list.


Chesnay,

The 413 error is because of the large request size. Given all the repeated parameter names, maybe a more compact representation would work? For example, instead of enumerating all metrics, maybe ask for the range?

On Wed, Nov 16, 2016 at 2:05 PM, Chesnay Schepler <ches...@apache.org> wrote:


    Hello,

    The WebInterface first pulls a list of all available metrics for
    a specific taskmanager/job/task (which is reasonable, since how
    else would you select them),
    and then requests the values for all metrics by supplying the name
    of every single metric it just received, which is where things get
    funky.

    In this case we have a task with parallelism of around 90 (the
    number before the metric name is the subtask index).
    Now let's only consider IO metrics (numRecordsIn etc.).
    We then have 90 * 6 (task IO metrics) + 90 * 4 (operator IO
    metrics) * X (# of operators in the task) metrics.
    In the best case of a single operator this results in 900
    metrics being pulled at once,
    which is done every few seconds; I don't know the exact update
    interval.
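
    The arithmetic above, written out as a small Python sketch. The
    function and parameter names are illustrative only; the numbers
    (90 subtasks, 6 task IO metrics, 4 operator IO metrics) are the
    ones quoted above:

```python
def io_metric_count(parallelism, operators, task_metrics=6, operator_metrics=4):
    """Number of IO metrics requested for one task per polling round."""
    per_task = parallelism * task_metrics
    per_operators = parallelism * operator_metrics * operators
    return per_task + per_operators


print(io_metric_count(90, 1))  # 900
print(io_metric_count(90, 3))  # 1620
```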

    We can disable this temporarily in a few ways; the easiest one
    being to simply never return any metrics in the initial metrics
    look-up.
    See AbstractMetricsHandler#getAvailableMetricsList

    Regards,
    Chesnay

    On 16.11.2016 19:15, Ufuk Celebi wrote:

        Hey Cliff,

        Yes, this was recently merged to the master branch. I
        think you are right that this is not feasible. I thought that
        the metrics were pulled in selectively when you select them via
        the metrics list. That does not seem to be the case.

        If it is really the case that everything is always requested
        then we would have to revert this for the time being. Did you
        select any metrics manually?

        – Ufuk

        On 16 November 2016 at 18:40:52, Cliff Resnick
        (cre...@gmail.com) wrote:

            We're on 1.2-SNAPSHOT, and some time over the past couple
            of weeks the UI
            seems to have become much more aggressive polling for
            metrics. I'm seeing
            hundreds of 413 errors as the UI continuously tries to GET
            with a URL over
            100k, pretty much overwhelming the SOCKS proxy connection.
            Below is an example of a JavaScript GET that seems to be
            emitted every few
            seconds. Is this intentional? <removed by Chesnay>
