Hello,
So there are 2 separate issues here:
1. The response when requesting the list of available metrics is pretty
big.
2. The request for the values of these metrics is also pretty big, and
the response even larger.
For now I will modify the WebUI to only request the values of the selected
metrics, which will typically be a rather small number. This
should solve issue #2. (Provided no one has the funny idea of selecting
all metrics!)
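As a rough illustration of why requesting only the selected metrics helps, the sketch below compares the size of the two query strings. The `get` parameter, the "<subtask index>.<metric name>" naming, and every metric name other than numRecordsIn are assumptions made for the example; they are not taken from the WebUI code.

```python
from urllib.parse import urlencode


def metrics_query(metric_names):
    """Build a query string for the given metrics (hypothetical `get` parameter)."""
    return urlencode({"get": ",".join(metric_names)})


# Assumed naming scheme: "<subtask index>.<metric name>", with 90 subtasks.
io_metrics = ["numRecordsIn", "numRecordsOut", "numBytesIn", "numBytesOut"]  # illustrative names
all_metrics = [f"{i}.{m}" for i in range(90) for m in io_metrics]
selected = ["0.numRecordsIn", "1.numRecordsIn"]

full_query = metrics_query(all_metrics)   # what happens when everything is requested
small_query = metrics_query(selected)     # what happens when only selected metrics are requested
print(len(full_query), len(small_query))
```

The request size then grows with the number of selected metrics rather than with parallelism times the number of metrics per subtask.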
To fix #1 we will have to go with a more compact representation, I guess;
this however will require a bit more work, since the
backend has to detect the valid ranges (i.e. the subtasks for which we do
in fact have the metric).
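A minimal sketch of what detecting the valid ranges could look like, assuming the backend already knows the subtask indices for which a metric exists. The function names and the "0-2,4-5,7" output format are hypothetical; this is not the actual backend implementation.

```python
def compress_ranges(indices):
    """Collapse subtask indices into sorted, inclusive (start, end) ranges."""
    ranges = []
    for i in sorted(set(indices)):
        if ranges and i == ranges[-1][1] + 1:
            # Index continues the previous range, so extend it.
            ranges[-1] = (ranges[-1][0], i)
        else:
            # Gap (or first index): start a new range.
            ranges.append((i, i))
    return ranges


def format_ranges(ranges):
    """Render ranges compactly, e.g. [(0, 2), (7, 7)] becomes "0-2,7"."""
    return ",".join(str(s) if s == e else f"{s}-{e}" for s, e in ranges)


# A metric that is present on all 90 subtasks collapses to a single range.
print(format_ranges(compress_ranges(range(90))))
```

With a representation like this, the list of available metrics would carry one entry per metric name plus its ranges, instead of one entry per subtask and metric.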
Note that #1 should not be such a big problem since the list of metrics
is not updated regularly as far as I know.
If one doesn't interact with the metrics tab no request (or maybe one at
startup) should be sent.
Regards,
Chesnay
On 16.11.2016 20:38, Cliff Resnick wrote:
Ufuk,
The above occurs for me simply by selecting a running job from the job
list.
Chesnay,
The 413 error is because of the large request size. Given all the
repeated parameter names maybe a more compact representation would
work? For example, instead of enumerating all metrics, maybe ask for
the range?
On Wed, Nov 16, 2016 at 2:05 PM, Chesnay Schepler <ches...@apache.org> wrote:
Hello,
The WebInterface first pulls a list of all available metrics for
a specific taskmanager/job/task (which is reasonable since how
else would you select them),
and then requests the values for all metrics by supplying the name
of every single metric it just received, which is where things get
funky.
In this case we have a task with parallelism of around 90 (the
number before the metric name is the subtask index).
Now let's only consider IO metrics (numRecordsIn etc.).
We then have 90 * 6 (task IO metrics) + 90 * 4 (operator IO
metrics) * X (# of operators in the task) metrics.
In the best case of a single operator this results in 900
metrics being pulled at once,
which is done every few seconds; I don't know the exact update
interval.
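The arithmetic above can be checked with a short sketch. The function and parameter names are made up for illustration; the counts of 6 task IO metrics and 4 operator IO metrics are the ones quoted above.

```python
def io_metric_count(parallelism, operators, task_io_metrics=6, operator_io_metrics=4):
    """Number of IO metrics requested for one task.

    Each subtask has `task_io_metrics` task-level metrics plus
    `operator_io_metrics` metrics for every operator in the task.
    """
    task_level = parallelism * task_io_metrics
    operator_level = parallelism * operator_io_metrics * operators
    return task_level + operator_level


# Parallelism 90 with a single operator: 540 + 360
print(io_metric_count(90, 1))
```

Each additional operator in the task adds another 90 * 4 = 360 metrics to every request.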
We can disable this temporarily in a few ways; the easiest one
being to simply never return any metrics in the initial metrics
look-up.
See AbstractMetricsHandler#getAvailableMetricsList
Regards,
Chesnay
On 16.11.2016 19:15, Ufuk Celebi wrote:
Hey Cliff,
Yes, this was recently merged to the master branch. I
think you are right that this is not feasible. I thought that
the metrics were pulled in selectively when you select them via
the metrics list. That seems not to be the case.
If it is really the case that everything is always requested
then we would have to revert this for the time being. Did you
select any metrics manually?
– Ufuk
On 16 November 2016 at 18:40:52, Cliff Resnick
(cre...@gmail.com) wrote:
We're on 1.2-SNAPSHOT, and some time over the past couple
of weeks the UI
seems to have become much more aggressive about polling for
metrics. I'm seeing
hundreds of 413 errors as the UI continuously tries to GET
with a URL over
100k, pretty much overwhelming the SOCKS proxy connection.
Below is an example of a javascript GET that seems to be
emitted every few
seconds. Is this intentional? <removed by Chesnay>