Re: [VOTE] FLIP-102: Add More Metrics to TaskManager

Andrey Zagrebin Thu, 20 Aug 2020 07:10:56 -0700

Hi All,

Thanks for reviving the discussion, Matthias!


> This would mean that we could adapt the current proposal to replace the
> Nonheap usage pane by a pane displaying the Metaspace usage.
>
I am not sure how valuable the Nonheap usage metric is. I can see
that the Metaspace metric can be useful for users debugging OOMs.
We exposed the Nonheap usage before, so, as discussed, I would be a bit
careful about removing it. I believe it deserves a separate poll on the user
ML on whether the Nonheap usage is useful or not.
For now, we could keep both or merge them into one box with a
slash, like Metaspace/Nonheap -> 5Mb/10Mb, if the majority agrees that this
is not confusing and makes it clear that the Metaspace is a part of Nonheap.
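
To illustrate the relation between the two values, here is a minimal sketch
using the standard java.lang.management API. It assumes a HotSpot JVM that
exposes a non-heap pool named "Metaspace"; the class and method names
(NonHeapBreakdown, formatPane, ...) are hypothetical and are not taken from
Flink or from the FLIP.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class NonHeapBreakdown {

    /** Used bytes of the non-heap pool named "Metaspace", or -1 if no such pool is exposed. */
    public static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Assumption: the pool name "Metaspace" is HotSpot-specific.
            if (pool.getType() == MemoryType.NON_HEAP && "Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    /** Used bytes of the aggregated JVM non-heap memory (Metaspace is one part of it). */
    public static long nonHeapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getUsed();
    }

    /** Renders the combined box in the "Metaspace/Nonheap -> 5Mb/10Mb" style. */
    public static String formatPane(long metaspaceBytes, long nonHeapBytes) {
        return "Metaspace/Nonheap -> " + (metaspaceBytes >> 20) + "Mb/" + (nonHeapBytes >> 20) + "Mb";
    }

    public static void main(String[] args) {
        System.out.println(formatPane(metaspaceUsedBytes(), nonHeapUsedBytes()));
    }
}
```

The aggregated non-heap value also covers other pools such as the code cache,
which is why the two numbers in the box would generally differ.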

Btw, the "Nonheap" in the configuration box of the current FLIP-102 is
probably incorrect or confusing, as it does not correspond one-to-one to the
Nonheap JVM metric.

> The only issue I see is that JVM Overhead would still not be represented in
> the memory usage overview.

My understanding is that we do not need a usage metric for JVM Overhead as
it is a virtual unmanaged component which is more about configuring the max
total process memory.

> Is there a reason for us to introduce a nested structure
> TaskManagerMetricsInfo in the response object? I would rather keep it
> consistent in a flat structure instead, i.e. having all the members of
> TaskManagerResourceInfo being members of TaskManagerMetricsInfo

I would suggest introducing a separate REST call for
TaskManagerResourceInfo.
Semantically, TaskManagerResourceInfo is more about the TM configuration
and it is not directly related to the usage metrics.
In the future, I would avoid calls with many responsibilities and maybe
consider splitting the 'TM details' call into metrics etc., unless there is a
concern about the UI having to make several calls instead of one.

> Alternatively, one could think of grouping the metrics collecting the
> different values (i.e. max, used, committed) per metric in a JSON object.
> But this would apply for all the other metrics of TaskManagerMetricsInfo
> as well.

I would personally prefer this for metrics but I am not pushing for this.
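
As a rough sketch of what grouping per metric could look like compared to the
flat layout, assuming hypothetical field names (used, committed, max) and a
hypothetical helper class GroupedMemoryMetric that is not part of the FLIP or
of Flink's REST API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GroupedMemoryMetric {
    public final long used;
    public final long committed;
    public final long max;

    public GroupedMemoryMetric(long used, long committed, long max) {
        this.used = used;
        this.committed = committed;
        this.max = max;
    }

    /** Builds the value group from a standard JVM MemoryUsage snapshot. */
    public static GroupedMemoryMetric from(MemoryUsage usage) {
        return new GroupedMemoryMetric(usage.getUsed(), usage.getCommitted(), usage.getMax());
    }

    /** Grouped layout: one JSON object per metric holding all of its values. */
    public String toGroupedJson(String name) {
        return "\"" + name + "\":{\"used\":" + used
                + ",\"committed\":" + committed
                + ",\"max\":" + max + "}";
    }

    /** Flat layout: one top-level member per value, e.g. heapUsed, heapCommitted, heapMax. */
    public String toFlatJson(String name) {
        return "\"" + name + "Used\":" + used
                + ",\"" + name + "Committed\":" + committed
                + ",\"" + name + "Max\":" + max;
    }

    public static void main(String[] args) {
        GroupedMemoryMetric heap = from(ManagementFactory.getMemoryMXBean().getHeapMemoryUsage());
        System.out.println("{" + heap.toGroupedJson("heap") + "}");
        System.out.println("{" + heap.toFlatJson("heap") + "}");
    }
}
```

Both layouts carry the same values; the grouped one only changes how the
frontend addresses them.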

> metrics.resource.managedMemory and metrics.resource.networkMemory have
> counterparts in metrics.networkMemory[Used|Total] and
> metrics.managedMemory[Used|Total]: Is this redundant data or do they have
> different semantics?

As I understand, they have different semantics. The metrics.resource.*
values are about configuration, while the [Used|Total] ones are about
current usage metrics.

> Is metrics.resource.totalProcessMemory a basic sum over all provided
> values?

This is again about configuration. I do not think it makes sense to come up
with a usage metric for the totalProcessMemory component.

Best,
Andrey


On Thu, Aug 20, 2020 at 9:06 AM Matthias <matth...@ververica.com> wrote:

> Hi Jing,
> I recently joined Ververica and started looking into FLIP-102. I'm trying
> to figure out how we would implement the proposal on the backend side.
> I looked into the proposal for the REST API response and a few questions
> popped up:
> - Is there a reason for us to introduce a nested structure
> TaskManagerMetricsInfo in the response object? I would rather keep it
> consistent in a flat structure instead, i.e. having all the members of
> TaskManagerResourceInfo being members of TaskManagerMetricsInfo.
>   Alternatively, one could think of grouping the metrics collecting the
> different values (i.e. max, used, committed) per metric in a JSON object.
> But this would apply for all the other metrics of TaskManagerMetricsInfo as
> well.
> - metrics.resource.managedMemory and metrics.resource.networkMemory have
> counterparts in metrics.networkMemory[Used|Total] and
> metrics.managedMemory[Used|Total]: Is this redundant data or do they have
> different semantics?
> - Is metrics.resource.totalProcessMemory a basic sum over all provided
> values? I see the necessity to have this member if we decide to not provide
> the memory usage for all memory pools (e.g. providing Metaspace but leaving
> Code Cache and Compressed Class Space as Non-Heap pools out of the
> response). Otherwise, would it be worth it to remove this member from the
> response for simplicity reasons since we could sum up the memory on the
> frontend side?
>
> Best,
> Matthias
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>
