Re: [VOTE] FLIP-102: Add More Metrics to TaskManager

Matthias Pohl Thu, 20 Aug 2020 12:34:22 -0700

Good points, Andrey. Thanks for the clarification. I made some minor
adaptations to the FLIP:
- I renamed the `resource` member to `configuration` and made it a
top-level member alongside `metrics` and `hardware`, since it does not fit
the volatile metrics context that well.
- I restructured the table under Proposed Changes to cover Metaspace now.
Additionally, I renamed `shuffle` to `network` to match the memory model
of FLIP-49.
- The UI in the screenshot has a bug: the counts of Direct and
Mapped are accompanied by a memory unit even though they are plain counts.
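
The unit bug in the last item comes down to rendering every value in the box as a byte size. A minimal sketch of the distinction, assuming hypothetical field names (`directCount`, `mappedCount`, `directUsed`) and hypothetical helpers (`format_bytes`, `render`) that are not taken from the FLIP or the Flink UI code:

```python
# Hypothetical field names; the FLIP's actual response members may differ.
COUNT_FIELDS = {"directCount", "mappedCount"}


def format_bytes(num_bytes: int) -> str:
    """Render a byte size with a binary-scaled unit, e.g. 1536 -> '1.50 KB'."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(value)} B"
            return f"{value:.2f} {unit}"
        value /= 1024


def render(field: str, value: int) -> str:
    """Counts are shown as plain integers; everything else as a byte size."""
    if field in COUNT_FIELDS:
        return str(value)
    return format_bytes(value)
```

With this split, `render("directCount", 12)` yields a bare number, while `render("directUsed", 1536)` yields a size with a unit.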


On Thu, Aug 20, 2020 at 4:10 PM Andrey Zagrebin <azagre...@apache.org>
wrote:

> Hi All,
>
> Thanks for reviving the discussion, Matthias!
>
> > This would mean that we could adapt the current proposal to replace the
> > Nonheap usage pane by a pane displaying the Metaspace usage.
> >
> I do not know the value of having the Nonheap usage in metrics. I can see
> that the Metaspace metric can be interesting for users debugging OOMs.
> We had the Nonheap usage before, so, as discussed, I would be a bit careful
> about removing it. I believe it deserves a separate poll on the user ML on
> whether the Nonheap usage is useful or not.
> As an interim solution, we could keep both or merge them into one box with a
> slash, like Metaspace/Nonheap -> 5 MB/10 MB, if the majority agrees that this
> is not confusing and makes clear that the Metaspace is a part of Nonheap.
>

That would be a good solution for representing both metrics. I adapted the
table on FLIP-102's Confluence page accordingly for now to have it visualized.
Let's see what others think about it.
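
To make the merged box concrete, here is a minimal sketch of how a frontend might render the two values with a slash. The helper names (`format_mb`, `metaspace_of_nonheap`) are hypothetical and only illustrate the display format discussed above, not anything the FLIP prescribes:

```python
def format_mb(num_bytes: int) -> str:
    """Render a byte size as whole megabytes (binary scaling)."""
    return f"{num_bytes / (1024 * 1024):.0f} MB"


def metaspace_of_nonheap(metaspace_used: int, nonheap_used: int) -> str:
    """Show Metaspace usage as a part of the overall Nonheap usage.

    Metaspace is one of the JVM's non-heap pools, so the first value is
    expected to be the smaller one; this sketch does not enforce that.
    """
    return f"{format_mb(metaspace_used)}/{format_mb(nonheap_used)}"
```

For 5 MB of Metaspace within 10 MB of Nonheap, `metaspace_of_nonheap(5 * 1024 * 1024, 10 * 1024 * 1024)` gives the slash-separated label from the proposal.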


>
> Btw, the "Nonheap" in the configuration box of the current FLIP-102 is
> probably incorrect or confusing, as it does not correspond one-to-one to the
> Nonheap JVM metric.
>
> > The only issue I see is that JVM Overhead would still not be represented in
> > the memory usage overview.
>
> My understanding is that we do not need a usage metric for JVM Overhead as
> it is a virtual unmanaged component which is more about configuring the max
> total process memory.
>
> > Is there a reason for us to introduce a nested structure
> > TaskManagerMetricsInfo in the response object? I would rather keep it
> > consistent in a flat structure instead, i.e. having all the members of
> > TaskManagerResourceInfo being members of TaskManagerMetricsInfo
>
> I would suggest introducing a separate REST call for
> TaskManagerResourceInfo.
> Semantically, TaskManagerResourceInfo is more about the TM configuration
> and it is not directly related to the usage metrics.
> In the future, I would avoid having calls with many responsibilities and maybe
> consider splitting the 'TM details' call into metrics etc., unless there is a
> concern about having to do more calls instead of one from the UI.
>

Good point. The growing size of the JSON response record might make it
worth splitting it up into different endpoints serving different groups of
data (e.g. /metrics for volatile values and /configuration for static ones).
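
A rough sketch of that split, assuming a combined details record with top-level `metrics` and `configuration` members. The route table, the `serve` helper and the sample member names inside the record are hypothetical and only illustrate the idea:

```python
from typing import Any, Dict

# Hypothetical mapping from endpoint suffix to the top-level member it serves.
ROUTES: Dict[str, str] = {
    "/metrics": "metrics",              # volatile values, polled frequently
    "/configuration": "configuration",  # static values, fetched once
}


def serve(path: str, details: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the group of data that belongs to the requested endpoint."""
    member = ROUTES.get(path)
    if member is None:
        raise KeyError(f"unknown endpoint: {path}")
    return details.get(member, {})


# Illustrative record; the inner member names are made up for this example.
details = {
    "metrics": {"heapUsed": 120, "heapMax": 512},
    "configuration": {"managedMemory": 256},
}
```

Here `serve("/metrics", details)` returns only the volatile part, so a UI that polls it does not re-download the static configuration each time.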


>
> > Alternatively, one could think of grouping the metrics collecting the
> > different values (i.e. max, used, committed) per metric in a JSON object.
> > But this would apply for all the other metrics of TaskManagerMetricsInfo
> > as well.
>
> I would personally prefer this for metrics but I am not pushing for this.
>
> > metrics.resource.managedMemory and metrics.resource.networkMemory have
> > counterparts in metrics.networkMemory[Used|Total] and
> > metrics.managedMemory[Used|Total]: Is this redundant data or do they have
> > different semantics?
>
> As I understand, they have different semantics. The metrics.resource.*
> members are about configuration, whereas the [Used|Total] members are about
> current usage metrics.
>

I see. Makes sense.
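
For reference, the grouping alternative quoted further up (one JSON object per metric holding `used`, `committed` and `max`) could look roughly like the following. This is only a sketch under the assumption that the flat members follow a `<name><Suffix>` naming pattern; the function name `group_metrics` and the sample keys are illustrative:

```python
from typing import Any, Dict

# Suffixes that denote the different values collected per metric.
SUFFIXES = ("Used", "Committed", "Max")


def group_metrics(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn e.g. heapUsed/heapCommitted/heapMax into heap: {used, committed, max}.

    Members that do not end in one of the known suffixes are kept as-is.
    """
    grouped: Dict[str, Any] = {}
    for key, value in flat.items():
        for suffix in SUFFIXES:
            if key.endswith(suffix) and len(key) > len(suffix):
                name = key[: -len(suffix)]
                grouped.setdefault(name, {})[suffix.lower()] = value
                break
        else:
            # No suffix matched: leave the member at the top level.
            grouped[key] = value
    return grouped
```

As noted in the quoted mail, applying this consistently would change the shape of all the other members of TaskManagerMetricsInfo as well.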

>
> > Is metrics.resource.totalProcessMemory a basic sum over all provided
> > values?
>
> This is again about configuration. I do not think it makes sense to come up
> with a usage metric for the totalProcessMemory component.
>

Got it.


> Best,
> Andrey
>
>
> On Thu, Aug 20, 2020 at 9:06 AM Matthias <matth...@ververica.com> wrote:
>
> > Hi Jing,
> > I recently joined Ververica and started looking into FLIP-102. I'm trying
> > to figure out how we would implement the proposal on the backend side.
> > I looked into the proposal for the REST API response and a few questions
> > popped up:
> > - Is there a reason for us to introduce a nested structure
> > TaskManagerMetricsInfo in the response object? I would rather keep it
> > consistent in a flat structure instead, i.e. having all the members of
> > TaskManagerResourceInfo being members of TaskManagerMetricsInfo.
> >   Alternatively, one could think of grouping the metrics collecting the
> > different values (i.e. max, used, committed) per metric in a JSON object.
> > But this would apply for all the other metrics of TaskManagerMetricsInfo
> > as well.
> > - metrics.resource.managedMemory and metrics.resource.networkMemory have
> > counterparts in metrics.networkMemory[Used|Total] and
> > metrics.managedMemory[Used|Total]: Is this redundant data or do they have
> > different semantics?
> > - Is metrics.resource.totalProcessMemory a basic sum over all provided
> > values? I see the necessity to have this member if we decide to not
> > provide the memory usage for all memory pools (e.g. providing Metaspace but
> > leaving Code Cache and Compressed Class Space as Non-Heap pools out of the
> > response). Otherwise, would it be worth it to remove this member from the
> > response for simplicity reasons since we could sum up the memory on the
> > frontend side?
> >
> > Best,
> > Matthias
> >
> >
> >
>


-- 

Matthias Pohl | Engineer

