pnowojski commented on a change in pull request #18991:
URL: https://github.com/apache/flink/pull/18991#discussion_r820731984



##########
File path: docs/content.zh/docs/ops/metrics.md
##########
@@ -1428,6 +1428,16 @@ Note that the metrics are only available via reporters.
       <td>Maximum recorded duration of a single consecutive period of the task 
being in the hard back pressure state in the last sampling period. Please check 
softBackPressuredTimeMsPerSecond and hardBackPressuredTimeMsPerSecond for more 
information.</td>
       <td>Gauge</td>
     </tr>
+    <tr>

Review comment:
       what about the mailbox queue length? Should we add it as well?

##########
File path: docs/content.zh/docs/ops/metrics.md
##########
@@ -1428,6 +1428,16 @@ Note that the metrics are only available via reporters.
       <td>Maximum recorded duration of a single consecutive period of the task 
being in the hard back pressure state in the last sampling period. Please check 
softBackPressuredTimeMsPerSecond and hardBackPressuredTimeMsPerSecond for more 
information.</td>
       <td>Gauge</td>
     </tr>
+    <tr>
+      <td>mailboxThroughputPerSecond</td>
+      <td>The number of actions processed from the task's mailbox per second 
over all actions which includes, e.g., checkpointing, timer, or cancellation 
actions.</td>
+      <td>Meter</td>
+    </tr>
+    <tr>
+      <td>mailboxThroughputLatencyUs</td>
+      <td>The latency in processing actions from the task's mailbox, i.e., an 
action's waiting time in the mailbox, in microseconds for actions with standard 
(default) priority, e.g., checkpointing actions.</td>
+      <td>Histogram</td>

Review comment:
       Have you checked how is it being reported/displayed, especially in the 
WebUI?

##########
File path: docs/content.zh/docs/ops/metrics.md
##########
@@ -1428,6 +1428,16 @@ Note that the metrics are only available via reporters.
       <td>Maximum recorded duration of a single consecutive period of the task 
being in the hard back pressure state in the last sampling period. Please check 
softBackPressuredTimeMsPerSecond and hardBackPressuredTimeMsPerSecond for more 
information.</td>
       <td>Gauge</td>
     </tr>
+    <tr>
+      <td>mailboxThroughputPerSecond</td>

Review comment:
       `mailsPerSecond` would be more consistent with existing metrics? 
`taskThreadActionsPerSecond`?

##########
File path: docs/content.zh/docs/ops/metrics.md
##########
@@ -1428,6 +1428,16 @@ Note that the metrics are only available via reporters.
       <td>Maximum recorded duration of a single consecutive period of the task 
being in the hard back pressure state in the last sampling period. Please check 
softBackPressuredTimeMsPerSecond and hardBackPressuredTimeMsPerSecond for more 
information.</td>
       <td>Gauge</td>
     </tr>
+    <tr>
+      <td>mailboxThroughputPerSecond</td>
+      <td>The number of actions processed from the task's mailbox per second 
over all actions which includes, e.g., checkpointing, timer, or cancellation 
actions.</td>
+      <td>Meter</td>
+    </tr>
+    <tr>
+      <td>mailboxThroughputLatencyUs</td>

Review comment:
       `mailboxLatencyUs`? `taskThreadLatencyUs`?

##########
File path: docs/content.zh/docs/ops/metrics.md
##########
@@ -1428,6 +1428,16 @@ Note that the metrics are only available via reporters.
       <td>Maximum recorded duration of a single consecutive period of the task 
being in the hard back pressure state in the last sampling period. Please check 
softBackPressuredTimeMsPerSecond and hardBackPressuredTimeMsPerSecond for more 
information.</td>
       <td>Gauge</td>
     </tr>
+    <tr>
+      <td>mailboxThroughputPerSecond</td>
+      <td>The number of actions processed from the task's mailbox per second 
over all actions which includes, e.g., checkpointing, timer, or cancellation 
actions.</td>
+      <td>Meter</td>
+    </tr>
+    <tr>
+      <td>mailboxThroughputLatencyUs</td>

Review comment:
       Why microseconds? Everything else is either in seconds, milliseconds or 
nanoseconds. 

##########
File path: docs/content.zh/docs/ops/metrics.md
##########
@@ -1328,7 +1328,7 @@ Note that the metrics are only available via reporters.
       <td>Histogram</td>
     </tr>
     <tr>
-      <th rowspan="20"><strong>Task</strong></th>
+      <th rowspan="22"><strong>Task</strong></th>

Review comment:
       Do we have a better place for those metrics? They are not very "task io" 
related

##########
File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/metrics/groups/TaskIOMetricGroup.java
##########
@@ -100,6 +104,10 @@ public TaskIOMetricGroup(TaskMetricGroup parent) {
                         
hardBackPressuredTimePerSecond::getMaxSingleMeasurement);
 
         this.busyTimePerSecond = gauge(MetricNames.TASK_BUSY_TIME, 
this::getBusyTimePerSecond);
+
+        this.mailboxThroughput = meter(MetricNames.MAILBOX_THROUGHPUT, new 
MeterView(60));
+        this.mailboxLatency =
+                histogram(MetricNames.MAILBOX_LATENCY, new 
DescriptiveStatisticsHistogram(60));

Review comment:
       Why are you defining `60s`  as `timeSpanInSeconds` here? Why not using a 
default parameter?

##########
File path: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/tasks/mailbox/MailboxProcessor.java
##########
@@ -175,6 +199,7 @@ public void close() {
     public void drain() throws Exception {
         for (final Mail mail : mailbox.drain()) {
             mail.run();
+            mailboxThroughputMeter.markEvent();

Review comment:
       I would be slightly worried about adding extra virtual calls overhead. I 
don't think this is a confirmed concern here, but since we could quite easily 
avoid it, wouldn't it be better to have here a simple `long counter++` here?

##########
File path: docs/content.zh/docs/ops/metrics.md
##########
@@ -1428,6 +1428,16 @@ Note that the metrics are only available via reporters.
       <td>Maximum recorded duration of a single consecutive period of the task 
being in the hard back pressure state in the last sampling period. Please check 
softBackPressuredTimeMsPerSecond and hardBackPressuredTimeMsPerSecond for more 
information.</td>
       <td>Gauge</td>
     </tr>
+    <tr>
+      <td>mailboxThroughputPerSecond</td>
+      <td>The number of actions processed from the task's mailbox per second 
over all actions which includes, e.g., checkpointing, timer, or cancellation 
actions.</td>
+      <td>Meter</td>
+    </tr>
+    <tr>
+      <td>mailboxThroughputLatencyUs</td>
+      <td>The latency in processing actions from the task's mailbox, i.e., an 
action's waiting time in the mailbox, in microseconds for actions with standard 
(default) priority, e.g., checkpointing actions.</td>

Review comment:
       I would mention that it is being sampled periodically, so it might not 
be fully accurate. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to