yanghua commented on a change in pull request #6850: [FLINK-10252] Handle oversized metric messages
URL: https://github.com/apache/flink/pull/6850#discussion_r226222036
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/metrics/dump/MetricDumpSerialization.java
##########
@@ -124,50 +124,86 @@
 	public MetricSerializationResult serialize(
 			Map<Counter, Tuple2<QueryScopeInfo, String>> counters,
 			Map<Gauge<?>, Tuple2<QueryScopeInfo, String>> gauges,
 			Map<Histogram, Tuple2<QueryScopeInfo, String>> histograms,
-			Map<Meter, Tuple2<QueryScopeInfo, String>> meters) {
+			Map<Meter, Tuple2<QueryScopeInfo, String>> meters,
+			long maximumFramesize,
+			MetricQueryService queryService) {

 		buffer.clear();

+		boolean unregisterRemainingMetrics = false;
 		int numCounters = 0;
 		for (Map.Entry<Counter, Tuple2<QueryScopeInfo, String>> entry : counters.entrySet()) {
+			if (unregisterRemainingMetrics) {
+				queryService.unregister(entry.getKey());
+				continue;
+			}
+
 			try {
 				serializeCounter(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 				numCounters++;
+				if (buffer.length() > maximumFramesize) {
+					unregisterRemainingMetrics = true;
+				}
 			} catch (Exception e) {
 				LOG.debug("Failed to serialize counter.", e);
+			}
 		}

 		int numGauges = 0;
 		for (Map.Entry<Gauge<?>, Tuple2<QueryScopeInfo, String>> entry : gauges.entrySet()) {
+			if (unregisterRemainingMetrics) {
+				queryService.unregister(entry.getKey());
+				continue;
+			}
+
 			try {
 				serializeGauge(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 				numGauges++;
+				if (buffer.length() > maximumFramesize) {
+					unregisterRemainingMetrics = true;
+				}
 			} catch (Exception e) {
 				LOG.debug("Failed to serialize gauge.", e);
 			}
 		}

-		int numHistograms = 0;
-		for (Map.Entry<Histogram, Tuple2<QueryScopeInfo, String>> entry : histograms.entrySet()) {
-			try {
-				serializeHistogram(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
-				numHistograms++;
-			} catch (Exception e) {
-				LOG.debug("Failed to serialize histogram.", e);
-			}
-		}
-
 		int numMeters = 0;
 		for (Map.Entry<Meter, Tuple2<QueryScopeInfo, String>> entry : meters.entrySet()) {
+			if (unregisterRemainingMetrics) {
+				queryService.unregister(entry.getKey());
+				continue;
+			}
+
 			try {
 				serializeMeter(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
 				numMeters++;
+				if (buffer.length() > maximumFramesize) {
+					unregisterRemainingMetrics = true;
+				}
 			} catch (Exception e) {
 				LOG.debug("Failed to serialize meter.", e);
 			}
 		}

+		int numHistograms = 0;
+		for (Map.Entry<Histogram, Tuple2<QueryScopeInfo, String>> entry : histograms.entrySet()) {
+			if (unregisterRemainingMetrics) {
+				queryService.unregister(entry.getKey());
+				continue;
+			}
+
+			try {
+				serializeHistogram(buffer, entry.getValue().f0, entry.getValue().f1, entry.getKey());
+				numHistograms++;
+				if (buffer.length() > maximumFramesize) {

Review comment:
   OK, I think if we adopt the strategy of throwing an exception, that is effectively the same as checking the total size directly in the MetricQueryService. Alternatively, should we weigh the implementation cost against the probability of a size overflow during a metrics dump? I am not sure whether size overflow is a high-frequency scenario. If it is not, I suggest checking the total size directly and returning an error message. If it is, then we need to consider returning only a subset of the metrics.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
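[Editor's note] The "check the total size directly and return an error" alternative raised in the review comment could be sketched roughly as follows. This is a simplified illustration, not Flink code: the class name `MetricDumpSketch`, the method `serializeAll`, the constant `MAX_FRAME_SIZE`, and the reduction of counters to simple name/value pairs are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

public class MetricDumpSketch {

    // Hypothetical stand-in for the transport's maximum frame size.
    static final long MAX_FRAME_SIZE = 10 * 1024 * 1024;

    /**
     * Serializes all metrics first, then checks the total size once and fails
     * fast with an error, instead of unregistering remaining metrics mid-loop.
     * Counters are modeled as simple name -> value pairs for illustration.
     */
    static byte[] serializeAll(Map<String, Long> counters) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, Long> entry : counters.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        } catch (IOException e) {
            // Cannot occur for an in-memory stream; rethrow unchecked anyway.
            throw new UncheckedIOException(e);
        }
        if (bytes.size() > MAX_FRAME_SIZE) {
            // The "not a high-frequency scenario" strategy: reject the whole
            // dump with an error message rather than returning a partial dump.
            throw new IllegalStateException(
                "Metric dump (" + bytes.size() + " bytes) exceeds max frame size " + MAX_FRAME_SIZE);
        }
        return bytes.toByteArray();
    }
}
```

The trade-off the reviewer describes: this fail-fast variant is cheaper to implement but returns no metrics at all on overflow, whereas the in-loop approach in the diff returns a partial dump.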