jiwen624 commented on PR #50055: URL: https://github.com/apache/spark/pull/50055#issuecomment-2683915590
@cloud-fan Thanks a lot for the response and the context in https://github.com/apache/spark/pull/47721. It's very helpful.

From what I see in previous PRs and in comments in the code, it seems we treat `initValue` in `SQLMetric` as an invalid value that should be filtered out. For example, the recent PR https://github.com/apache/spark/pull/44222 mentions:

> initValue is the starting value for a SQLMetric. If a metric has value equal to its initValue, then it can/should be filtered out before aggregating with SQLMetrics.stringValue().

This comment about the idea of using `-1` indicates it was originally introduced to "distinguish whether a SQLMetric is initialized or not": https://github.com/apache/spark/pull/26899#discussion_r358044877. And the comment in the code [here](https://github.com/apache/spark/blob/a452e1bc2189d4dda50df5c36a49e4d23e6db758/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L68-L69) says:

> Metrics with value equal to initValue should be filtered out, since they are either invalid or safe to filter without changing...

However, [`MetricUtils.stringValue`](https://github.com/apache/spark/blob/a452e1bc2189d4dda50df5c36a49e4d23e6db758/core/src/main/scala/org/apache/spark/util/MetricUtils.scala#L88) keeps `0` as a valid value in its filter logic, even though the `initValue` of these metrics can be `0`. This seems to contradict the semantics of `initValue`, and it lets invalid/uninitialized values pollute the `min, med...` calculation; for example, the `med` may end up pointing at an uninitialized value (i.e. the `initValue`).
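
To make the concern concrete, here is a minimal sketch (not the actual `MetricUtils.stringValue` code; the metric values, the `median` helper, and the `_ >= 0` filter are illustrative assumptions) of how keeping `0` as valid skews the aggregates when a metric's `initValue` is `0`:

```scala
// Illustrative sketch only, not Spark's implementation.
// Assume a metric whose initValue is 0, updated by only three of seven tasks.
val initValue: Long = 0L
val taskValues: Seq[Long] = Seq(0L, 0L, 0L, 0L, 5L, 9L, 11L)

// Simplified median (middle element of the sorted sequence; exact for odd sizes).
def median(xs: Seq[Long]): Long = xs.sorted.apply(xs.size / 2)

// A filter in the spirit of the current logic, which keeps 0 as valid:
// the four uninitialized values survive and drag min and med down to 0.
val keptCurrent = taskValues.filter(_ >= 0)
println(s"min=${keptCurrent.min}, med=${median(keptCurrent)}, max=${keptCurrent.max}")
// min=0, med=0, max=11

// Filtering by initValue, as the documented semantics suggest:
// only the tasks that actually updated the metric are aggregated.
val keptByInit = taskValues.filter(_ != initValue)
println(s"min=${keptByInit.min}, med=${median(keptByInit)}, max=${keptByInit.max}")
// min=5, med=9, max=11
```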