jiwen624 commented on PR #50055:
URL: https://github.com/apache/spark/pull/50055#issuecomment-2683915590

   @cloud-fan Thanks a lot for the response and the context in 
https://github.com/apache/spark/pull/47721. It's very helpful.
   From what I see in previous PRs and comments in the code, it seems that we 
treat `initValue` in SQLMetric as a invalid value which should be filtered out. 
For example, in this recent PR https://github.com/apache/spark/pull/44222, it 
mentions:
   > initValue is the starting value for a SQLMetric. If a metric has value 
equal to its initValue, then it can/should be filtered out before aggregating 
with SQLMetrics.stringValue().
   
   and here the comment about the idea of having -1 indicates that it's 
originally introduced to "distinguish whether a SQLMetric is initialized or 
not" : https://github.com/apache/spark/pull/26899#discussion_r358044877
   
   and the comment in the code 
[here](https://github.com/apache/spark/blob/a452e1bc2189d4dda50df5c36a49e4d23e6db758/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L68-L69):
   
   > Metrics with value equal to initValue should be filtered out, since they 
are either invalid or safe to filter without changing...
   
   However, in 
[`MetricUtils.stringValue`](https://github.com/apache/spark/blob/a452e1bc2189d4dda50df5c36a49e4d23e6db758/core/src/main/scala/org/apache/spark/util/MetricUtils.scala#L88),
 it keeps `0` as a valid value in the filter logic, even it's possible that the 
`initValue` for these metrics can be `0`. This seems to contradict the 
semantics of initValue and makes the invalid/uninitialized values pollute the 
`min, med...` calculation - for example, the `med` may point to an 
unintialized/initValue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to