jayantdb opened a new pull request, #52445:
URL: https://github.com/apache/spark/pull/52445

   ### What changes were proposed in this pull request?
   This PR fixes an issue where the `avgOffsetsBehindLatest` metric in the Kafka 
sources object of the streaming progress metrics JSON was displayed in 
scientific notation (e.g., 2.8366294E8). The fix uses safe Decimal casting to 
ensure the value is displayed in a more human-readable format.
   
   Before change:
   ```
   {
     "id" : "d21e9dc9-95be-4548-8b1c-d5a576691abf",
     "runId" : "5023fd98-6e3d-44b1-ba52-8499c24ab8a0",
     "name" : "KafkaMetricsTest",
     "timestamp" : "2025-09-23T06:00:00.000Z",
     "batchId" : 1,
     "batchDuration" : 100,
     "numInputRows" : 800000,
     "inputRowsPerSecond" : 78886.1,
     "processedRowsPerSecond" : 41622.0,
     "durationMs" : {
       "total" : 100
     },
     "stateOperators" : [ ],
     "sources" : [ {
       "description" : "kafkaSource",
       "startOffset" : 100,
       "endOffset" : 200,
       "latestOffset" : 300,
       "numInputRows" : 800000,
       "inputRowsPerSecond" : 78886.1,
       "processedRowsPerSecond" : 41622.0,
       "metrics" : {
         "avgOffsetsBehindLatest" : "2.8366294E8",
         "maxOffsetsBehindLatest" : "283662940",
         "minOffsetsBehindLatest" : "283662940"
       }
     } ],
     "sink" : {
       "description" : "sink",
       "numOutputRows" : -1
     }
   }
   ```
   After change:
   ```
   {
     "id" : "d21e9dc9-95be-4548-8b1c-d5a576691abf",
     "runId" : "5023fd98-6e3d-44b1-ba52-8499c24ab8a0",
     "name" : "KafkaMetricsTest",
     "timestamp" : "2025-09-23T06:00:00.000Z",
     "batchId" : 1,
     "batchDuration" : 100,
     "numInputRows" : 800000,
     "inputRowsPerSecond" : 78886.1,
     "processedRowsPerSecond" : 41622.0,
     "durationMs" : {
       "total" : 100
     },
     "stateOperators" : [ ],
     "sources" : [ {
       "description" : "kafkaSource",
       "startOffset" : 100,
       "endOffset" : 200,
       "latestOffset" : 300,
       "numInputRows" : 800000,
       "inputRowsPerSecond" : 78886.1,
       "processedRowsPerSecond" : 41622.0,
       "metrics" : {
         "avgOffsetsBehindLatest" : "283662940.0",
         "maxOffsetsBehindLatest" : "283662940",
         "minOffsetsBehindLatest" : "283662940"
       }
     } ],
     "sink" : {
       "description" : "sink",
       "numOutputRows" : -1
     }
   }
   ```
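
The difference between the two outputs above can be reproduced on any JVM. The following is a minimal sketch, not the code from this PR: the class `PlainMetricFormat` and the helper `toPlain` are hypothetical names, and the guard for non-finite values is an assumption about what "safe" casting might involve. It relies only on the standard behavior of `Double.toString`, which switches to scientific notation for magnitudes of 10^7 and above, and on `BigDecimal.toPlainString`, which never emits an exponent.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class PlainMetricFormat {
    // Hypothetical helper (not from the PR): renders a double without an exponent.
    static String toPlain(double value) {
        // Assumption: NaN and infinities cannot be converted to BigDecimal,
        // so fall back to the default string form for them.
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return Double.toString(value);
        }
        // new BigDecimal(double) keeps the exact binary value; setScale(1, ...)
        // keeps one fractional digit so whole numbers still end in ".0".
        return new BigDecimal(value).setScale(1, RoundingMode.HALF_UP).toPlainString();
    }

    public static void main(String[] args) {
        double avg = 283662940.0;
        System.out.println(Double.toString(avg)); // 2.8366294E8
        System.out.println(toPlain(avg));         // 283662940.0
    }
}
```

Rounding to one fractional digit is a choice made for this sketch so that the output matches the "After change" sample; the PR itself may keep a different number of digits.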
   
   ### Why are the changes needed?
   The current formatting is not user-friendly. A user can easily misread 
`2.8366294E8` as `2.8` instead of `283,662,940`, because the `E8` exponent is 
easy to overlook. This fix improves the readability of Spark Structured 
Streaming progress metrics JSON.
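
As a rough illustration that only the presentation changes, the sketch below parses the old and new string forms and compares them numerically. The class `MetricParseCheck` and the method `sameValue` are hypothetical names for this example; a consumer that parses the metric as a number should see the same value either way, while one that compares the raw strings would not.

```java
import java.math.BigDecimal;

public class MetricParseCheck {
    // Hypothetical helper: true when two metric strings denote the same number.
    static boolean sameValue(String a, String b) {
        // compareTo ignores scale, so 2.8366294E8 and 283662940.0 compare as equal.
        return new BigDecimal(a).compareTo(new BigDecimal(b)) == 0;
    }

    public static void main(String[] args) {
        String before = "2.8366294E8";
        String after = "283662940.0";
        System.out.println(sameValue(before, after)); // true
        System.out.println(before.equals(after));     // false
    }
}
```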
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Run this Maven test:
   ```
   ./build/mvn -pl sql/core,sql/api \
   -am test \
   -DwildcardSuites=org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite \
   -DwildcardTestName="SPARK-53690"
   ```
   Results:
   ```
   Run completed in 8 seconds, 890 milliseconds.
   Total number of tests run: 13
   Suites: completed 2, aborted 0
   Tests: succeeded 13, failed 0, canceled 0, ignored 0, pending 0
   All tests passed.
   [INFO] ------------------------------------------------------------------------
   [INFO] Reactor Summary for Spark Project Parent POM 4.1.0-SNAPSHOT:
   [INFO] 
   [INFO] Spark Project Parent POM ........................... SUCCESS [  1.050 s]
   [INFO] Spark Project Tags ................................. SUCCESS [  1.505 s]
   [INFO] Spark Project Sketch ............................... SUCCESS [  1.379 s]
   [INFO] Spark Project Common Java Utils .................... SUCCESS [  1.856 s]
   [INFO] Spark Project Common Utils ......................... SUCCESS [  1.758 s]
   [INFO] Spark Project Local DB ............................. SUCCESS [  4.224 s]
   [INFO] Spark Project Networking ........................... SUCCESS [ 52.646 s]
   [INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  5.691 s]
   [INFO] Spark Project Variant .............................. SUCCESS [  0.797 s]
   [INFO] Spark Project Unsafe ............................... SUCCESS [  4.623 s]
   [INFO] Spark Project Connect Shims ........................ SUCCESS [  0.719 s]
   [INFO] Spark Project Launcher ............................. SUCCESS [  2.952 s]
   [INFO] Spark Project Core ................................. SUCCESS [ 27.562 s]
   [INFO] Spark Project SQL API .............................. SUCCESS [  1.982 s]
   [INFO] Spark Project Catalyst ............................. SUCCESS [  6.497 s]
   [INFO] Spark Project SQL .................................. SUCCESS [ 42.008 s]
   [INFO] ------------------------------------------------------------------------
   [INFO] BUILD SUCCESS
   [INFO] ------------------------------------------------------------------------
   [INFO] Total time:  02:37 min
   [INFO] Finished at: 2025-09-25T09:30:36+05:30
   [INFO] ------------------------------------------------------------------------
   ```
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

