guangyu-yang-rokt commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r2103619826
##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
     // TODO: We might need to rate limit re-registering.
     logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
+      maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+    if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+      reportAllBlocks()
+    }

Review Comment:
   > Added the termination logic here. cc @mridulm

   Hi @Ngone51, in our Spark Structured Streaming jobs we have seen many executors killed by this termination logic with exit code -1, which eventually kills the driver via the `spark.max.executor.failures` configuration. We see this very frequently in streaming jobs that scale aggressively, since they mark many executors idle at the same time. Why are we using the -1 exit code here if the executor is already killing itself?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
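For context, the change under review makes `registerBlockManager` return a `BlockManagerId` and uses a sentinel executor id to signal that the master rejected the re-registration, so the caller skips `reportAllBlocks()`. A minimal, self-contained Scala sketch of that sentinel-return pattern (the types, the `"invalid"` sentinel value, and the helper names below are stand-ins for illustration, not Spark's actual API):

```scala
// Sketch of the sentinel-id pattern from the diff: the master returns an id
// whose executorId is a sentinel when registration is rejected, and the
// caller only proceeds (reportAllBlocks) when the returned id is valid.
object ReRegisterSketch {
  // Simplified stand-in for Spark's BlockManagerId (assumption, not Spark's type).
  final case class BlockManagerId(executorId: String)

  // Stand-in for BlockManagerId.INVALID_EXECUTOR_ID; the real value may differ.
  val INVALID_EXECUTOR_ID = "invalid"

  // Stand-in for master.registerBlockManager: rejects ids the master no longer knows.
  def registerWithMaster(knownExecutors: Set[String], id: BlockManagerId): BlockManagerId =
    if (knownExecutors.contains(id.executorId)) id
    else BlockManagerId(INVALID_EXECUTOR_ID)

  // Mirrors the reregister() change: returns true iff blocks would be reported.
  def reregister(knownExecutors: Set[String], id: BlockManagerId): Boolean = {
    val returned = registerWithMaster(knownExecutors, id)
    returned.executorId != INVALID_EXECUTOR_ID // true => reportAllBlocks()
  }
}
```

The design question raised in the comment is what happens on the `false` branch: in the PR, a rejected re-registration leads the executor to terminate itself, and the reviewer is asking why that self-termination uses exit code -1, which counts toward the driver's executor-failure limit.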