guangyu-yang-rokt commented on code in PR #38876: URL: https://github.com/apache/spark/pull/38876#discussion_r2103619826
##########
core/src/main/scala/org/apache/spark/storage/BlockManager.scala:
##########
@@ -637,9 +637,11 @@ private[spark] class BlockManager(
   def reregister(): Unit = {
     // TODO: We might need to rate limit re-registering.
     logInfo(s"BlockManager $blockManagerId re-registering with master")
-    master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString, maxOnHeapMemory,
-      maxOffHeapMemory, storageEndpoint)
-    reportAllBlocks()
+    val id = master.registerBlockManager(blockManagerId, diskBlockManager.localDirsString,
+      maxOnHeapMemory, maxOffHeapMemory, storageEndpoint, isReRegister = true)
+    if (id.executorId != BlockManagerId.INVALID_EXECUTOR_ID) {
+      reportAllBlocks()
+    }

Review Comment:
   > Added the termination logic here. cc @mridulm

   Hi @Ngone51, in our Spark Structured Streaming jobs we have seen many executors killed by this termination logic with exit code -1, which eventually kills the driver via the `spark.max.executor.failures` configuration. We see this very frequently in streaming jobs that scale aggressively, since they mark many executors idle at the same time. Why are we using the -1 exit code here if the executor is already killing itself?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
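For context, the change under review makes `registerBlockManager` return a `BlockManagerId` and uses a sentinel executor id to signal that the master rejected the re-registration, so the caller skips `reportAllBlocks()`. A minimal, self-contained Scala sketch of that sentinel-return pattern (the types, the `"invalid"` sentinel value, and the helper names below are stand-ins for illustration, not Spark's actual API):

```scala
// Sketch of the sentinel-id pattern from the diff: the master returns an id
// whose executorId is a sentinel when registration is rejected, and the
// caller only proceeds (reportAllBlocks) when the returned id is valid.
object ReRegisterSketch {
  // Simplified stand-in for Spark's BlockManagerId (assumption, not Spark's type).
  final case class BlockManagerId(executorId: String)

  // Stand-in for BlockManagerId.INVALID_EXECUTOR_ID; the real value may differ.
  val INVALID_EXECUTOR_ID = "invalid"

  // Stand-in for master.registerBlockManager: rejects ids the master no longer knows.
  def registerWithMaster(knownExecutors: Set[String], id: BlockManagerId): BlockManagerId =
    if (knownExecutors.contains(id.executorId)) id
    else BlockManagerId(INVALID_EXECUTOR_ID)

  // Mirrors the reregister() change: returns true iff blocks would be reported.
  def reregister(knownExecutors: Set[String], id: BlockManagerId): Boolean = {
    val returned = registerWithMaster(knownExecutors, id)
    returned.executorId != INVALID_EXECUTOR_ID // true => reportAllBlocks()
  }
}
```

The design question raised in the comment is what happens on the `false` branch: in the PR, a rejected re-registration leads the executor to terminate itself, and the reviewer is asking why that self-termination uses exit code -1, which counts toward the driver's executor-failure limit.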