[ https://issues.apache.org/jira/browse/SPARK-43221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun reassigned SPARK-43221:
-------------------------------------

    Assignee: Attila Zsolt Piros  (was: Qiang Yang)

> Executor obtained error information
> ------------------------------------
>
>                 Key: SPARK-43221
>                 URL: https://issues.apache.org/jira/browse/SPARK-43221
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 3.1.1, 3.2.0, 3.3.0
>            Reporter: Qiang Yang
>            Assignee: Attila Zsolt Piros
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.1.0
>
>         Attachments: image-2023-04-21-00-19-58-021.png,
> image-2023-04-21-00-24-22-059.png, image-2023-04-21-00-30-41-851.png,
> image-2023-04-21-00-50-10-918.png, image-2023-04-21-00-53-20-720.png,
> image-2023-04-21-00-54-11-968.png, image-2023-04-21-00-57-29-140.png
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark on YARN cluster.
> A node may run multiple executors, and the same block may be held by more than
> one of them, with some copies in memory and some on disk. In that case an
> executor intermittently fails to fetch the block and throws:
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at org.apache.spark.broadcast.TorrentBroadcast.$anonfun$readBlocks$1(TorrentBroadcast.scala:183)
>
> The steps below replay how the problem occurs.
> step 1:
> The executor asks the driver for the block's locations and status
> (locationsAndStatusOption). The request carries the BlockId and the host of the
> executor's own node. Note that it does not carry any port information.
> line: 1092
> !image-2023-04-21-00-24-22-059.png!
> step 2:
> On the driver side, the driver looks up all BlockManagers holding the block by its BlockId.
> For non-remote-shuffle scenarios, the driver takes the block status from the
> first BlockManager in the locations list.
> Suppose two BlockManagers on this node hold the block: BM-1 stores it in memory,
> and BM-2 stores it on disk.
> Suppose the returned status comes from BM-1. Its storage level is then memory and its diskSize is 0.
> line: 852, 856
> !image-2023-04-21-00-30-41-851.png!
> step 3:
> The method returns a BlockLocationsAndStatus object. If a BlockManager on the
> requesting host stores the block on disk, that BlockManager's local directory
> paths are put into localDirs. This happens regardless of which BlockManager the
> status was taken from.
> !image-2023-04-21-00-50-10-918.png!
> step 4:
> When the executor receives locationsAndStatusOption, localDirs is not empty
> (it points at BM-2's directories). However, status.diskSize is 0, because it
> describes BM-1's in-memory copy.
> line: 1102
> !image-2023-04-21-00-54-11-968.png!
> step 5:
> readDiskBlockFromSameHostExecutor only checks whether the block file exists.
> It then reads a byte array of the passed-in blockSize. When blockSize is 0, it
> returns an empty byte array.
> line: 1234, 1240
> !image-2023-04-21-00-57-29-140.png!
> Reading an element from this empty array then causes the out-of-bounds exception shown above.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
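The mismatch in steps 2 through 5 can be reproduced with a small model. The Java sketch below is a simplified, hypothetical model and not Spark code:

- BmId, Status, LocationsAndStatus and fetchFromSameHost are invented stand-ins.
- The on-disk block files are modeled as in-memory byte arrays.
- getLocationsAndStatus and readDiskBlockFromSameHostExecutor only borrow their names from the Spark code paths referenced in the steps. Their bodies are assumptions.

The model does reproduce the selection logic the report describes:

- The status comes from the first location.
- localDirs come from a same-host location that stores the block on disk.
- The read length comes from status.diskSize.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of the status/localDirs mismatch; not Spark's actual code.
public class BlockStatusMismatchSketch {

    // Hypothetical stand-ins for a BlockManager id and a block status.
    record BmId(String executorId, String host) {}
    record Status(boolean useDisk, long memSize, long diskSize) {}

    // Driver reply (step 3): one status plus, optionally, the BM whose local dirs to read.
    record LocationsAndStatus(Status status, Optional<BmId> localDirsOf) {}

    // Driver side (steps 2-3): status is taken from the FIRST location, while the local
    // dirs come from ANY same-host location whose copy uses disk. They may differ.
    static Optional<LocationsAndStatus> getLocationsAndStatus(
            List<BmId> locations, Map<BmId, Status> statusByBm, String requesterHost) {
        if (locations.isEmpty()) {
            return Optional.empty();
        }
        Status status = statusByBm.get(locations.get(0));
        Optional<BmId> localDirsOf = locations.stream()
                .filter(bm -> bm.host().equals(requesterHost) && statusByBm.get(bm).useDisk())
                .findFirst();
        return Optional.of(new LocationsAndStatus(status, localDirsOf));
    }

    // Executor side (step 5): only checks that the "file" exists, then reads exactly
    // blockSize bytes from it. A blockSize of 0 yields an empty array.
    static Optional<byte[]> readDiskBlockFromSameHostExecutor(byte[] file, long blockSize) {
        if (file == null) {
            return Optional.empty();
        }
        return Optional.of(Arrays.copyOf(file, (int) blockSize));
    }

    // Executor side (step 4): sizes the read with status.diskSize, even though the
    // status may describe a different BM than the one whose local dirs are read.
    static Optional<byte[]> fetchFromSameHost(LocationsAndStatus las, Map<BmId, byte[]> filesOnDisk) {
        return las.localDirsOf().flatMap(bm ->
                readDiskBlockFromSameHostExecutor(filesOnDisk.get(bm), las.status().diskSize()));
    }

    public static void main(String[] args) {
        BmId bm1 = new BmId("exec-1", "host-a"); // holds the block in memory
        BmId bm2 = new BmId("exec-2", "host-a"); // holds the block on disk
        Map<BmId, Status> statusByBm = Map.of(
                bm1, new Status(false, 100, 0),
                bm2, new Status(true, 0, 100));
        Map<BmId, byte[]> filesOnDisk = Map.of(bm2, new byte[100]);

        LocationsAndStatus las =
                getLocationsAndStatus(List.of(bm1, bm2), statusByBm, "host-a").orElseThrow();
        byte[] bytes = fetchFromSameHost(las, filesOnDisk).orElseThrow();
        System.out.println("bytes read: " + bytes.length);
        try {
            // Analogous to the consumer taking the first element of the fetched data.
            System.out.println("first byte: " + bytes[0]);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught ArrayIndexOutOfBoundsException");
        }
    }
}
```

Running main prints "bytes read: 0" and then "caught ArrayIndexOutOfBoundsException". This mirrors step 5: BM-2's file exists, but the read is sized by BM-1's diskSize of 0.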