I'm trying to understand the block allocation mechanism that Spark Streaming uses to generate the batch jobs of a JobSet.
JobGenerator.generateJobs tries to allocate the received blocks to the batch: effectively, ReceivedBlockTracker.allocateBlocksToBatch builds a streamIdToBlocks map in which stream IDs (Int) are mapped to Seq[ReceivedBlockInfo] via getReceivedBlockQueue.

This is where it gets tricky for me. getReceivedBlockQueue of class ReceivedBlockTracker reads streamIdToUnallocatedBlockQueues, which should be populated with ReceivedBlockQueues. Who inserts these ReceivedBlockQueues into streamIdToUnallocatedBlockQueues, and where does that write happen? I have only found places where the value is read. At some point streamIdToBlocks gets wrapped into the AllocatedBlocks case class; why is that wrapper necessary? Also, in JobGenerator.generateJobs, on the line where receivedBlockInfos is created, shouldn't the result be empty, given that streamIdToUnallocatedBlockQueues apparently never gets written to? What am I missing? How is JobGenerator.generateJobs able to retrieve the received block infos?
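To make the questions concrete, here is a minimal, self-contained Scala model of the structures as I currently understand them. The names mirror the Spark classes and members (ReceivedBlockInfo, AllocatedBlocks, streamIdToUnallocatedBlockQueues, getReceivedBlockQueue, allocateBlocksToBatch), but the fields and bodies are my own simplified sketch, not the actual Spark sources:

    import scala.collection.mutable

    // Trimmed-down stand-ins for the Spark types (my sketch, not the real definitions).
    case class ReceivedBlockInfo(streamId: Int, numRecords: Long)

    case class AllocatedBlocks(streamIdToAllocatedBlocks: Map[Int, Seq[ReceivedBlockInfo]]) {
      def getBlocksOfStream(streamId: Int): Seq[ReceivedBlockInfo] =
        streamIdToAllocatedBlocks.getOrElse(streamId, Seq.empty)
    }

    object BlockTrackerSketch {
      type ReceivedBlockQueue = mutable.Queue[ReceivedBlockInfo]

      // The map my question is about: stream ID -> queue of not-yet-allocated blocks.
      private val streamIdToUnallocatedBlockQueues =
        new mutable.HashMap[Int, ReceivedBlockQueue]

      // Returns the queue for a stream, lazily creating an empty one on first access.
      private def getReceivedBlockQueue(streamId: Int): ReceivedBlockQueue =
        streamIdToUnallocatedBlockQueues.getOrElseUpdate(
          streamId, mutable.Queue.empty[ReceivedBlockInfo])

      // Drains every unallocated queue into an immutable per-batch snapshot,
      // wrapped in AllocatedBlocks.
      def allocateBlocksToBatch(streamIds: Seq[Int]): AllocatedBlocks = {
        val streamIdToBlocks = streamIds.map { streamId =>
          (streamId, getReceivedBlockQueue(streamId).dequeueAll(_ => true).toSeq)
        }.toMap
        AllocatedBlocks(streamIdToBlocks)
      }

      def main(args: Array[String]): Unit = {
        // The kind of enqueue I would expect to find somewhere in the tracker,
        // but cannot locate in the sources:
        getReceivedBlockQueue(0) += ReceivedBlockInfo(0, numRecords = 10)
        val allocated = allocateBlocksToBatch(Seq(0))
        println(allocated.getBlocksOfStream(0)) // non-empty only if someone enqueued
      }
    }

In this model, allocateBlocksToBatch only ever drains the queues, so unless something performs an enqueue like the one in main, the resulting AllocatedBlocks would always be empty. That is exactly my confusion about the real code.

Thanks, ZZ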