See responses inline.

On Wed, Mar 11, 2015 at 6:58 AM, Zoltán Zvara <zoltan.zv...@gmail.com> wrote:
> I'm trying to understand the block allocation mechanism Spark uses to
> generate batch jobs and a JobSet.
>
> JobGenerator.generateJobs tries to allocate received blocks to a batch;
> effectively, ReceivedBlockTracker.allocateBlocksToBatch creates a
> streamIdToBlocks map in which stream IDs (Int) are mapped to
> Seq[ReceivedBlockInfo] using getReceivedBlockQueue. This is where it gets
> tricky for me.
>
> getReceivedBlockQueue of class ReceivedBlockTracker reads
> streamIdToUnallocatedBlockQueues, which should be populated with
> ReceivedBlockQueues. Who inserts these ReceivedBlockQueues into
> streamIdToUnallocatedBlockQueues, and where does it get written? I've only
> found usages where the value is read.

Inserted here:
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceivedBlockTracker.scala#L84

> At one point streamIdToBlocks gets packed into a case class,
> AllocatedBlocks. Why is that necessary?

It is just a container that captures all the blocks allocated to a batch. It
is used both for tracking in memory and for writing out to the write-ahead
log.

> Also, at JobGenerator.generateJobs, on the line where receivedBlockInfos is
> created, shouldn't it be empty, because streamIdToUnallocatedBlockQueues
> never got written to? Where am I missing the point? How is
> JobGenerator.generateJobs able to retrieve the received block infos?

I think the line number above answers that. See also the simplified sketch at
the end of this message.

> Thanks,
>
> ZZ
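In case it helps, here is a simplified, self-contained sketch of the
mechanism. These are not the real Spark classes: ReceivedBlockInfo and Time
are stand-ins, the class name SketchedBlockTracker is made up, and the
write-ahead log and cleanup logic are omitted. The two things to notice are
that getReceivedBlockQueue lazily creates the per-stream queue via
getOrElseUpdate (which is why you never see an explicit insert of a queue
into streamIdToUnallocatedBlockQueues), and that allocateBlocksToBatch simply
drains those queues into an AllocatedBlocks container keyed by batch time.

import scala.collection.mutable

// Stand-ins for the real Spark types, for illustration only.
case class ReceivedBlockInfo(streamId: Int, numRecords: Long)
case class Time(milliseconds: Long)

// Mirrors the role of AllocatedBlocks: just a container holding the
// stream-id -> blocks map for one batch. The real tracker keeps it in
// memory and also writes it to the write-ahead log.
case class AllocatedBlocks(
    streamIdToAllocatedBlocks: Map[Int, Seq[ReceivedBlockInfo]]) {
  def getBlocksOfStream(streamId: Int): Seq[ReceivedBlockInfo] =
    streamIdToAllocatedBlocks.getOrElse(streamId, Seq.empty)
}

class SketchedBlockTracker(streamIds: Seq[Int]) {
  private type ReceivedBlockQueue = mutable.Queue[ReceivedBlockInfo]

  private val streamIdToUnallocatedBlockQueues =
    new mutable.HashMap[Int, ReceivedBlockQueue]
  private val timeToAllocatedBlocks =
    new mutable.HashMap[Time, AllocatedBlocks]

  // getOrElseUpdate lazily creates the queue on first access, so no
  // explicit "put a queue into the map" call exists anywhere else.
  private def getReceivedBlockQueue(streamId: Int): ReceivedBlockQueue =
    streamIdToUnallocatedBlockQueues
      .getOrElseUpdate(streamId, new ReceivedBlockQueue)

  // Driver side: a receiver reported a new block (cf. addBlock).
  def addBlock(info: ReceivedBlockInfo): Unit = synchronized {
    getReceivedBlockQueue(info.streamId) += info
  }

  // Called from the job generation path: drain every unallocated
  // queue and pin the drained blocks to this batch time.
  def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
    val streamIdToBlocks = streamIds.map { id =>
      (id, getReceivedBlockQueue(id).dequeueAll(_ => true).toSeq)
    }.toMap
    timeToAllocatedBlocks(batchTime) = AllocatedBlocks(streamIdToBlocks)
  }

  // What the job generator ultimately reads back for the batch.
  def getBlocksOfBatch(batchTime: Time): Map[Int, Seq[ReceivedBlockInfo]] =
    timeToAllocatedBlocks.get(batchTime)
      .map(_.streamIdToAllocatedBlocks)
      .getOrElse(Map.empty)
}

So in this sketch the flow is: addBlock appends to the per-stream queue,
allocateBlocksToBatch(Time(1000)) empties the queues into an AllocatedBlocks
for that batch, and getBlocksOfBatch(Time(1000)) returns the map the batch
jobs are built from. That is why the queues look "never written to" if you
only search for writes to the map itself.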