Hi,
What is the purpose of the taskBinary field in a ShuffleMapTask? What does it
contain, and how is it used? Is it a representation of all the RDD
operations that will be applied to the partition the task processes? (In the
example below, the task processes stage 0, partition 0.)
If it is not a representation of the RDD operations inside the stage, how
does a task know which operations to apply to its partition?
Thanks,
{ShuffleMapTask@9034} "ShuffleMapTask(0, 0)"
  taskBinary = {TorrentBroadcast@8204} "Broadcast(1)"
    org$apache$spark$broadcast$TorrentBroadcast$$evidence$1 = {ClassTag$$anon$1@8470} "Array[byte]"
    org$apache$spark$broadcast$TorrentBroadcast$$broadcastId = {BroadcastBlockId@8249} "broadcast_1"
    numBlocks = 1
    _value = null
    org$apache$spark$broadcast$TorrentBroadcast$$compressionCodec = {Some@8468} "Some(org.apache.spark.io.SnappyCompressionCodec@7ede98e1)"
    blockSize = 4194304
    bitmap$trans$0 = false
    id = 1
    org$apache$spark$broadcast$Broadcast$$_destroySite = {String@5327} ""
    _isValid = true
    org$apache$spark$Logging$$log_ = null
  partition = {HadoopPartition@9049}
  locs = {$colon$colon@9050} "::" size = 1
  preferredLocs = {ArrayBuffer@9051} "ArrayBuffer" size = 1
  org$apache$spark$Logging$$log_ = null
  stageId = 0
  partitionId = 0
  taskMemoryManager = null
  epoch = -1
  metrics = {None$@5261} "None"
  _executorDeserializeTime = 0
  context = null
  taskThread = null
  _killed = false
etc.