Hi,

What is the purpose of the taskBinary field of a ShuffleMapTask? What does it
contain, and how is it useful? Is it a representation of all the RDD
operations that will be applied to the partition that the task processes?
(In the dump below, the task processes stage 0, partition 0.) If it is not a
representation of the RDD operations inside the stage, then how does a task
know which operations it should apply to its partition?

Thanks,

*{ShuffleMapTask@9034} "ShuffleMapTask(0, 0)"*
 *taskBinary* = {TorrentBroadcast@8204} "Broadcast(1)"
  org$apache$spark$broadcast$TorrentBroadcast$$evidence$1 = {ClassTag$$anon$1@8470} "Array[byte]"
  org$apache$spark$broadcast$TorrentBroadcast$$broadcastId = {BroadcastBlockId@8249} "broadcast_1"
  numBlocks = 1
  _value = null
  org$apache$spark$broadcast$TorrentBroadcast$$compressionCodec = {Some@8468} "Some(org.apache.spark.io.SnappyCompressionCodec@7ede98e1)"
  blockSize = 4194304
  bitmap$trans$0 = false
  id = 1
  org$apache$spark$broadcast$Broadcast$$_destroySite = {String@5327} ""
  _isValid = true
  org$apache$spark$Logging$$log_ = null
* partition* = {HadoopPartition@9049}
* locs* = {$colon$colon@9050} "::" size = 1
* preferredLocs* = {ArrayBuffer@9051} "ArrayBuffer" size = 1
* org$apache$spark$Logging$$log_* = null
* stageId* = 0
* partitionId* = 0
* taskMemoryManager* = null
* epoch* = -1
* metrics* = {None$@5261} "None"
* _executorDeserializeTime* = 0
* context* = null
* taskThread* = null
* _killed* = false

etc..
