Hi,

What is the purpose of the taskBinary field of a ShuffleMapTask? What does it contain, and how is it useful? Is it a representation of all the RDD operations that will be applied to the partition the task processes? (In the case below, the task processes stage 0, partition 0.) If it is not a representation of the RDD operations inside the stage, how does a task know which operations to apply to its partition?
Thanks,

{ShuffleMapTask@9034} "ShuffleMapTask(0, 0)"
  taskBinary = {TorrentBroadcast@8204} "Broadcast(1)"
    org$apache$spark$broadcast$TorrentBroadcast$$evidence$1 = {ClassTag$$anon$1@8470} "Array[byte]"
    org$apache$spark$broadcast$TorrentBroadcast$$broadcastId = {BroadcastBlockId@8249} "broadcast_1"
    numBlocks = 1
    _value = null
    org$apache$spark$broadcast$TorrentBroadcast$$compressionCodec = {Some@8468} "Some(org.apache.spark.io.SnappyCompressionCodec@7ede98e1)"
    blockSize = 4194304
    bitmap$trans$0 = false
    id = 1
    org$apache$spark$broadcast$Broadcast$$_destroySite = {String@5327} ""
    _isValid = true
    org$apache$spark$Logging$$log_ = null
  partition = {HadoopPartition@9049}
  locs = {$colon$colon@9050} "::" size = 1
  preferredLocs = {ArrayBuffer@9051} "ArrayBuffer" size = 1
  org$apache$spark$Logging$$log_ = null
  stageId = 0
  partitionId = 0
  taskMemoryManager = null
  epoch = -1
  metrics = {None$@5261} "None"
  _executorDeserializeTime = 0
  context = null
  taskThread = null
  _killed = false
  etc.. etc..
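
To make the question concrete, here is a minimal sketch of the mechanism being asked about. It assumes, as the Spark source for DAGScheduler and ShuffleMapTask appears to indicate, that taskBinary is a broadcast Array[Byte] holding the serialized pair of the stage's RDD and its ShuffleDependency, which each task deserializes before iterating over its own partition. The sketch uses plain Java serialization and no Spark classes; every name in it (TaskBinarySketch, StageDescription, SerFn, lengthsPipeline, runTask) is hypothetical and only stands in for the real types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class TaskBinarySketch {

    // Hypothetical: a serializable function, standing in for an RDD's chain of operations.
    public interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Hypothetical stand-in for the (rdd, shuffleDependency) pair assumed to be in taskBinary.
    public static final class StageDescription implements Serializable {
        private static final long serialVersionUID = 1L;
        public final SerFn<List<String>, List<Integer>> pipeline; // operations of the stage
        public final int numReducePartitions;                     // part of the shuffle info

        public StageDescription(SerFn<List<String>, List<Integer>> pipeline, int numReducePartitions) {
            this.pipeline = pipeline;
            this.numReducePartitions = numReducePartitions;
        }
    }

    // Example pipeline: map every input line to its length.
    public static SerFn<List<String>, List<Integer>> lengthsPipeline() {
        return lines -> {
            List<Integer> out = new ArrayList<>();
            for (String line : lines) {
                out.add(line.length());
            }
            return out;
        };
    }

    // "Driver" side: serialize the stage description once into a byte array.
    public static byte[] serialize(Object value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // "Executor" side: every task receives the same bytes but a different partition.
    public static List<Integer> runTask(byte[] taskBinary, List<String> partition) {
        StageDescription stage = (StageDescription) deserialize(taskBinary);
        return stage.pipeline.apply(partition);
    }

    public static void main(String[] args) {
        byte[] taskBinary = serialize(new StageDescription(lengthsPipeline(), 4));
        // prints [1, 2, 3]
        System.out.println(runTask(taskBinary, Arrays.asList("a", "bb", "ccc")));
    }
}
```

If that assumption holds, it would explain why the dump shows taskBinary as a TorrentBroadcast of Array[byte] with _value = null: the bytes are shared by all tasks of the stage and only fetched when a task reads them, while partition and partitionId are the only parts that differ per task.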