I notice that Spark serializes each task together with its dependencies (the files and JARs added to the SparkContext):
def serializeWithDependencies(
    task: Task[_],
    currentFiles: HashMap[String, Long],
    currentJars: HashMap[String, Long],
    serializer: SerializerInstance)
  : ByteBuffer = {

  val out = new ByteArrayOutputStream(4096)
  val dataOut = new DataOutputStream(out)

  // Write currentFiles
  dataOut.writeInt(currentFiles.size)
  for ((name, timestamp) <- currentFiles) {
    dataOut.writeUTF(name)
    dataOut.writeLong(timestamp)
  }

  // Write currentJars
  dataOut.writeInt(currentJars.size)
  for ((name, timestamp) <- currentJars) {
    dataOut.writeUTF(name)
    dataOut.writeLong(timestamp)
  }

  // Write the task itself and finish
  dataOut.flush()
  val taskBytes = serializer.serialize(task).array()
  out.write(taskBytes)
  ByteBuffer.wrap(out.toByteArray)
}
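So every task sent to an executor carries the full (name -> timestamp) maps in front of the task bytes, and the executor has to parse that header for every single task before it can deserialize the Task object. Roughly, the read side looks like this (just a sketch of the shape with a hypothetical helper name, not the exact Spark source):

import java.io.{DataInputStream, InputStream}
import java.nio.ByteBuffer
import scala.collection.mutable.HashMap

// Sketch: read the two (name -> timestamp) maps back, then hand the remaining
// bytes to the serializer as the Task payload.
def readDependencyHeader(serializedTask: ByteBuffer)
  : (HashMap[String, Long], HashMap[String, Long], ByteBuffer) = {
  // Minimal InputStream view over the buffer so DataInputStream can parse the header
  val in = new InputStream {
    override def read(): Int =
      if (serializedTask.hasRemaining) serializedTask.get() & 0xFF else -1
  }
  val dataIn = new DataInputStream(in)

  def readMap(): HashMap[String, Long] = {
    val map = new HashMap[String, Long]()
    val n = dataIn.readInt()
    for (_ <- 0 until n) {
      map(dataIn.readUTF()) = dataIn.readLong()
    }
    map
  }

  val taskFiles = readMap()  // files header
  val taskJars = readMap()   // JARs header
  // Whatever remains in the buffer is the serialized Task itself
  (taskFiles, taskJars, serializedTask.slice())
}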
Why not send currentJars and currentFiles to the executors using an actor? I don't
think it is necessary to serialize them for every task.
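To make the suggestion concrete, here is a rough sketch of what I have in mind, using Akka-style messages (the UpdateDependencies message, the actor class, and the method names are hypothetical, only meant to illustrate the idea):

import scala.collection.mutable.HashMap
import akka.actor.{Actor, ActorRef}

// Hypothetical message: ship the dependency maps to the executors only when they
// change, instead of prepending them to every serialized task.
case class UpdateDependencies(
    files: Map[String, Long],
    jars: Map[String, Long])

// Executor-side actor keeps the latest dependency state locally (sketch only).
class ExecutorDependencyActor extends Actor {
  private val currentFiles = new HashMap[String, Long]()
  private val currentJars = new HashMap[String, Long]()

  def receive = {
    case UpdateDependencies(files, jars) =>
      // Fetch any new or updated files/JARs here, then remember their timestamps
      currentFiles ++= files
      currentJars ++= jars
  }
}

// Driver side: push the maps once per change; after that a task only needs
// serializer.serialize(task), with no per-task dependency framing.
def publishDependencies(
    executors: Seq[ActorRef],
    files: Map[String, Long],
    jars: Map[String, Long]): Unit = {
  executors.foreach(_ ! UpdateDependencies(files, jars))
}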