The current design is not ideal, but the per-task overhead should be fairly small, since we only send each dependency's path and timestamp, not the JARs themselves.
Executors can come and go, so this is essentially a state replication problem where you have to be very careful about consistency.

On Sun, Nov 16, 2014 at 4:24 AM, scwf <wangf...@huawei.com> wrote:

> I notice that Spark serializes each task with its dependencies (the files
> and JARs added to the SparkContext):
>
>   def serializeWithDependencies(
>       task: Task[_],
>       currentFiles: HashMap[String, Long],
>       currentJars: HashMap[String, Long],
>       serializer: SerializerInstance)
>     : ByteBuffer = {
>
>     val out = new ByteArrayOutputStream(4096)
>     val dataOut = new DataOutputStream(out)
>
>     // Write currentFiles
>     dataOut.writeInt(currentFiles.size)
>     for ((name, timestamp) <- currentFiles) {
>       dataOut.writeUTF(name)
>       dataOut.writeLong(timestamp)
>     }
>
>     // Write currentJars
>     dataOut.writeInt(currentJars.size)
>     for ((name, timestamp) <- currentJars) {
>       dataOut.writeUTF(name)
>       dataOut.writeLong(timestamp)
>     }
>
>     // Write the task itself and finish
>     dataOut.flush()
>     val taskBytes = serializer.serialize(task).array()
>     out.write(taskBytes)
>     ByteBuffer.wrap(out.toByteArray)
>   }
>
> Why not send currentJars and currentFiles to the executor using an actor?
> I think it's not necessary to serialize them for each task.
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/send-currentJars-and-currentFiles-to-exetutor-with-actor-tp9381.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
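
For context, the executor-side counterpart is what makes the per-task scheme tolerate executors joining and leaving: each executor compares the incoming (path, timestamp) pairs against what it has already fetched and downloads only what is missing or stale. Here is a minimal sketch of that idea (the DependencyTracker class and fetchFile helper are hypothetical illustrations, not Spark's actual code):

    import scala.collection.mutable.HashMap

    class DependencyTracker {
      // Timestamps of the dependencies this executor has already fetched.
      private val localFiles = new HashMap[String, Long]()

      /** Fetch every file whose timestamp is newer than our local copy. */
      def updateDependencies(newFiles: HashMap[String, Long]): Unit = synchronized {
        for ((name, timestamp) <- newFiles
             if localFiles.getOrElse(name, -1L) < timestamp) {
          fetchFile(name)                // hypothetical download from the driver
          localFiles(name) = timestamp   // remember it, so repeat tasks are no-ops
        }
      }

      private def fetchFile(name: String): Unit = {
        // Stub: a real implementation would download from the driver's file server.
        println(s"fetching $name")
      }
    }

Because the maps travel with every task, a brand-new executor bootstraps itself from the first task it runs. With an actor push, the driver would instead have to track executor membership and replay the dependency state to every late joiner, which is exactly the replication problem mentioned above.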