Any word on this one? I would like to get this working as well. My real use case, though, is to run something on each executor right at startup. I was trying to hack it with broadcasts: broadcast an object of my own and do whatever I need in its readObject method.
Any other way out?

On Oct 4, 2014, at 7:36 PM, Peng Cheng <[email protected]> wrote:

> While Spark already offers support for asynchronous reduce (collect data from
> workers, while not interrupting execution of a parallel transformation)
> through accumulators, I have made little progress on making this process
> reciprocal, namely, broadcasting data from the driver to workers to be used by
> all executors in the middle of a transformation. This is primarily intended to
> be used in downpour SGD/adagrad, a non-blocking concurrent machine learning
> optimizer that performs better than the existing synchronous GD in MLlib, and
> has vast application in the training of many models.
>
> My attempt so far is to stick to the out-of-the-box, immutable broadcast and
> open a new thread on the driver, in which I broadcast a thin data wrapper that,
> when deserialized, inserts its value into a mutable singleton that is already
> replicated to all workers in the fat jar. This customized deserialization is
> not hard, just override readObject like this:
>
> import java.io.ObjectInputStream
>
> class AutoInsert(var value: Int) extends Serializable {
>
>   WorkerReplica.last = value
>
>   private def readObject(in: ObjectInputStream): Unit = {
>     in.defaultReadObject()
>     WorkerContainer.last = this.value
>   }
> }
>
> Unfortunately it looks like the deserialization is called lazily and won't do
> anything before a worker uses it (Broadcast[AutoInsert]), so this cannot work
> without waiting for the workers' stage to finish and broadcasting again. I'm
> wondering if I can 'hack' this thing into working? Or will I have to write a
> serious extension to the broadcast component to enable changing the value?
>
> Hope I can find like-minded people on this forum, because ML is a selling
> point of Spark.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Asynchronous-Broadcast-from-driver-to-workers-is-it-possible-tp15758.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
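
For reference, the readObject hook described in the quoted message can be tried outside Spark. The sketch below uses plain JVM serialization only, not Spark's broadcast machinery. `WorkerContainer` here is a stand-in for the singleton in the quoted code, and `ReadObjectDemo`, `serialize`, and `deserialize` are hypothetical helper names introduced for illustration. It shows that the side effect fires only when the bytes are actually deserialized, which is consistent with the lazy behaviour reported above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Stand-in for the mutable per-JVM singleton from the quoted message (assumption).
object WorkerContainer {
  @volatile var last: Int = 0
}

// Thin wrapper: deserializing it copies its payload into the singleton.
class AutoInsert(val value: Int) extends Serializable {
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()          // restore the `value` field first
    WorkerContainer.last = value    // side effect runs on the deserializing JVM
  }
}

// Hypothetical helpers that mimic "ship bytes, then read them later".
object ReadObjectDemo {
  def serialize(obj: AnyRef): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(obj)
    out.close()
    buffer.toByteArray
  }

  def deserialize(bytes: Array[Byte]): AnyRef = {
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
    try in.readObject() finally in.close()
  }

  def main(args: Array[String]): Unit = {
    val bytes = serialize(new AutoInsert(42))
    println(WorkerContainer.last)   // 0: serializing alone triggers nothing
    deserialize(bytes)
    println(WorkerContainer.last)   // 42: the hook ran during deserialization
  }
}
```

In this sketch, `deserialize` plays the role that the first read of the broadcast value would play on a worker: until something actually reads the object, the hook never runs, so the singleton stays stale.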
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
