Hello fellow Sparkians,

Is there a preferred way to have *some given set-up task run on all
workers?* The task at hand isn't a computational one, but rather some
initial setup I want to run for its *side-effects* - for example, applying
custom logging settings, or setting up metrics.

[Specifically, in our case we want to use Coda Hale's Metrics 2 on some jobs
we're running. Spark's workers internally use Metrics 3 (why do we need
Metrics 2? Good question - a dependency issue, not very interesting or
important). Anyhow, I'd like to set up a Metrics 2 reporter on each of the
nodes, alongside the existing Metrics 3 one.]

I was thinking of doing this via a "dummy" RDD on the SparkContext the job
will be running on: partition that RDD widely enough to reach all workers,
and run a foreachPartition() on it before doing the real work (see the
sketch below). This feels a bit "hacky" to me - any clever ideas on how to
do it differently?
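For concreteness, here is a minimal sketch of that dummy-RDD idea in Scala.
The partition count, the once-per-JVM guard, and the Metrics 2
ConsoleReporter call inside the setup function are my own assumptions for
illustration - Spark doesn't provide anything specific for this:

import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

import com.yammer.metrics.reporting.ConsoleReporter
import org.apache.spark.{SparkConf, SparkContext}

object WorkerSetup {
  // Guard so the side-effect runs at most once per executor JVM,
  // even if several partitions land on the same executor.
  private val initialized = new AtomicBoolean(false)

  def setupMetrics2(): Unit = {
    if (initialized.compareAndSet(false, true)) {
      // Example side-effect (assumption): enable a Metrics 2 console
      // reporter next to whatever Metrics 3 reporters Spark already runs.
      ConsoleReporter.enable(1, TimeUnit.MINUTES)
    }
  }
}

object SetupThenRealWork {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("worker-setup"))

    // "Dummy" RDD with more partitions than there are executor cores,
    // so every worker should end up running at least one partition.
    val slots = sc.defaultParallelism * 4
    sc.parallelize(1 to slots, slots)
      .foreachPartition(_ => WorkerSetup.setupMetrics2())

    // ... the real job runs here, on the same SparkContext ...

    sc.stop()
  }
}

The oversized partition count just makes it likely that every executor picks
up at least one of the setup tasks; there's no guarantee from Spark's side,
which is part of why this feels hacky to me.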

Much obliged,
*Noam Barcay*
Developer // *Kenshoo*
*Office* +972 3 746-6500 *427 // *Mobile* +972 54 475-3142
__________________________________________
*www.Kenshoo.com* <http://kenshoo.com/>

