Hello fellow Sparkians,

Is there some preferred way to have *some given set-up task run on all workers?*

The task at hand isn't a computational one, but rather some initial setup I want to run for its *side-effects*. This could be setting up custom logging settings, or metrics. [Specifically, in our case we want to use Coda Hale's Metrics 2 on some jobs we're running. Spark's workers internally use Metrics 3 (why do we need Metrics 2? Good question - a dependency issue, not very interesting/important.) Anyhow, I'd like to set up a Metrics 2 reporter on each of the nodes, alongside the existing Metrics 3 one.]

I was thinking of doing this via a "dummy" RDD on the SparkContext the job will be running on: have this RDD partitioned enough so as to use all workers, and execute a foreachPartition() on it before doing the real work - roughly as in the sketch below.
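To make the idea concrete, here is a rough Scala sketch of what I mean. The names (WorkerSetup, SetupThenRealWork) and the over-partitioning factor are just placeholders of mine, not anything Spark provides:

    import java.util.concurrent.atomic.AtomicBoolean
    import org.apache.spark.{SparkConf, SparkContext}

    // Guard so the side-effecting setup runs at most once per executor JVM,
    // even if several partitions of the dummy RDD land on the same worker.
    object WorkerSetup {
      private val initialized = new AtomicBoolean(false)

      def ensure(): Unit = {
        if (initialized.compareAndSet(false, true)) {
          // side-effect-only setup goes here,
          // e.g. registering a Metrics 2 reporter
        }
      }
    }

    object SetupThenRealWork {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("setup-on-all-workers"))

        // Over-partition so that (hopefully) every executor gets at least one task.
        val numPartitions = sc.defaultParallelism * 4
        sc.parallelize(1 to numPartitions, numPartitions)
          .foreachPartition { _ => WorkerSetup.ensure() }

        // ... the real job then runs on the same SparkContext ...

        sc.stop()
      }
    }

Note there's no actual guarantee the over-partitioned dummy RDD will schedule a task on every executor, which is part of why this feels fragile to me.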
This seems a bit "hacky" to me - any clever ideas on how to do this differently?

Much obliged,
*Noam Barcay*
Developer // *Kenshoo*
*www.Kenshoo.com* <http://kenshoo.com/>