Re: Best way to determine # of workers

2016-03-25 Thread Aaron Jackson
I think the SparkListener is about as close as it gets. That way I can start up the instance (AWS, OpenStack, VMware, etc.) and simply wait until the SparkListener indicates that the executors are online before starting. Thanks for the advice. Aaron On Fri, Mar 25, 2016 at 10:54 AM, Jacek Laskow

Re: Best way to determine # of workers

2016-03-25 Thread Jacek Laskowski
Hi, You may want to use SparkListener [1] (as the web UI does) and listen for SparkListenerExecutorAdded and SparkListenerExecutorRemoved. [1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.SparkListener Regards, Jacek Laskowski https://medium.com/@jaceklaskow
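
A minimal sketch of the listener approach suggested above, assuming a Spark version in which SparkListener exposes onExecutorAdded and onExecutorRemoved. The class name ExecutorTracker, the target of 4 executors, and the 5-minute timeout are hypothetical choices for illustration, not something from the thread. One caveat: executors that registered before the listener was attached are not seen by it, so the count here is a lower bound.

```scala
import scala.collection.concurrent.TrieMap

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded, SparkListenerExecutorRemoved}

// Hypothetical helper: remembers which executors are currently registered.
class ExecutorTracker extends SparkListener {
  // executorId -> host; TrieMap is safe to read from the driver thread
  // while the listener bus thread updates it.
  private val executors = TrieMap.empty[String, String]

  override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = {
    executors.put(event.executorId, event.executorInfo.executorHost)
    println(s"executor ${event.executorId} added on ${event.executorInfo.executorHost}")
  }

  override def onExecutorRemoved(event: SparkListenerExecutorRemoved): Unit = {
    executors.remove(event.executorId)
    println(s"executor ${event.executorId} removed: ${event.reason}")
  }

  def count: Int = executors.size
}

object WaitForExecutors {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wait-for-executors"))
    val tracker = new ExecutorTracker
    sc.addSparkListener(tracker)

    val wanted = 4                                          // assumed target, adjust to your cluster
    val deadline = System.currentTimeMillis() + 5 * 60 * 1000L // assumed 5-minute timeout

    // Poll the tracker until enough executors have joined or the timeout expires.
    while (tracker.count < wanted && System.currentTimeMillis() < deadline) {
      Thread.sleep(1000)
    }
    println(s"proceeding with ${tracker.count} tracked executor(s)")

    // ... submit jobs here ...
    sc.stop()
  }
}
```

Tracking executor IDs in a map rather than a bare counter keeps the count from going negative when an executor that joined before the listener was attached is later removed.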

Re: Best way to determine # of workers

2016-03-25 Thread Ted Yu
Here is the doc for defaultParallelism: /** Default level of parallelism to use when not given by user (e.g. parallelize and makeRDD). */ def defaultParallelism: Int = { What if the user changes parallelism? Cheers On Fri, Mar 25, 2016 at 5:33 AM, manasdebashiskar wrote: > There is a sc

Re: Best way to determine # of workers

2016-03-25 Thread manasdebashiskar
There is a sc.defaultParallelism parameter that I use to dynamically maintain elasticity in my application. Depending upon your scenario this might be enough. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-determine-of-workers-tp26586p26594
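
A rough sketch of the parallelism-based approach, assuming the member meant here is SparkContext's defaultParallelism (the name used in the doc comment quoted in the reply). The object name ParallelismProbe and the sample data are hypothetical. Note the caveat raised in the reply: when spark.default.parallelism is set explicitly, defaultParallelism returns that configured value, so it no longer reflects how many cores have joined the cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismProbe {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("parallelism-probe")
    val sc = new SparkContext(conf)

    // If the user pinned the value in configuration, it tells us nothing about cluster size.
    val pinned = conf.getOption("spark.default.parallelism")
    val parallelism = sc.defaultParallelism
    pinned match {
      case Some(v) => println(s"spark.default.parallelism is set to $v; value is not a cluster-size signal")
      case None    => println(s"defaultParallelism = $parallelism (derived by the scheduler backend)")
    }

    // Size the work to the parallelism observed right now.
    val rdd = sc.parallelize(1 to 1000000, parallelism)
    println(s"partitions = ${rdd.partitions.length}, sum = ${rdd.map(_.toLong).reduce(_ + _)}")

    sc.stop()
  }
}
```

Because the value is read at call time, re-reading sc.defaultParallelism before each batch of work is one way to pick up newly joined cores, under the assumption that the setting has not been pinned.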

Re: Best way to determine # of workers

2016-03-24 Thread Aaron Jackson
Well, that's unfortunate; it just means I have to scrape the web UI for that information. As to why, I have a cluster that is being increased in size to accommodate the processing requirements of a large set of jobs. It's useful to know when the new workers have joined the Spark cluster. In my specific

Re: Best way to determine # of workers

2016-03-24 Thread Takeshi Yamamuro
Hi, There is no way to get such information from your app. Why do you need that? thanks, maropu On Thu, Mar 24, 2016 at 8:23 AM, Ajaxx wrote: > I'm building some elasticity into my model and I'd like to know when my > workers have come online. It appears at present that the API only supports