Hi,

Correct me if I am wrong, but I have not heard of such a guideline, maybe because sizing is very dynamic and depends on many factors. The most important factor is the kind of workload: some workloads benefit a lot from large memory and some don't. So it is not just the input data size, it is also how you are processing it. On top of that, the guidelines would differ between cloud services and dedicated clusters, so unless we endorse a particular cloud platform they will not be exactly reproducible. In theory, a few large vertical nodes will do better than many smaller instances for a memory-friendly workload.
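For instance, on a standalone cluster you would steer toward the "few large executors" shape through the standard SparkConf settings. This is a minimal sketch; the app name and the numbers are purely illustrative, not a recommendation, and should be tuned against your own workload:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical shape for a memory-friendly workload: a few large
    // executors rather than many small ones. Values are illustrative only.
    val conf = new SparkConf()
      .setAppName("SizingSketch")
      .set("spark.executor.memory", "24g") // large heap per executor
      .set("spark.cores.max", "16")        // total cores across the cluster

    val sc = new SparkContext(conf)

The same total resources split as many 2g executors would behave very differently for, say, a join that wants its working set cached, which is why a single rule of thumb is hard to give.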
I think it would be good to post experiences here, and those can eventually become some sort of guidelines.

Prashant Sharma


On Thu, Apr 3, 2014 at 1:36 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:

> Hi,
>
> My earlier email did not get any response; I am looking for some
> guidelines for sizing a Spark cluster. Please let me know if there are any
> best practices or rules of thumb. Thanks a lot.
>
> Best Regards,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
> On Fri, Mar 28, 2014 at 4:55 PM, Sonal Goyal <sonalgoy...@gmail.com> wrote:
>
>> Hi,
>>
>> I am looking for any guidelines for Spark cluster sizing - are there any
>> best practices or links for estimating the cluster specifications based on
>> input data size, transformations, etc.?
>>
>> Thanks in advance for helping out.
>>
>> Best Regards,
>> Sonal
>> Nube Technologies <http://www.nubetech.co>
>>
>> <http://in.linkedin.com/in/sonalgoyal>