Without knowing the data size or the computation and storage requirements, here are some rough guidelines:
- Dual 6- or 8-core machines, 256 GB of memory each, and 12-15 TB of disk per machine. Probably 5-10 machines.
- Don't go for the most exotic machines; on the other hand, don't go for the cheapest ones either.
- Find the sweet spot with your vendor, e.g. if dual 6-cores are a lot cheaper than dual 10-cores, go with the less expensive ones. Same with disks: maybe 2 TB drives are a lot cheaper than 3 TB ones.
- Decide whether these machines will be storage-intensive or compute-intensive (I assume the latter; a minimal sketch of your streaming classification workload follows at the end of this message) and configure accordingly.
- Make sure you can add storage to the machines, i.e. that they have free drive bays.
- The other way is to add more machines, buying smaller-specced ones.
- Unless you have very firm I/O and compute requirements, I have found that FLOPS and figures of that nature do not make much sense.
- Think in terms of RAM, CPU, and storage; those are what will become the initial limitations (a rough totals sketch also follows at the end of this message).
- Once there are enough production jobs, you can then figure out the FLOPS and the rest.
- A 10 GbE network is the better choice, so price in a 24-48 port top-of-rack (TOR) switch.
- Be more concerned with the bandwidth between the cluster nodes, which shuffles and the like will stress.

Cheers
<k/>

On Sun, Mar 8, 2015 at 2:29 PM, Nasir Khan <nasirkhan.onl...@gmail.com> wrote:
> Hi, I am going to submit a proposal to my university to set up my standalone
> Spark cluster. What hardware should I include in my proposal?
>
> I will be working on classification (Spark MLlib) of data streams (Spark
> Streaming).
>
> If somebody can fill in these answers, that would be great! Thanks.
>
> *Cores* = (example: 64 nodes, 1024 cores; your figures) ____________?
>
> *Performance* = (example: ~5.12 TFLOPS, ~2 TFLOPS; your figures) ___________?
>
> *GPU* = YES/NO ___________?
>
> *Fat node* = YES/NO ___________?
>
> *CPU hrs/yr* = (example: 2000, 8000; your figures) ___________?
>
> *RAM/CPU* = (example: 256 GB; your figures) ___________?
>
> *Storage (processing)* = (example: 200 TB; your figures) ___________?
>
> *Storage (output)* = (example: 5 TB, 4 TB HDD/SSD; your figures) ___________?
>
> *Most processors today carry out 4 FLOPs per cycle, so a single-core 2.5 GHz
> processor has a theoretical peak of 10 billion FLOPS = 10 GFLOPS.
>
> Note: I need a *general purpose* cluster, neither very high end nor very
> low spec. It will not be dedicated to just one project, I guess. You people
> already have experience in setting up clusters; that's the reason I posted
> it here :)
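As a back-of-envelope check on the numbers above, here is the rough totals sketch I mentioned. Every figure in it (machine count, sockets, cores per socket, clock speed) is an assumption picked from this thread, not a measurement, and the 4 FLOPs/cycle rule comes from the footnote in your question.

// Rough cluster totals for the mid-range spec suggested above.
// All inputs are assumptions from this thread; swap in your own numbers.
object ClusterSizing {
  def main(args: Array[String]): Unit = {
    val machines      = 8    // within the suggested 5-10 range
    val cpusPerNode   = 2    // dual-socket boxes
    val coresPerCpu   = 6    // the 6-core option
    val ramGbPerNode  = 256
    val diskTbPerNode = 12

    val totalCores  = machines * cpusPerNode * coresPerCpu   // 96
    val totalRamGb  = machines * ramGbPerNode                // 2048
    val totalDiskTb = machines * diskTbPerNode               // 96

    // Theoretical peak, using the rule of thumb from the question:
    // FLOPS = cores * clock (GHz) * 4 FLOPs per cycle.
    val clockGhz   = 2.5                                     // assumed
    val peakGflops = totalCores * clockGhz * 4               // 960 GFLOPS

    println(s"$totalCores cores, $totalRamGb GB RAM, $totalDiskTb TB disk")
    println(f"theoretical peak ~$peakGflops%.0f GFLOPS (${peakGflops / 1000}%.2f TFLOPS)")
  }
}

Plug in your own counts; the point is that the RAM and disk totals, not the FLOPS line, are what will bound your first jobs.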
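And since you mention classification (MLlib) over Spark Streaming, here is a minimal sketch of that workload to make the compute-vs-storage question concrete. The paths, the 10-second batch interval, and the 3-feature vectors are all hypothetical; StreamingLogisticRegressionWithSGD ships with MLlib as of Spark 1.3.

import org.apache.spark.SparkConf
import org.apache.spark.mllib.classification.StreamingLogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Minimal sketch: online classification of a labeled text-file stream.
object StreamingClassification {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingClassification")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Each line is a LabeledPoint in MLlib's text format,
    // e.g. "(1.0,[0.5,1.2,0.8])". Directories here are hypothetical.
    val training = ssc.textFileStream("/data/train").map(LabeledPoint.parse)
    val test     = ssc.textFileStream("/data/test").map(LabeledPoint.parse)

    val model = new StreamingLogisticRegressionWithSGD()
      .setInitialWeights(Vectors.zeros(3))   // 3 features, hypothetical

    model.trainOn(training)                   // update the model each batch
    model.predictOnValues(test.map(lp => (lp.label, lp.features))).print()

    ssc.start()
    ssc.awaitTermination()
  }
}

A job like this keeps model state and recent batches in memory and does relatively little shuffling, which is why I'd lean compute-heavy (RAM and cores) over storage-heavy for your cluster.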