Hey Steve,

This configuration sounds pretty good. The one thing I would consider is having more disks, for two reasons: Spark uses the disks for large shuffles and out-of-core operations, and it's often better to run HDFS or your storage system on the same nodes. Whether that's worth it depends on whether you plan to co-locate storage in your deployment, so I'd settle that first and size the disks from there.
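For the shuffle and spill traffic specifically, you'd want to point Spark at every data disk rather than just one. Something like this in spark-env.sh, just to sketch the idea (assuming hypothetical mount points /data1 through /data4 for the four data drives; adjust for your actual layout):

    # spark-env.sh on each worker node
    # Comma-separated list of local directories Spark uses for shuffle files
    # and data that spills out of memory; one directory per physical disk
    # spreads the I/O across spindles.
    SPARK_LOCAL_DIRS=/data1/spark,/data2/spark,/data3/spark,/data4/spark

If you do run HDFS on the same nodes, the DataNode data directories would typically go on those same disks, which is the main reason extra disks can pay off.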
The number of cores and the amount of RAM are both good; with a lot more of either you would probably want to run multiple Spark workers per node, which is more work to configure (there's a small spark-env.sh sketch for that below the quoted message). Your numbers are in line with other deployments. There's a provisioning overview with more details at https://spark.apache.org/docs/latest/hardware-provisioning.html but what you have sounds fine.

Matei

On Apr 2, 2014, at 2:58 PM, Stephen Watt <[email protected]> wrote:

> Hi Folks
>
> I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop
> server design, but there does not seem to be much Spark-related collateral
> around infrastructure guidelines (or at least I haven't been able to find
> any). My current thinking for server design is something along these lines:
>
> - 2 x 10GbE NICs
> - 128 GB RAM
> - 6 x 1 TB small form factor disks (2 x RAID 1 mirror for O/S and runtimes,
>   4 x 1 TB for data drives)
> - 1 disk controller
> - 2 x 2.6 GHz 6-core processors
>
> If I stick with 1U servers then I lose disk capacity per rack, but I get a lot
> more memory and CPU capacity per rack. This increases my total cluster memory
> footprint, and it doesn't seem to make sense to have super-dense storage
> servers because I can't fit all that data on disk in memory anyway. So at
> present, my thinking is to go with 1U servers instead of 2U servers. Is 128 GB
> of RAM per server normal? Do you guys use more or less than that?
>
> Any feedback would be appreciated.
>
> Regards
> Steve Watt
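For reference, running multiple workers per node in standalone mode is only a few extra lines in spark-env.sh. A rough sketch, assuming the 128 GB / 12-core box described above and leaving a chunk of memory for the OS and HDFS (the exact split is up to you):

    # spark-env.sh on each worker node
    # Run two worker daemons instead of one; each gets half the cores
    # and roughly half of the memory set aside for Spark.
    SPARK_WORKER_INSTANCES=2
    SPARK_WORKER_CORES=6
    SPARK_WORKER_MEMORY=48g

With a single worker per node you can skip all of this and just set spark.executor.memory per job, which is why I wouldn't bother with it at the node sizes you're describing.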
