Hey Steve,

This configuration sounds pretty good. The one thing I would consider is 
adding more disks, for two reasons: Spark uses local disks for large shuffles 
and out-of-core operations, and it's often better to run HDFS or your storage 
system on the same nodes. Whether that's worth it depends on whether you plan 
to co-locate storage in your deployment, so I'd decide that first and go from 
there.
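
If you do go with more data disks, you can point Spark's scratch space 
(shuffle files and spills) at all of them through spark.local.dir. As a rough 
sketch, assuming the data drives are mounted at /data1 through /data4 
(placeholder paths), you'd put something like this in conf/spark-defaults.conf:

  spark.local.dir  /data1/spark,/data2/spark,/data3/spark,/data4/spark

Spark spreads its temporary files across every directory in that list, so each 
extra spindle adds shuffle bandwidth.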

The core count and RAM are both good; with a lot more of either you would 
probably want to run multiple Spark workers per node, which is more work to 
configure. Your numbers are in line with other deployments.
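
Just to illustrate what multiple workers per node looks like (the values below 
are made up for a much bigger box; with your specs a single worker per node is 
fine), you'd set something like this in conf/spark-env.sh on each worker:

  SPARK_WORKER_INSTANCES=2   # launch two workers on this node
  SPARK_WORKER_CORES=12      # cores each worker can hand out to executors
  SPARK_WORKER_MEMORY=120g   # memory each worker can hand out to executors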

There’s a provisioning overview with more details at 
https://spark.apache.org/docs/latest/hardware-provisioning.html, but what you 
have sounds fine.

Matei

On Apr 2, 2014, at 2:58 PM, Stephen Watt <[email protected]> wrote:

> Hi Folks
> 
> I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop 
> server design, but there does not seem to be much Spark-related collateral 
> around infrastructure guidelines (or at least I haven't been able to find 
> any). My current thinking for server design is something along these lines:
> 
> - 2 x 10GbE NICs
> - 128 GB RAM
> - 6 x 1 TB small form factor disks (2 x RAID 1 mirror for O/S and runtimes, 
>   4 x 1 TB for data drives)
> - 1 disk controller
> - 2 x 2.6 GHz 6-core processors
> 
> If I stick with 1U servers, I lose disk capacity per rack but get a lot more 
> memory and CPU capacity per rack. This increases my total cluster memory 
> footprint, and it doesn't seem to make sense to have super-dense storage 
> servers because I can't fit all that data on disk in memory anyway. So at 
> present, my thinking is to go with 1U servers instead of 2U servers. Is 
> 128 GB of RAM per server normal? Do you guys use more or less than that?
> 
> Any feedback would be appreciated
> 
> Regards
> Steve Watt
