Without knowing the data size, computation & storage requirements ... :

   - Dual 6 or 8 core machines, 256 GB memory each, 12-15 TB per machine.
   Probably 5-10 machines.
   - Don't go for the most exotic machines; on the other hand, don't go for
   the cheapest ones either.
      - Find a sweet spot with your vendor, i.e. if dual 6-cores are a lot
      cheaper than dual 10-cores, then go with the less expensive ones. The
      same goes for disks: maybe 2 TB is a lot cheaper than 3 TB.
   - Decide if these are going to be storage intensive or compute intensive
   (I assume the latter) and configure accordingly
   - Make sure you can add storage to the machines, i.e. have free storage
   bays.
      - The other way is to buy the smaller-spec'd machines and add more
      of them.
   - Unless one has very firm I/O and compute requirements, I have found
   that FLOPS and similar metrics are not much use for sizing.
      - Think in terms of RAM, CPU and storage - those will be the initial
      limitations.
      - Once there are enough production jobs, you can then figure out the
      FLOPS et al.
   - A 10G network is the better choice, so price in a 24-48 port ToR switch.
      - I am more concerned with the bandwidth between the cluster nodes, for
      shuffles et al.
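
To put rough numbers on the ranges above, here is a minimal Python sketch that totals cores, RAM and disk for a candidate configuration and checks that the nodes fit on one ToR switch. The names (NodeSpec, cluster_totals) are made up for illustration, and the two example configurations are simply the low and high ends of the figures quoted above, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical description of one machine (illustrative only)."""
    sockets: int
    cores_per_socket: int
    ram_gb: int
    disk_tb: float


def cluster_totals(node: NodeSpec, n_nodes: int, switch_ports: int = 24) -> dict:
    """Aggregate per-node figures into cluster-wide totals."""
    cores_per_node = node.sockets * node.cores_per_socket
    return {
        "cores": cores_per_node * n_nodes,
        "ram_gb": node.ram_gb * n_nodes,
        "disk_tb": node.disk_tb * n_nodes,
        # RAM per core is usually the first ratio worth checking.
        "ram_gb_per_core": node.ram_gb / cores_per_node,
        # Assumes one switch port per node; uplinks are ignored here.
        "fits_on_switch": n_nodes <= switch_ports,
    }


if __name__ == "__main__":
    # Low end: 5 dual 6-core machines, 256 GB RAM, 12 TB each.
    low = cluster_totals(NodeSpec(2, 6, 256, 12), n_nodes=5)
    # High end: 10 dual 8-core machines, 256 GB RAM, 15 TB each.
    high = cluster_totals(NodeSpec(2, 8, 256, 15), n_nodes=10)
    for name, totals in (("low", low), ("high", high)):
        print(name, totals)
```

Running this for the vendor quotes you actually get makes it easy to compare "fewer big machines" against "more small machines" on the same three axes.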

Cheers
<k/>

On Sun, Mar 8, 2015 at 2:29 PM, Nasir Khan <nasirkhan.onl...@gmail.com>
wrote:

> Hi, I am going to submit a proposal to my university to set up my standalone
> Spark cluster. What hardware should I include in my proposal?
>
> I will be working on classification (Spark MLlib) of data streams (Spark
> Streaming).
>
> If somebody can fill in these answers, that will be great! Thanks
>
> *Cores *= (example 64 nodes, 1024 cores, your figures) ____________?
>
> *Performance**= (example= ~5.12TFlops, ~2TFlops, your figures) ___________?
>
> *GPU*= YES/NO ___________?
>
> *Fat Node* = YES/NO ___________?
>
> *CPU Hrs/ Yr* = (example 2000, 8000, your figures) ___________?
>
> *RAM/CPU* = (example 256GB, your figures) ___________?
>
> *Storage Processing* = (example 200TB, your figures) ___________?
>
> *Storage Output* = (example 5TB, 4TB HDD/SSD, your figures) ___________?
>
> *Most processors today carry out 4 FLOPs per cycle, thus a single-core 2.5
> GHz processor has a theoretical performance of 10 billion FLOPS = 10 GFLOPS
>
> Note: I need a *general purpose* cluster, neither very high end nor very low
> spec. It will not be dedicated to just one project, I guess. You people
> already have experience in setting up clusters; that's the reason I posted
> it here :)
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/General-Purpose-Spark-Cluster-Hardware-Requirements-tp21963.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
