Re: Spark on RAID

2016-03-09 Thread Steve Loughran
On 8 Mar 2016, at 16:34, Eddie Esquivel <eduardo.esqui...@gmail.com> wrote: Hello All, In the Spark documentation under "Hardware Requirements" it very clearly states: "We recommend having 4-8 disks per node, configured without RAID (just as separate mount points)". My question is why not

Re: Spark on RAID

2016-03-08 Thread Mark Hamstra
One issue is that RAID levels providing data replication are not necessary since HDFS already replicates blocks on multiple nodes.
On Tue, Mar 8, 2016 at 8:45 AM, Alex Kozlov wrote:
> Parallel disk IO? But the effect should be less noticeable compared to Hadoop, which reads/writes a lot. Much

Re: Spark on RAID

2016-03-08 Thread Alex Kozlov
Parallel disk IO? But the effect should be less noticeable compared to Hadoop, which reads/writes a lot. Much depends on how often Spark persists to disk. It also depends on the specifics of the RAID controller. If you write to HDFS as opposed to the local file system, this may be a big factor as wel
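
For reference, the "separate mount points" layout from the documentation quoted in this thread is typically handed to Spark through the `spark.local.dir` property, which accepts a comma-separated list of directories on different disks. The sketch below only builds that value and formats it as a `spark-defaults.conf` line. The mount point paths, the `spark` subdirectory name, and both helper functions are hypothetical and chosen for illustration; they are not taken from any poster's setup. Note that on some cluster managers the scratch directories are instead taken from environment variables set by the cluster manager, so treat this as an assumption-laden sketch rather than a recommended configuration.

```python
# Hypothetical mount points, one per physical disk (no RAID).
MOUNT_POINTS = ["/mnt/disk1", "/mnt/disk2", "/mnt/disk3", "/mnt/disk4"]


def local_dir_setting(mount_points, subdir="spark"):
    """Return a comma-separated list of scratch directories, one per disk.

    `subdir` is an illustrative name for the scratch directory on each disk.
    """
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


def defaults_line(mount_points):
    """Format the value as a spark-defaults.conf line (key, whitespace, value)."""
    return f"spark.local.dir {local_dir_setting(mount_points)}"


if __name__ == "__main__":
    # Prints one line that could be pasted into spark-defaults.conf.
    print(defaults_line(MOUNT_POINTS))
```

With several directories listed, Spark can spread shuffle and spill files across the disks itself, which is the parallel disk IO mentioned above without a RAID layer in between.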