On 8 Mar 2016, at 16:34, Eddie Esquivel <eduardo.esqui...@gmail.com> wrote:
Hello All,
In the Spark documentation under "Hardware Requirements" it very clearly states:
We recommend having 4-8 disks per node, configured without RAID (just as
separate mount points)
My question is why not
One issue is that RAID levels providing data replication are not necessary
since HDFS already replicates blocks on multiple nodes.
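To put a rough number on that point, here is a back-of-the-envelope sketch. It assumes the HDFS default replication factor of 3 and treats a mirroring RAID level (e.g. RAID 1) as keeping 2 copies of every block on a node; the function name is just for illustration.

```python
def usable_fraction(hdfs_replication: int = 3, raid_copies: int = 1) -> float:
    """Fraction of raw disk capacity left for unique data.

    hdfs_replication: copies HDFS keeps across nodes (assumed default: 3).
    raid_copies: copies the local RAID level keeps (1 = no RAID/JBOD,
                 2 = mirrored pair such as RAID 1).
    """
    return 1.0 / (hdfs_replication * raid_copies)


if __name__ == "__main__":
    print("JBOD + HDFS x3:   %.3f" % usable_fraction(3, 1))
    print("RAID 1 + HDFS x3: %.3f" % usable_fraction(3, 2))
```

Under these assumptions, mirroring on top of HDFS replication halves the usable capacity again without protecting against anything HDFS does not already cover.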
On Tue, Mar 8, 2016 at 8:45 AM, Alex Kozlov wrote:
Parallel disk IO? But the effect should be less noticeable compared to
Hadoop which reads/writes a lot. Much depends on how often Spark persists
on disk. Depends on the specifics of the RAID controller as well.
If you write to HDFS as opposed to the local file system, this may be a big
factor as well.
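For the "separate mount points" setup from the docs, a minimal sketch of how Spark could be pointed at all the disks: the `spark.local.dir` property accepts a comma-separated list of directories, so one scratch directory per disk can be listed. The mount paths (`/mnt/disk1` and so on) and the helper names below are hypothetical, and on some cluster managers this setting is overridden by environment variables such as `SPARK_LOCAL_DIRS` or `LOCAL_DIRS`.

```python
from typing import List


def spark_local_dirs(mount_points: List[str], subdir: str = "spark") -> str:
    """Build a comma-separated scratch-dir list, one directory per disk."""
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


def submit_args(mount_points: List[str]) -> List[str]:
    """Arguments to append to spark-submit (paths are illustrative only)."""
    return ["--conf", f"spark.local.dir={spark_local_dirs(mount_points)}"]


if __name__ == "__main__":
    # Hypothetical layout: four disks mounted separately, no RAID.
    mounts = [f"/mnt/disk{i}" for i in range(1, 5)]
    print(" ".join(submit_args(mounts)))
```

With a list like this, shuffle and spill files can be spread over the individual disks rather than going through a single RAID volume.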