Yes, that helps me understand better how Spark works. But that is also what I
was afraid of: I think the network communication will take too much time for
my job.

I will keep looking for a trick to avoid network communication.

I saw on the Hadoop website that: "To minimize global bandwidth consumption
and read latency, HDFS tries to satisfy a read request from a replica that
is closest to the reader. If there exists a replica on the same rack as the
reader node, then that replica is preferred to satisfy the read request."

Maybe if I somehow manage to combine part of Spark with some of this, it
could work.
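For example (just a rough sketch, and the wait values below are guesses on my
part, not recommendations), raising Spark's locality wait might keep tasks on
the nodes that actually hold the HDFS blocks, since Spark already uses the
block locations as preferred locations when reading a file from HDFS:

import org.apache.spark.{SparkConf, SparkContext}

// Ask Spark to wait longer for a node-local slot before falling back to
// rack-local or any node, so tasks tend to run where the HDFS blocks live.
// Values are in milliseconds; these numbers are only illustrative.
val conf = new SparkConf()
  .setAppName("hdfs-locality-test")
  .set("spark.locality.wait", "10000")       // default is much lower
  .set("spark.locality.wait.node", "10000")  // wait for NODE_LOCAL tasks
  .set("spark.locality.wait.rack", "5000")   // then wait for RACK_LOCAL

val sc = new SparkContext(conf)

// textFile exposes the HDFS block locations as preferred locations for each
// partition, so with an executor on every datanode most reads should stay local.
val lines = sc.textFile("hdfs:///path/to/data")
println(lines.count())

I do not know yet if this is enough for my case, but it seems like a way to
let Spark benefit from the HDFS replica placement described above.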

Thank you very much for your answer.

Germain.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-Spark-handle-RDD-via-HDFS-tp4003p4058.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
