Performance advantage by loading data from local node over S3.

Nisrina Luthfiyati Wed, 29 Apr 2015 10:21:41 -0700

Hi all,
I'm new to Spark, so I'm sorry if the question is too vague. I'm currently
trying to deploy a Spark cluster using YARN on an Amazon EMR cluster. For
data storage I'm currently using S3, but would loading the data into HDFS on
the local nodes give a considerable performance advantage over loading it from
S3?
Would the lower network latency during the data load significantly affect the
overall runtime, considering that most of the computation is done in memory?
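One rough way to frame the second question is an Amdahl-style estimate: if the input is read once and the rest of the job runs in memory, a faster storage layer only shrinks the load phase, so the overall gain depends on how large that phase is relative to the compute time. The sketch below assumes that reading and computing happen one after the other (no overlap). The function names and all numbers are hypothetical placeholders, not measurements of S3 or HDFS on EMR.

```python
def load_fraction(data_gb: float, read_gb_per_s: float, compute_s: float) -> float:
    """Fraction of total runtime spent reading input.

    Assumes (hypothetically) that the read and the in-memory computation
    run sequentially, with no overlap between them.
    """
    load_s = data_gb / read_gb_per_s
    return load_s / (load_s + compute_s)


def speedup(data_gb: float, slow_gb_per_s: float, fast_gb_per_s: float,
            compute_s: float) -> float:
    """Overall speedup from reading the same data from a faster source.

    Only the load phase changes; compute time is treated as fixed.
    """
    slow_total = data_gb / slow_gb_per_s + compute_s
    fast_total = data_gb / fast_gb_per_s + compute_s
    return slow_total / fast_total


if __name__ == "__main__":
    # Hypothetical inputs: 100 GB of data, 600 s of in-memory compute,
    # and an assumed 1 GB/s vs 2 GB/s aggregate read throughput.
    print(load_fraction(100, 1.0, 600))   # share of runtime spent loading
    print(speedup(100, 1.0, 2.0, 600))    # overall gain from the faster read
```

With these made-up figures, doubling read throughput gives an overall speedup of only about 1.08x, because loading is a small share of the job. If the data is cached in memory and reused across many passes, only the first read pays the storage cost, which shrinks the difference further.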


Thank you,
Nisrina.
