Howdy-doody,

I have a single, very large file sitting in S3 that I want to read in with sc.textFile(). What are the best practices for reading this file in as quickly as possible? How do I parallelize the read as much as possible?
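For concreteness, here's roughly the call I have in mind, assuming the spark-shell's sc, a hypothetical bucket and key, and S3 credentials already set in the Hadoop configuration:

    // The second argument to sc.textFile is minPartitions, which is my
    // guess at the knob for splitting the read across more tasks.
    // 100 is an arbitrary number, just for illustration.
    val lines = sc.textFile("s3n://my-bucket/very-large-file.txt", 100)

    // Sanity check: how many partitions did the read actually produce?
    println(lines.partitions.size)

Is minPartitions the right lever here, or is there a better way to parallelize a read of a single S3 object?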
Similarly, say I have a single, very large RDD sitting in memory that I want to write out to S3 with RDD.saveAsTextFile(). What are the best practices for writing this file out as quickly as possible?
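Here's a sketch of what I'd try, with a hypothetical RDD and output prefix. As I understand it, saveAsTextFile writes one part-file per partition, so the partition count at write time sets the write parallelism:

    // myRdd is a stand-in for the large RDD already in memory.
    // repartition(64) is an arbitrary count for illustration; is tuning
    // this number the main lever for write speed, or is there more to it?
    myRdd.repartition(64).saveAsTextFile("s3n://my-bucket/output/")

Nick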