Howdy-doody,

I have a single, very large file sitting in S3 that I want to read in with sc.textFile(). What are the best practices for reading this file in as quickly as possible? How do I parallelize the read as much as possible?
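For concreteness, here's roughly the call I have in mind, assuming the spark-shell's sc, a hypothetical bucket and key, and S3 credentials already set in the Hadoop configuration:

    // The second argument to sc.textFile is minPartitions, which is my
    // guess at the knob for splitting the read across more tasks.
    // 100 is an arbitrary number, just for illustration.
    val lines = sc.textFile("s3n://my-bucket/very-large-file.txt", 100)

    // Sanity check: how many partitions did the read actually produce?
    println(lines.partitions.size)

Is minPartitions the right lever here, or is there a better way to parallelize a read of a single S3 object?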
Similarly, say I have a single, very large RDD sitting in memory that I want to write out to S3 with RDD.saveAsTextFile(). What are the best practices for writing this file out as quickly as possible?
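Here's a sketch of what I'd try, with a hypothetical RDD and output prefix. As I understand it, saveAsTextFile writes one part-file per partition, so the partition count at write time sets the write parallelism:

    // myRdd is a stand-in for the large RDD already in memory.
    // repartition(64) is an arbitrary count for illustration; is tuning
    // this number the main lever for write speed, or is there more to it?
    myRdd.repartition(64).saveAsTextFile("s3n://my-bucket/output/")

Nick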