Re: Best practices: Parallelized write to / read from S3

2014-04-01 Thread Nicholas Chammas
At 12:29 PM, Aaron Davidson wrote:

> Spark will only use each core for one task at a time, so doing
> sc.textFile(, ) where you set ...
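Aaron's truncated point is that sc.textFile takes a second argument giving the minimum number of splits, so a single large S3 object can be read by many tasks at once. A minimal sketch of that idea, assuming a live SparkContext `sc` (Spark 0.9-era API, as in this thread); the bucket name and split count are hypothetical, and the arguments Aaron actually typed were cut off in the archive:

```scala
// Sketch only, assuming a SparkContext `sc` is already constructed.
// "s3n://my-bucket/very-large-file.txt" and 64 are hypothetical values.
val lines = sc.textFile("s3n://my-bucket/very-large-file.txt", 64)

// With 64 minimum splits, up to 64 cores can each read a byte range of
// the single S3 object in parallel, instead of one task reading it all.
println(lines.partitions.length)
```

Since each core runs one task at a time, a split count at or above the cluster's total core count is what lets the read saturate all cores.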

Re: Best practices: Parallelized write to / read from S3

2014-04-01 Thread Aaron Davidson
... saveAsTextFile.

On Mon, Mar 31, 2014 at 8:49 AM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Howdy-doody,
>
> I have a single, very large file sitting in S3 that I want to read in
> with sc.textFile(). What are the best practices for reading in this file
> as quickly as possible? How do I parallelize the read as much as possible?
>
> Similarly, say I have a single, very large RDD sitting in memory that
> I want to write out to S3 with RDD.saveAsTextFile(). What are the best
> practices for writing this file out as quickly as possible?
>
> Nick
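Aaron's reply above is truncated before the archive snippet, but the write-side question is conventionally answered by controlling the RDD's partition count before the save, since saveAsTextFile writes one part-file per partition, each by its own task. A hedged sketch of that approach, not Aaron's verbatim advice; it assumes a live SparkContext `sc`, and the paths, transformation, and partition count are hypothetical (on very old Spark versions, `coalesce(n, shuffle = true)` plays the role of `repartition(n)`):

```scala
// Sketch only: write parallelism to S3 tracks the partition count,
// because each partition becomes one part-file written by one task.
// All names and numbers below are hypothetical.
val bigRdd = sc.textFile("s3n://my-bucket/very-large-file.txt", 64)
val result = bigRdd.map(_.toUpperCase) // stand-in transformation

result
  .repartition(32) // e.g. roughly the cluster's total core count
  .saveAsTextFile("s3n://my-bucket/output/")
```

Too few partitions under-uses the cluster on the write; far too many produces a flood of tiny S3 objects, so the count is a trade-off against output-file size.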

Re: Best practices: Parallelized write to / read from S3

2014-04-01 Thread Nicholas Chammas

Re: Best practices: Parallelized write to / read from S3

2014-03-31 Thread Nicholas Chammas

Re: Best practices: Parallelized write to / read from S3

2014-03-31 Thread Aaron Davidson

Re: Best practices: Parallelized write to / read from S3

2014-03-31 Thread Nicholas Chammas

Re: Best practices: Parallelized write to / read from S3

2014-03-31 Thread Aaron Davidson

Best practices: Parallelized write to / read from S3

2014-03-31 Thread Nicholas Chammas
... that I want to write out to S3 with RDD.saveAsTextFile(). What are the best practices for writing this file out as quickly as possible?

Nick

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-Parallelized-write-to-read-from-S3-tp3516.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.