>>>>> On Mon, Mar 31, 2014 at 12:29 PM, Aaron Davidson
>>>>> wrote:
>>>>>
>>>>>> Spark will only use each core for one task at a time, so doing
>>>>>>
>>>>>> sc.textFile(, )
>>>>>>
>>>>>> where you set "[...]r saveAsTextFile.
>>>>>
>>>>>
>>>>> On Mon, Mar 31, 2014 at 8:49 AM, Nicholas Chammas <
>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>
>>>>>> Howdy-doody,
>>>>>>
>>>>>> I have a single, very large file sitting in S3 that I want to read in
>>>>>> with sc.textFile(). What are the best practices for reading in this file
>>>>>> as
>>>>>> quickly as possible? How do I parallelize the read as much as possible?
>>>>>>
>>>>>> Similarly, say I have a single, very large RDD sitting in memory that
>>>>>> I want to write out to S3 with RDD.saveAsTextFile(). What are the best
>>>>>> practices for writing this file out as quickly as possible?
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context: Best practices: Parallelized write to
>>>>>> / read from
>>>>>> S3<http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-Parallelized-write-to-read-from-S3-tp3516.html>
>>>>>> Sent from the Apache Spark User List mailing list
>>>>>> archive<http://apache-spark-user-list.1001560.n3.nabble.com/>at
>>>>>> Nabble.com.
>>>>>>
>>>>>
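The advice in the reply above — pass a partition count to sc.textFile() for the read, and repartition before saveAsTextFile() for the write — can be sketched as below. This is a minimal illustration, not code from the thread: the bucket, paths, and the numCores value are hypothetical, and the s3n:// scheme is the 2014-era S3 connector (newer Hadoop versions use s3a://).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object S3ParallelIO {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("s3-parallel-io"))

    // Hypothetical total number of cores in the cluster.
    val numCores = 16

    // Read: ask for at least as many partitions as cores, so every core
    // gets its own read task. (Caveat: a single gzipped file is not
    // splittable, so this only helps for splittable input formats.)
    val lines = sc.textFile("s3n://my-bucket/my-big-file.txt", numCores)

    // Write: repartition first so saveAsTextFile emits one part-file per
    // partition, letting all cores upload to S3 concurrently instead of
    // funneling the whole RDD through a single writer.
    lines.repartition(numCores).saveAsTextFile("s3n://my-bucket/output/")

    sc.stop()
  }
}
```

Since each core runs one task at a time, fewer partitions than cores leaves cores idle during both the read and the write; somewhat more partitions than cores is generally harmless, as Spark simply queues the extra tasks.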