Using sc.textFile will also read the file from HDFS line by line through an
iterator, so it doesn't need to fit entirely into memory; even with a small
amount of memory it will still work.
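As a minimal sketch of the lazy, line-by-line reading described above, assuming
`sc` is an existing SparkContext (as in spark-shell); the HDFS path and the
ERROR filter are hypothetical placeholders:

```scala
// Minimal sketch (path and filter are hypothetical placeholders):
// sc.textFile returns an RDD whose partitions are streamed through an
// iterator on each executor, so the whole file never has to fit in memory.
val lines = sc.textFile("hdfs:///data/big-file.txt")

// An action such as count() pulls the lines through lazily, one at a time.
val errorCount = lines.filter(_.contains("ERROR")).count()
println(s"lines containing ERROR: $errorCount")
```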
2015-06-12 13:19 GMT+08:00 SLiZn Liu :
Hmm, you have a good point. So should I load the file with `sc.textFile()`
and specify a high number of partitions, so that the file is split into
partitions in memory across the cluster?
On Thu, Jun 11, 2015 at 9:27 PM ayan guha wrote:
Why do you need to use a stream in this use case? 50GB need not be in
memory. Give it a try with a high number of partitions.
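A rough sketch of that suggestion, again assuming `sc` is an existing
SparkContext and a splittable input; the HDFS path and the partition count of
1000 are placeholders:

```scala
// Sketch only (path and partition count are placeholders): pass a
// minPartitions hint so each task reads a smaller slice of the input.
val rdd = sc.textFile("hdfs:///data/big-file.txt", 1000)

// Or repartition an already-loaded RDD (this does trigger a shuffle).
val repartitioned = rdd.repartition(1000)
println(s"partitions: ${repartitioned.partitions.length}")
```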
On 11 Jun 2015 23:09, "SLiZn Liu" wrote:
Hi Spark Users,
I'm trying to load a literally big file (50GB when compressed as a gzip file,
stored in HDFS) by receiving a DStream using `ssc.textFileStream`, as this
file cannot fit in my memory. However, it looks like no RDD will be
received until I copy this big file to a prior-specified