Using sc.textFile will also read the file from HDFS line by line through an
iterator, so it doesn't need to fit entirely into memory; even with a small
amount of memory it will still work.
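As a minimal sketch of the lazy, line-by-line reading described above, assuming
`sc` is an existing SparkContext (as in spark-shell); the HDFS path and the
ERROR filter are hypothetical placeholders:

```scala
// Minimal sketch (path and filter are hypothetical placeholders):
// sc.textFile returns an RDD whose partitions are streamed through an
// iterator on each executor, so the whole file never has to fit in memory.
val lines = sc.textFile("hdfs:///data/big-file.txt")

// An action such as count() pulls the lines through lazily, one at a time.
val errorCount = lines.filter(_.contains("ERROR")).count()
println(s"lines containing ERROR: $errorCount")
```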
2015-06-12 13:19 GMT+08:00 SLiZn Liu :
Hmm, you have a good point. So should I load the file with `sc.textFile()`
and specify a high number of partitions, so that the file is split into
partitions in memory across the cluster?
On Thu, Jun 11, 2015 at 9:27 PM ayan guha wrote:
Why do you need to use a stream in this use case? 50GB need not be in
memory. Give it a try with a high number of partitions.
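A rough sketch of that suggestion, again assuming `sc` is an existing
SparkContext and a splittable input; the HDFS path and the partition count of
1000 are placeholders:

```scala
// Sketch only (path and partition count are placeholders): pass a
// minPartitions hint so each task reads a smaller slice of the input.
val rdd = sc.textFile("hdfs:///data/big-file.txt", 1000)

// Or repartition an already-loaded RDD (this does trigger a shuffle).
val repartitioned = rdd.repartition(1000)
println(s"partitions: ${repartitioned.partitions.length}")
```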
On 11 Jun 2015 23:09, "SLiZn Liu" wrote:
Hi Spark Users,
I'm trying to load a literally big file (50GB when compressed as a gzip file,
stored in HDFS) by receiving a DStream using `ssc.textFileStream`, as this
file cannot fit in my memory. However, it looks like no RDD will be
received until I copy this big file to a prior-specified