Hi Vishnu,
A partition will be either entirely in memory or entirely on disk; it is never split between the two.
-Ashwin
On Feb 28, 2016 15:09, "Vishnu Viswanath" wrote:
> Hi All,
>
> I have a question regarding Persistence (MEMORY_AND_DISK)
>
> Suppose I am trying to persist an RDD which has 2 partitions, and only 1
> partition can fit in memory
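
To make the per-partition semantics concrete: with StorageLevel.MEMORY_AND_DISK, the partition that fits is kept in memory and the one that does not is written to disk in full. Below is a toy Scala sketch of that decision only — not Spark's actual block-manager code; `MemoryAndDiskSketch`, `place`, and the partition sizes are all made up for illustration:

```scala
// Toy model only -- NOT Spark's block manager. It mimics the rule from
// the answer above: each partition is stored wholly in memory or wholly
// on disk, never split across the two.
object MemoryAndDiskSketch {
  sealed trait Store
  case object InMemory extends Store
  case object OnDisk extends Store

  // Greedily keep partitions in memory until the budget runs out,
  // then spill every remaining partition to disk.
  def place(partitionSizes: Seq[Long], memoryBudget: Long): Seq[Store] = {
    var used = 0L
    partitionSizes.map { size =>
      if (used + size <= memoryBudget) { used += size; InMemory }
      else OnDisk
    }
  }

  def main(args: Array[String]): Unit = {
    // 2 partitions of 100 units each, but memory for only ~1 of them:
    val placement = place(Seq(100L, 100L), memoryBudget = 150L)
    println(placement) // prints List(InMemory, OnDisk)
  }
}
```

In real Spark this is just rdd.persist(StorageLevel.MEMORY_AND_DISK); the sketch only illustrates why "partly in memory, partly on disk" never happens for a single partition.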
new files in each batch interval.
>>>> TD
>>>>
>>>>
>>>> On Tue, Jul 28, 2015 at 3:06 PM, Brandon White wrote:
>>>>
>>>>> val ssc = new StreamingContext(sc, Minutes(10))
>>>>>
>>>>> // 500 textFile streams watching S3 directories
>>>>> val streams = streamPaths.par.map { path =>
>>>>>   ssc.textFileStream(path)
>>>>> }
>>>>>
>>>>> streams.par.foreach { stream =>
>>>>>   stream.foreachRDD { rdd =>
>>>>>     // do something
>>>>>   }
>>>>> }
>>>>>
>>>>> ssc.start()
>>>>>
>>>>> Would something like this scale? What would be the limiting factor to
>>>>> performance? What is the best way to parallelize this? Any other ideas on
>>>>> design?
>>>>>
>>>>
>>>>
>>>
>>
>
--
Thanks & Regards,
Ashwin Giridharan
>
> Will ~10 streams get assigned to ~10 executors / nodes, and will the other
> ~20 streams then be queued for resources, or will the other streams just
> fail and never run?
>
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>