Re: Question about MEMORY_AND_DISK persistence

2016-02-28 Thread Ashwin Giridharan
Hi Vishnu,

A partition will be either entirely in memory or entirely on disk; it is not split between the two.

-Ashwin

On Feb 28, 2016 15:09, "Vishnu Viswanath" wrote:
> Hi All,
>
> I have a question regarding Persistence (MEMORY_AND_DISK)
>
> Suppose I am trying to persist an RDD which has 2 partitions and only 1
> partition can be fit in memo
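The answer above can be illustrated with a toy model of how MEMORY_AND_DISK places partitions. In Spark itself the level is chosen with `rdd.persist(StorageLevel.MEMORY_AND_DISK)`. The sketch below is a simplification under stated assumptions, not Spark's block manager: `MemoryAndDiskModel` and `place` are hypothetical names, sizes are arbitrary units, partitions are cached in sequence order, and eviction of other RDDs' cached blocks is ignored. Its only point is that a partition is never split: it lands whole in memory if it fits, otherwise whole on disk.

```scala
// Toy model (hypothetical, not Spark's code): under MEMORY_AND_DISK each
// partition is placed whole, in memory if it fits, otherwise on disk.
object MemoryAndDiskModel {
  sealed trait Location
  case object InMemory extends Location
  case object OnDisk extends Location

  /** Places partitions in order; sizes and budget are in arbitrary units. */
  def place(partitionSizes: Seq[Long], memoryBudget: Long): Seq[Location] = {
    var free = memoryBudget
    partitionSizes.map { size =>
      if (size <= free) {
        free -= size // the whole partition is cached in memory
        InMemory
      } else {
        OnDisk // never split: the whole partition goes to disk
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // The scenario from the question: 2 partitions, room for only 1.
    val placement = place(Seq(100L, 100L), memoryBudget = 150L)
    println(placement) // List(InMemory, OnDisk)
  }
}
```

With the question's numbers (two equal partitions and room for only one), whichever partition is cached first stays in memory and the other is written to disk in full.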

Re: Has anybody ever tried running Spark Streaming on 500 text streams?

2015-07-31 Thread Ashwin Giridharan
new files in each batch interval.
>>>> TD
>>>>
>>>> On Tue, Jul 28, 2015 at 3:06 PM, Brandon White wrote:
>>>>
>>>>> val ssc = new StreamingContext(sc, Minutes(10))
>>>>>
>>>>> //500 textFile streams watching S3 directories
>>>>> val streams = streamPaths.par.map { path =>
>>>>>   ssc.textFileStream(path)
>>>>> }
>>>>>
>>>>> streams.par.foreach { stream =>
>>>>>   stream.foreachRDD { rdd =>
>>>>>     //do something
>>>>>   }
>>>>> }
>>>>>
>>>>> ssc.start()
>>>>>
>>>>> Would something like this scale? What would be the limiting factor to
>>>>> performance? What is the best way to parallelize this? Any other ideas on
>>>>> design?
--
Thanks & Regards,
Ashwin Giridharan
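One possible variation on the quoted loop, offered as a sketch and not as the conclusion of this thread: create the file streams with a plain sequential `map` on the driver and merge them with `StreamingContext.union`, so a single `foreachRDD` covers every directory in each batch. Each output operation runs as its own job per batch, so this trades 500 small jobs for one; whether that helps depends on the workload. Per the Spark Streaming guide, file streams do not run receivers, so the 500 streams do not pin executor cores. `UnionedFileStreams`, `unionedLines`, and taking the directory list from the command line are assumptions of this sketch.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

// Sketch only: merges many directory-watching file streams into one DStream.
object UnionedFileStreams {

  /** Creates one textFileStream per directory and unions them. */
  def unionedLines(ssc: StreamingContext, paths: Seq[String]): DStream[String] = {
    // Sequential map on the driver; file streams run no receivers,
    // so no executor core is reserved per directory.
    val streams = paths.map(path => ssc.textFileStream(path))
    ssc.union(streams) // union requires a non-empty list of streams
  }

  def main(args: Array[String]): Unit = {
    // Assumption: directories (e.g. S3 prefixes) are passed as arguments.
    val streamPaths: Seq[String] = args.toSeq
    require(streamPaths.nonEmpty, "pass at least one directory to watch")

    val conf = new SparkConf().setAppName("unioned-file-streams")
    val ssc = new StreamingContext(conf, Minutes(10))

    unionedLines(ssc, streamPaths).foreachRDD { rdd =>
      // One output job per batch covers new files from every directory.
      println(s"lines this batch: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```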

Re: What happens when you create more DStreams than nodes in the cluster?

2015-07-31 Thread Ashwin Giridharan
er.
>
> Will ~10 streams get assigned to ~10 executors / nodes then the other ~20
> streams will be queued for resources or will the other streams just fail
> and never run?
>
--
Thanks & Regards,
Ashwin Giridharan
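For receiver-based input streams, the Spark Streaming Programming Guide notes that each receiver runs as a long-running task occupying one executor core, and that an application needs more cores than receivers, otherwise it receives data but cannot process it. File streams such as `textFileStream` run no receiver. The arithmetic behind that rule can be sketched as follows; `ReceiverCores` and its methods are hypothetical helpers, not a Spark API, and the 4 cores per executor is an assumed figure, not from the thread. The sketch does not say whether surplus receivers wait or fail.

```scala
// Back-of-the-envelope check (hypothetical helper, not a Spark API):
// each receiver-based input DStream pins one executor core for its
// long-running receiver task; file streams such as textFileStream pin none.
object ReceiverCores {

  /** Cores left for processing batches after receivers take theirs. */
  def coresForProcessing(totalCores: Int, receiverStreams: Int): Int =
    math.max(0, totalCores - receiverStreams)

  /** The guide's rule: strictly more cores than receivers. */
  def canProcess(totalCores: Int, receiverStreams: Int): Boolean =
    totalCores > receiverStreams

  def main(args: Array[String]): Unit = {
    // ~30 receiver streams on ~10 executors, assuming 4 cores each.
    val total = 10 * 4
    println(coresForProcessing(total, 30)) // 10
    println(canProcess(total, 30))         // true
    println(canProcess(10, 30))            // false
  }
}
```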

Re: How to control Spark Executors from getting Lost when using YARN client mode?

2015-07-30 Thread Ashwin Giridharan
unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
--
Thanks & Regards,
Ashwin Giridharan

Re: Has anybody ever tried running Spark Streaming on 500 text streams?

2015-07-28 Thread Ashwin Giridharan
D { rdd =>
>>     //do something
>>   }
>> }
>>
>> ssc.start()
>>
>> Would something like this scale? What would be the limiting factor to
>> performance? What is the best way to parallelize this? Any other ideas on
>> design?
--
Thanks & Regards,
Ashwin Giridharan

Re: Long running streaming application - worker death

2015-07-26 Thread Ashwin Giridharan
--
Thanks & Regards,
Ashwin Giridharan