subject:"Process time series RDD after sortByKey"

Re: Process time series RDD after sortByKey

2015-03-17 Thread Shawn Zheng

>> create my own RDD class before (not RDD instance :) ). But this is a very
>> valuable approach to me, so I am eager to learn.
>>
>> Regards,
>>
>> Shuai
>>
>> *From:* Imran Rashid [mailto:iras...@cloudera.c

Re: Process time series RDD after sortByKey

2015-03-16 Thread Imran Rashid

> *From:* Imran Rashid [mailto:iras...@cloudera.com]
> *Sent:* Monday, March 16, 2015 11:22 AM
> *To:* Shawn Zheng; user@spark.apache.org
> *Subject:* Re: Process time series RDD after sortByKey
>
> Hi Shuai,
>
> On Sat, Mar 14, 2015 at 11:02

RE: Process time series RDD after sortByKey

2015-03-16 Thread Shuai Zheng

valuable approach to me, so I am eager to learn. Regards, Shuai

From: Imran Rashid [mailto:iras...@cloudera.com]
Sent: Monday, March 16, 2015 11:22 AM
To: Shawn Zheng; user@spark.apache.org
Subject: Re: Process time series RDD after sortByKey

Hi Shuai, On Sat, Mar 14, 2015 at

Re: Process time series RDD after sortByKey

2015-03-16 Thread Imran Rashid

Hi Shuai,

On Sat, Mar 14, 2015 at 11:02 AM, Shawn Zheng wrote:
> Sorry for the late response.
>
> Zhan Zhang's solution is very interesting and I looked into it, but it is
> not what I want. Basically I want to run the job sequentially and also gain
> parallelism. So if possible, if I have 1000 parti

Re: Process time series RDD after sortByKey

2015-03-11 Thread Imran Rashid

This is a very interesting use case. First of all, it's worth pointing out that if you really need to process the data sequentially, fundamentally you are limiting the parallelism you can get. E.g., if you need to process the entire data set sequentially, then you can't get any parallelism. If you

Re: Process time series RDD after sortByKey

2015-03-09 Thread Zhan Zhang

Does a code flow similar to the following work for you? It processes each partition of an RDD sequentially.

    while (iterPartition < RDD.partitions.length) {
      val res = sc.runJob(this, (it: Iterator[T]) => somFunc, iterPartition, allowLocal = true)
      Some other function after processing
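For readers following this thread, the idea in the snippet above (run one job per partition, in partition order, and do driver-side work between jobs) could be sketched roughly as follows. This is only an illustration under assumptions, not the code from the original message: the object name `SequentialPartitions`, the helper `foldPartitionsInOrder`, and the sample data are hypothetical, and the sketch uses the `SparkContext.runJob(rdd, func, partitions)` overload without `allowLocal`, since that parameter is, as far as I know, absent from Spark 2.0 and later.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper object, not part of Spark or of the original message.
object SequentialPartitions {

  // Visits the partitions of `rdd` one at a time, in partition order.
  // The state returned for partition i is shipped to the task for partition i + 1,
  // so both `step` and the state type S must be serializable.
  def foldPartitionsInOrder[T, S: ClassTag](sc: SparkContext, rdd: RDD[T], zero: S)(
      step: (S, Iterator[T]) => S): S = {
    var state = zero
    var i = 0
    while (i < rdd.partitions.length) {
      val carried = state // immutable copy captured by the task closure
      // One Spark job that computes only partition i.
      val results = sc.runJob(rdd, (it: Iterator[T]) => step(carried, it), Seq(i))
      state = results.head
      i += 1
    }
    state
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("sequential-partitions")
    val sc = new SparkContext(conf)
    try {
      // Made-up (timestamp, value) pairs standing in for the time series.
      val sorted = sc.parallelize(Seq(3L -> 30.0, 1L -> 10.0, 2L -> 20.0), 3).sortByKey()
      // Carry the last timestamp seen so far from one partition to the next.
      val lastSeen = foldPartitionsInOrder(sc, sorted, Option.empty[Long]) { (prev, it) =>
        it.foldLeft(prev)((_, kv) => Some(kv._1))
      }
      println(lastSeen)
    } finally {
      sc.stop()
    }
  }
}
```

Because each iteration is a separate job over a single partition, only one task runs at a time in this sketch, which is the parallelism trade-off discussed elsewhere in the thread. Persisting the sorted RDD before the loop may help avoid repeating upstream work across jobs.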

Process time series RDD after sortByKey

2015-03-09 Thread Shuai Zheng

Hi All, I am processing some time series data. One day might have 500GB of data, so each hour is around 20GB. I need to sort the data before I start processing. Assume I can sort it successfully with dayRDD.sortByKey, but after that, I might have thousands of partitions (to m

7 matches
