Re: filling missing values in a sequence

2016-09-19 Thread Sudhindra Magadi
thanks ayan

On Mon, Sep 19, 2016 at 12:25 PM, ayan guha wrote:
> Let me give you a possible direction, please do not use as it is :)
>
> >>> r = sc.parallelize([1,3,4,6,8,11,12,5],3)
>
> here, I am loading some numbers and partitioning. This partitioning is
> critical. You may just use partition

Re: filling missing values in a sequence

2016-09-18 Thread ayan guha
Let me give you a possible direction; please do not use it as is :)

>>> r = sc.parallelize([1,3,4,6,8,11,12,5],3)

Here, I am loading some numbers and partitioning them. This partitioning is critical. You may just use the partitioning scheme that comes with Spark (as above) or use your own through partitionB

Re: filling missing values in a sequence

2016-09-18 Thread Sudhindra Magadi
that is correct

On Mon, Sep 19, 2016 at 12:09 PM, ayan guha wrote:
> Ok, so if you see
>
> 1,3,4,6.
>
> Will you say 2,5 are missing?
>
> On Mon, Sep 19, 2016 at 4:15 PM, Sudhindra Magadi
> wrote:
>
>> Each of the records will have a sequence id. No duplicates.
>>
>> On Mon, Sep 19, 201

Re: filling missing values in a sequence

2016-09-18 Thread ayan guha
Ok, so if you see

1,3,4,6.

Will you say 2,5 are missing?

On Mon, Sep 19, 2016 at 4:15 PM, Sudhindra Magadi wrote:
> Each of the records will have a sequence id. No duplicates.
>
> On Mon, Sep 19, 2016 at 11:42 AM, ayan guha wrote:
>
>> And how do you define a missing sequence? Can you g
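
The definition agreed on in this exchange (given 1,3,4,6, the ids 2 and 5 are missing) can be pinned down with a small plain-Python sketch. It is not Spark code and it holds everything in memory, so it would not scale to the billion-record file mentioned in the thread; the function name `missing_ids` is made up for illustration.

```python
def missing_ids(ids):
    """Return the ids absent between min(ids) and max(ids), ascending.

    Assumes integer ids with no duplicates, as stated in the thread.
    """
    present = set(ids)
    if not present:
        return []
    lowest, highest = min(present), max(present)
    # Every integer in the closed range that was never seen is "missing".
    return [i for i in range(lowest, highest + 1) if i not in present]


print(missing_ids([1, 3, 4, 6]))  # [2, 5]
```

Ids below the smallest or above the largest observed id are not reported, since the data alone cannot say where the sequence is supposed to start or end.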

Re: filling missing values in a sequence

2016-09-18 Thread Sudhindra Magadi
Each of the records will have a sequence id. No duplicates.

On Mon, Sep 19, 2016 at 11:42 AM, ayan guha wrote:
> And how do you define a missing sequence? Can you give an example?
>
> On Mon, Sep 19, 2016 at 3:48 PM, Sudhindra Magadi
> wrote:
>
>> Hi Jorn,
>> We have a file with a billion rec

Re: filling missing values in a sequence

2016-09-18 Thread ayan guha
And how do you define a missing sequence? Can you give an example?

On Mon, Sep 19, 2016 at 3:48 PM, Sudhindra Magadi wrote:
> Hi Jorn,
> We have a file with a billion records. We want to find out whether there are any missing
> sequences here. If so, what are they?
> Thanks
> Sudhindra
>
> On Mon, Sep 19, 2016

Re: filling missing values in a sequence

2016-09-18 Thread Sudhindra Magadi
Hi Jorn,
We have a file with a billion records. We want to find out whether there are any missing sequences here. If so, what are they?
Thanks
Sudhindra

On Mon, Sep 19, 2016 at 11:12 AM, Jörn Franke wrote:
> I am not sure what you are trying to achieve here. Can you please tell us what
> the goal of the program is.

Re: filling missing values in a sequence

2016-09-18 Thread Jörn Franke
I am not sure what you are trying to achieve here. Can you please tell us what the goal of the program is, maybe with some example data? Besides this, I have the feeling that it will fail once it is used outside a single-node scenario, due to the reference to the global counter variable. Also unclear wh

Re: filling missing values in a sequence

2016-09-18 Thread sudhindra
Hi, I have coded something like this, pls tell me how bad it is.

package Spark.spark;

import java.util.List;
import java.util.function.Function;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.Jav

Re: filling missing values in a sequence

2014-05-20 Thread Mohit Jaggi
Xiangrui,
Thanks for the pointer. I think it should work... For now I cooked up my own, which is similar but sits on top of the Spark core APIs. I would suggest moving the sliding window RDD to the core Spark library. It seems quite general to me and a cursory look at the code indicates nothing specific to

Re: filling missing values in a sequence

2014-05-19 Thread Xiangrui Meng
Actually there is a sliding method implemented in mllib.rdd.RDDFunctions. Since this is not for general use cases, we didn't include it in spark-core. You can take a look at the implementation there and see whether it fits.

-Xiangrui

On Mon, May 19, 2014 at 10:06 PM, Mohit Jaggi wrote:
> Thanks S
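
To illustrate why a sliding window suits this problem, here is a plain-Python stand-in. It is not the MLlib implementation and does not use Spark at all; `sliding` and `fill_gaps` are hypothetical helpers written for this sketch, and the input is assumed to be a sorted list of distinct integers. Each window of size 2 holds two neighbouring values, which is exactly what is needed to see a gap and fill it.

```python
from collections import deque
from itertools import islice


def sliding(iterable, window_size):
    """Yield consecutive windows of window_size items as tuples."""
    it = iter(iterable)
    window = deque(islice(it, window_size), maxlen=window_size)
    if len(window) == window_size:
        yield tuple(window)
    for item in it:
        window.append(item)  # the deque drops the oldest item automatically
        yield tuple(window)


def fill_gaps(sorted_ids):
    """Fill every gap between neighbouring ids using windows of size 2."""
    sorted_ids = list(sorted_ids)
    if len(sorted_ids) < 2:
        return sorted_ids  # nothing to compare against
    filled = []
    for left, right in sliding(sorted_ids, 2):
        # Emit left and everything missing before right; right itself is
        # emitted as the left element of the next window.
        filled.extend(range(left, right))
    filled.append(sorted_ids[-1])  # the last id is never a left element
    return filled


print(fill_gaps([1, 3, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```

On a distributed dataset the hard part is the window that straddles two partitions, which a single-machine sketch like this one does not have to deal with.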

Re: filling missing values in a sequence

2014-05-19 Thread Mohit Jaggi
Thanks Sean. Yes, your solution works :-) I did oversimplify my real problem, which has other parameters that go along with the sequence.

On Fri, May 16, 2014 at 3:03 AM, Sean Owen wrote:
> Not sure if this is feasible, but this literally does what I think you
> are describing:
>
> sc.paralleli

Re: filling missing values in a sequence

2014-05-16 Thread Sean Owen
Not sure if this is feasible, but this literally does what I think you are describing:

sc.parallelize(rdd1.first to rdd1.last)

On Tue, May 13, 2014 at 4:56 PM, Mohit Jaggi wrote:
> Hi,
> I am trying to find a way to fill in missing values in an RDD. The RDD is a
> sorted sequence.
> For example,
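
The one-liner above is Scala, where `a to b` is an inclusive range. A rough plain-Python equivalent of the same idea, without Spark, might look like the sketch below; `fill_between_ends` is a name invented here, and the input is assumed to be an already sorted list of integers.

```python
def fill_between_ends(sorted_values):
    """Rebuild the full sequence from the first and last value only."""
    if not sorted_values:
        return []
    first, last = sorted_values[0], sorted_values[-1]
    # Python's range excludes its end point, so add 1 to mirror the
    # inclusive Scala `first to last`.
    return list(range(first, last + 1))


print(fill_between_ends([1, 3, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```

This ignores everything between the two end points, so it only fits when each record is nothing more than its sequence number; any other fields attached to a record are lost.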

Re: filling missing values in a sequence

2014-05-16 Thread bgawalt
Hello Mohit,

I don't think there's a direct way of bleeding elements across partitions. But you could write it yourself relatively succinctly:

A) Sort the RDD
B) Look at the sorted RDD's partitions with the .mapPartitionsWithIndex() method. Map each partition to its partition ID, and its maximu
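
The message is cut off after step B, so the sketch below only guesses at how the remaining steps might go. It is plain Python with no Spark: partitions are modelled as a list of sorted lists, and all function names are hypothetical. The assumption is that the per-partition maxima are small enough to share with every partition, so each partition can fill its own internal gaps plus the gap back to the largest value in any earlier partition.

```python
def partition_maxima(partitions):
    """Step B: map each non-empty partition's index to its maximum value."""
    return {idx: part[-1] for idx, part in enumerate(partitions) if part}


def fill_partition(idx, part, maxima):
    """Assumed continuation: fill one partition using the shared maxima."""
    if not part:
        return []
    # Largest value held by any earlier partition, if there is one.
    earlier = [m for i, m in maxima.items() if i < idx]
    start = max(earlier) + 1 if earlier else part[0]
    return list(range(start, part[-1] + 1))


def fill_all(partitions):
    """Run both steps over every partition, keeping the partition layout."""
    maxima = partition_maxima(partitions)
    return [fill_partition(i, p, maxima) for i, p in enumerate(partitions)]


# Step A is assumed done: values are sorted within and across partitions.
print(fill_all([[1, 3], [4, 6, 8], [11, 12]]))
# [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10, 11, 12]]
```

Only one number per partition crosses a partition boundary here, which is what keeps the approach cheap compared with moving the elements themselves.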