thanks ayan
On Mon, Sep 19, 2016 at 12:25 PM, ayan guha wrote:
Let me give you a possible direction, please do not use it as is :)

>>> r = sc.parallelize([1,3,4,6,8,11,12,5],3)

Here I am loading some numbers and partitioning them. The partitioning is
critical. You may just use the partitioning scheme that comes with Spark
(like the above) or use your own through partitionBy
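
The archive cuts the message off there. One way to finish the idea, sketched
in Scala and using plain range subtraction rather than whatever ayan had in
mind with partitionBy, is to subtract the observed ids from the full range
they should cover:

val r = sc.parallelize(Seq(1L, 3L, 4L, 6L, 8L, 11L, 12L, 5L), 3)

// Every id that should exist between the observed min and max.
// sc.range builds the sequence as a distributed RDD, so even a billion
// ids are never materialized on the driver.
val full = sc.range(r.min(), r.max() + 1)

// Whatever remains after removing the observed ids is missing.
val missing = full.subtract(r)
missing.collect().sorted.foreach(println)   // 2, 7, 9, 10

Note that subtract shuffles both RDDs, so on a billion records this is a
full-scan, full-shuffle job; that is the price of assuming nothing about how
the ids are spread across partitions.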
That is correct.
On Mon, Sep 19, 2016 at 12:09 PM, ayan guha wrote:
Ok, so if you see 1, 3, 4, 6, will you say 2 and 5 are missing?
On Mon, Sep 19, 2016 at 4:15 PM, Sudhindra Magadi wrote:
Each of the records will have a sequence id. No duplicates.
On Mon, Sep 19, 2016 at 11:42 AM, ayan guha wrote:
And how do you define a missing sequence? Can you give an example?
On Mon, Sep 19, 2016 at 3:48 PM, Sudhindra Magadi wrote:
Hi Jorn,
We have a file with a billion records. We want to find whether there are any
missing sequences in it and, if so, what they are.
Thanks,
Sudhindra
On Mon, Sep 19, 2016 at 11:12 AM, Jörn Franke wrote:
I am not sure what you are trying to achieve here. Can you please tell us
what the goal of the program is, maybe with some example data?
Besides this, I have the feeling that it will fail once it is not used in a
single-node scenario, due to the reference to the global counter variable.
Also unclear wh
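
Sudhindra's full program is not shown here, but a hypothetical sketch makes
Jörn's single-node point concrete: a driver-side variable incremented inside
a closure is serialized to each executor, so only local copies change and the
driver never sees the updates. An accumulator is the supported way to count
across a cluster:

// Hypothetical, not Sudhindra's actual code: each executor increments
// its own deserialized copy of `counter`, so the driver still sees 0.
var counter = 0L
sc.parallelize(1L to 100L).foreach(_ => counter += 1)
println(counter)            // 0 on a real cluster

// The supported alternative is an accumulator:
val acc = sc.longAccumulator("records seen")
sc.parallelize(1L to 100L).foreach(_ => acc.add(1))
println(acc.value)          // 100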
Hi, I have coded something like this, please tell me how bad it is.
package Spark.spark;
import java.util.List;
import java.util.function.Function;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.Jav
Xiangrui,
Thanks for the pointer. I think it should work... For now I cooked up my
own, which is similar but built on top of the Spark core APIs. I would
suggest moving the sliding-window RDD to the core Spark library. It seems
quite general to me, and a cursory look at the code indicates nothing
specific to
Actually there is a sliding method implemented in
mllib.rdd.RDDFunctions. Since this is not for general use cases, we
didn't include it in spark-core. You can take a look at the
implementation there and see whether it fits. -Xiangrui
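
For reference, a sketch of how that sliding method could be applied to the
missing-sequence problem (usage written against the RDDFunctions API, not
taken from this thread):

import org.apache.spark.mllib.rdd.RDDFunctions._

val ids = sc.parallelize(Seq(1L, 3L, 4L, 6L), 2).sortBy(identity)

// sliding(2) forms overlapping windows that span partition boundaries,
// so consecutive elements are paired even across partitions.
val missing = ids.sliding(2).flatMap {
  case Array(a, b) if b - a > 1 => (a + 1) until b
  case _                        => Seq.empty[Long]
}
missing.collect().foreach(println)   // 2, 5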
On Mon, May 19, 2014 at 10:06 PM, Mohit Jaggi wrote:
Thanks Sean. Yes, your solution works :-) I did oversimplify my real
problem, which has other parameters that go along with the sequence.
On Fri, May 16, 2014 at 3:03 AM, Sean Owen wrote:
Not sure if this is feasible, but this literally does what I think you
are describing:
sc.parallelize(rdd1.first to rdd1.last)
On Tue, May 13, 2014 at 4:56 PM, Mohit Jaggi wrote:
> Hi,
> I am trying to find a way to fill in missing values in an RDD. The RDD is a
> sorted sequence.
> For example,
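
Spelling Sean's one-liner out as runnable code (a sketch; RDD has no `last`
method, so `max` of the sorted sequence stands in for it):

val rdd1 = sc.parallelize(Seq(1L, 3L, 4L, 6L))   // sorted sequence with holes

// Build the complete range between the endpoints and re-parallelize it.
// The range is constructed on the driver, so this suits modest spans.
val filled = sc.parallelize(rdd1.first() to rdd1.max())
filled.collect().foreach(println)                // 1 2 3 4 5 6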
Hello Mohit,
I don't think there's a direct way of bleeding elements across partitions.
But you could write it yourself relatively succinctly:
A) Sort the RDD
B) Look at the sorted RDD's partitions with the .mapPartitionsWithIndex()
method. Map each partition to its partition ID, and its maximum
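
The message is cut off there, but a sketch of the approach it begins to
describe (my completion, not the original author's code, reusing rdd1 from
the sketch above):

// A) sort, so gaps sit between neighbouring elements
val sorted = rdd1.sortBy(identity)

// B) map each partition to its partition ID and its maximum element;
// collecting these boundary values lets the driver stitch the end of
// partition i to the start of partition i + 1 when hunting for gaps.
val maxPerPartition = sorted.mapPartitionsWithIndex { (pid, it) =>
  val xs = it.toArray
  if (xs.isEmpty) Iterator.empty else Iterator(pid -> xs.max)
}.collect()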