Calling .values.stats doesn't compile; I get:

could not find implicit value for parameter num: Numeric[Iterable[Double]]

View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p14065.html
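For context: stats() is only available on RDDs whose element type has a Numeric instance, and after a groupByKey the values are Iterable[Double], which has none. A minimal sketch of the two usual fixes, assuming a spark-shell SparkContext named sc (the data and names are illustrative, not from the thread):

import org.apache.spark.util.StatCounter

// Toy pairs, just for the sketch
val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 2.0), ("b", 3.0)))

val grouped = pairs.groupByKey()   // RDD[(String, Iterable[Double])]
// grouped.values.stats()          // fails: no Numeric[Iterable[Double]]

// Global stats: flatten back to an RDD[Double] first
val globalStats = grouped.values.flatMap(identity).stats()

// Per-key stats: fold each group's values into a StatCounter
val statsByKey = grouped.mapValues(vs => StatCounter(vs))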
What is a good way of calculating mean and std dev for Paired RDDs (key, value)?

Now I'm using an approach with reduceByKey but want to make my code more
concise and readable.

View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p14062.html
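If conciseness is the goal, one option is to fold Spark's StatCounter with aggregateByKey: count, mean, and std dev per key in a single pass, with no groupByKey. A sketch, reusing the illustrative pairs RDD from above:

import org.apache.spark.util.StatCounter

val statsByKey = pairs.aggregateByKey(new StatCounter())(
  (acc, v) => acc.merge(v),   // fold one value into a per-partition counter
  (a, b) => a.merge(b)        // combine partial counters across partitions
)

statsByKey.mapValues(s => (s.mean, s.stdev)).collect()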
  val stddev = math.sqrt(n * sumOfSquares - sum * sum) / n
  print("stddev: " + stddev)
  stddev
}

I hope that helps

View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p11334.html
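Only the tail of that helper survives in the archive; the surviving line is the one-pass population formula sqrt(n * sumOfSquares - sum^2) / n. Applied per key with reduceByKey, a complete version of the same idea might look like this (names are illustrative, not from the thread):

// Accumulate (count, sum, sum of squares) per key in one reduceByKey pass,
// then derive mean and population std dev from the three moments.
val moments = pairs.mapValues(v => (1L, v, v * v))
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2, a._3 + b._3))

val meanAndStddev = moments.mapValues { case (n, sum, sumSq) =>
  val mean = sum / n
  (mean, math.sqrt(n * sumSq - sum * sum) / n)   // population std dev
}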
Thanks for the help everyone. I got the mapValues approach working. I will
experiment with the reduceByKey approach later.

<3
-Kris

View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p11214.html
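The finished code isn't in the archive; with the MyClass records from the quoted code below, the mapValues approach can be as small as this (a sketch, with rdd and foo taken from the quote, the rest assumed):

import org.apache.spark.util.StatCounter

// One StatCounter per key, built from each group's foo values
val fooStatsByKey = rdd.groupByKey().mapValues(vs => StatCounter(vs.map(_.foo)))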
> rdd.groupByKey().foreach { x =>
>   val iterable = x._2
>   var sum = 0.0
>   var count = 0
>   iterable.foreach { y =>
>     sum = sum + y.foo
>     count = count + 1
>   }
>   val mean = sum / count
>   // save mean to database...
> }

View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p11207.html
You're certainly not iterating on the driver. The Iterable you process
in your function is on the cluster and done in parallel.

On Fri, Aug 1, 2014 at 8:36 PM, Kristopher Kalish wrote:
> The reason I want an RDD is because I'm assuming that iterating the
> individual elements of an RDD on the driver of the cluster is much slower
> than coming up with the mean and standard deviation using a
> map-reduce-based algorithm.
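The distinction being drawn, as a two-line sketch (rdd here is any RDD):

rdd.foreach(println)            // the closure runs on the executors, in parallel
rdd.collect().foreach(println)  // collect() ships every element to the driver first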
The reason I want an RDD is because I'm assuming that iterating the
individual elements of an RDD on the driver of the cluster is much slower
than coming up with the mean and standard deviation using a
map-reduce-based algorithm. I don't know the intimate details of Spark's
implementation, but it looks like the iteration would happen on the driver.
  // do fancy things with the mean and deviation
}

However, there seems to be no way to convert the iterable into an RDD. Is
there some other technique for doing this? I'm to the point where I'm
considering copying and pasting the StatCollector class and changing the
type from Double to MyClass (or making it generic).

Am I going down the wrong path?

View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192.html
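On the last question: rather than copying the stats class and changing its element type, the projection itself can be made generic, and Spark's built-in StatCounter reused per key. One sketch of such a helper (every name here is illustrative, not from the thread):

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.util.StatCounter

// Project each value to a Double, then fold per-key StatCounters in one
// pass; the per-group work runs on the executors, never on the driver.
def statsByKey[K: ClassTag, V: ClassTag](rdd: RDD[(K, V)])(toDouble: V => Double): RDD[(K, StatCounter)] =
  rdd.mapValues(toDouble)
     .aggregateByKey(new StatCounter())((s, v) => s.merge(v), (a, b) => a.merge(b))

// e.g. statsByKey(myClassRdd)(_.foo)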