Re: Computing mean and standard deviation by key

2014-09-12 Thread rzykov
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Computing-mean-and-standard-deviation-by-key-tp11192p14068.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Computing mean and standard deviation by key

2014-09-12 Thread David Rowe
> …implicit value for parameter num: Numeric[Iterable[Double]]
> .values.stats

Re: Computing mean and standard deviation by key

2014-09-12 Thread Sean Owen
> …implicit value for parameter num: Numeric[Iterable[Double]]
> .values.stats

Re: Computing mean and standard deviation by key

2014-09-12 Thread rzykov
…can't find implicit value for parameter num: Numeric[Iterable[Double]]
.values.stats
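For context on the error quoted here: aggregates such as `sum` and `stats` need an implicit `Numeric` for the element type, and none exists for `Iterable[Double]`, which is what `.values` yields after a `groupByKey`; the values have to be flattened (or aggregated per key) before any numeric aggregate applies. A minimal plain-Scala sketch of the distinction, with a `Map` standing in for the RDD and the name `groupedValues` purely illustrative:

```scala
// Grouped values: one Iterable[Double] per key, as groupByKey would produce.
val groupedValues: Map[String, Iterable[Double]] =
  Map("a" -> List(1.0, 2.0, 3.0), "b" -> List(4.0, 6.0))

// groupedValues.values.sum   // does not compile: no Numeric[Iterable[Double]]

// Flattening first yields plain Doubles, which do have a Numeric instance.
val allValues: Iterable[Double] = groupedValues.values.flatten
val grandMean: Double = allValues.sum / allValues.size
```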

Re: Computing mean and standard deviation by key

2014-09-11 Thread David Rowe
> …calculating mean and std dev for Paired RDDs (key, value)?
>
> Now I'm using an approach with ReduceByKey but want to make my code more
> concise and readable.

Re: Computing mean and standard deviation by key

2014-09-11 Thread rzykov
…and readable.

Re: Computing mean and standard deviation by key

2014-08-04 Thread Ron Gonzalez
…es - sum * sum) / n
          print("stddev: " + stddev)
          stddev
        }

I hope that helps
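The truncated snippet above appears to use the sum-of-squares identity Var(x) = E[x²] − E[x]². As a hedged, self-contained sketch of that computation (not Ron's exact code; note that both terms must be normalized by n before subtracting):

```scala
// Population standard deviation from running sums, via the identity
// Var(x) = E[x^2] - E[x]^2. Divides by n, as Spark's StatCounter.stdev does.
def stddev(xs: Seq[Double]): Double = {
  val n = xs.size.toDouble
  val sum = xs.sum
  val sumSquares = xs.map(x => x * x).sum
  math.sqrt(sumSquares / n - (sum / n) * (sum / n))
}
```

Dividing by n gives the population standard deviation; for the sample version, scale the variance by n / (n − 1) before the square root.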

Re: Computing mean and standard deviation by key

2014-08-04 Thread kriskalish
…} I hope that helps

Re: Computing mean and standard deviation by key

2014-08-01 Thread Ron Gonzalez
> <3
>
> -Kris

Re: Computing mean and standard deviation by key

2014-08-01 Thread kriskalish
Thanks for the help everyone. I got the mapValues approach working. I will experiment with the reduceByKey approach later. <3

-Kris

Re: Computing mean and standard deviation by key

2014-08-01 Thread Evan R. Sparks
> val iterable = x._2
> var sum = 0.0
> var count = 0
> iterable.foreach { y =>
>   sum = sum + y.foo
>   count = count + 1
> }
> val mean = sum/count;
> // save mean to database...
> }

Re: Computing mean and standard deviation by key

2014-08-01 Thread Sean Owen
> val iterable = x._2
> var sum = 0.0
> var count = 0
> iterable.foreach { y =>
>   sum = sum + y.foo
>   count = count + 1
> }
> val mean = sum/count;
> // save mean to database...
> }
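The loop being quoted computes a per-key mean inside a function applied to grouped values. A self-contained plain-Scala analogue of the same arithmetic (the name `byKey` and the data are illustrative; a `Map` stands in for the grouped RDD):

```scala
// Grouped (key, values) pairs, as groupByKey would produce.
val byKey: Map[String, Seq[Double]] =
  Map("a" -> Seq(1.0, 2.0, 3.0), "b" -> Seq(10.0, 20.0))

// Same shape as the quoted loop: running sum and count, then divide.
val meanByKey: Map[String, Double] =
  byKey.map { case (k, vs) =>
    var sum = 0.0
    var count = 0
    vs.foreach { y => sum += y; count += 1 }
    k -> sum / count
  }
```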

Re: Computing mean and standard deviation by key

2014-08-01 Thread kriskalish

Re: Computing mean and standard deviation by key

2014-08-01 Thread Evan R. Sparks
>>> // do fancy things with the mean and deviation
>>> }
>>>
>>> However, there seems to be no way to convert the iterable into an RDD. Is
>>> there some other technique for doing this? I'm to the point where I'm
>>> considering copying and…

Re: Computing mean and standard deviation by key

2014-08-01 Thread Xu (Simon) Chen
>> …some other technique for doing this? I'm to the point where I'm
>> considering copying and pasting the StatCollector class and changing the
>> type from Double to MyClass (or making it generic).
>>
>> Am I going down the wrong path?

Re: Computing mean and standard deviation by key

2014-08-01 Thread Xu (Simon) Chen
> …considering copying and pasting the StatCollector class and changing the
> type from Double to MyClass (or making it generic).
>
> Am I going down the wrong path?

Re: Computing mean and standard deviation by key

2014-08-01 Thread Sean Owen
You're certainly not iterating on the driver. The Iterable you process in your function is on the cluster and done in parallel.

On Fri, Aug 1, 2014 at 8:36 PM, Kristopher Kalish wrote:
> The reason I want an RDD is because I'm assuming that iterating the
> individual elements of an RDD on the driver…

Re: Computing mean and standard deviation by key

2014-08-01 Thread Kristopher Kalish
The reason I want an RDD is because I'm assuming that iterating the individual elements of an RDD on the driver of the cluster is much slower than coming up with the mean and standard deviation using a map-reduce-based algorithm. I don't know the intimate details of Spark's implementation, but it…
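A hedged sketch of the map-reduce-style algorithm described here: map each value to a (count, sum, sumOfSquares) triple, merge triples pairwise per key with an associative operation (the shape `reduceByKey` expects), and derive the mean and standard deviation from the merged sums at the end. Plain Scala 2.13+ collections stand in for the RDD; all names and data are illustrative:

```scala
// Raw (key, value) pairs, as the paired RDD would hold.
val pairs: Seq[(String, Double)] =
  Seq("a" -> 1.0, "a" -> 3.0, "b" -> 5.0, "b" -> 5.0)

// Map each value to (count, sum, sumOfSquares), then merge triples per key
// with an associative combine; groupMapReduce plays the role of reduceByKey.
val sums: Map[String, (Long, Double, Double)] =
  pairs
    .map { case (k, v) => k -> (1L, v, v * v) }
    .groupMapReduce(_._1)(_._2) { case ((n1, s1, q1), (n2, s2, q2)) =>
      (n1 + n2, s1 + s2, q1 + q2)
    }

// Finalize: mean = sum/n, variance = sumSq/n - mean^2 (population form).
val statsByKey: Map[String, (Double, Double)] =
  sums.map { case (k, (n, s, q)) =>
    val mean = s / n
    k -> (mean, math.sqrt(q / n - mean * mean))
  }
```

Because the merge step only combines fixed-size triples, no per-key collection is ever materialized, which is exactly what makes the reduce-by-key formulation cheap on a cluster.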

Computing mean and standard deviation by key

2014-08-01 Thread kriskalish
…Double to MyClass (or making it generic).

Am I going down the wrong path?