Re: RDD API patterns

2015-09-26 Thread Evan R. Sparks
Mike, I believe the reason you're seeing near-identical performance on the gradient computations is twofold: 1) Gradient computations for GLM models are computationally pretty cheap from a FLOPs/byte-read perspective. They are essentially a BLAS "gemv" call in the dense case, which is well known to
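A minimal pure-Python sketch of the gemv point follows. It assumes logistic regression as the example GLM, which the message does not name. The summed gradient is X^T(σ(Xw) − y): one matrix-vector product gives the margins, and one transposed product gives the result. Each matrix entry read feeds a single multiply-add, so a dense gemv does about 2 flops per matrix element, or 0.25 flops per byte with 8-byte doubles. The helper names (`gemv`, `gemv_t`, `logistic_gradient`, `gemv_flops_per_byte`) are hypothetical and are not MLlib code.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # dense row-major matrix, assumed non-empty


def gemv(a: Matrix, x: Sequence[float]) -> List[float]:
    """y = A x: one multiply-add per matrix entry read."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gemv_t(a: Matrix, r: Sequence[float]) -> List[float]:
    """y = A^T r, accumulated row by row without materializing A^T."""
    out = [0.0] * len(a[0])
    for row, ri in zip(a, r):
        for j, aij in enumerate(row):
            out[j] += aij * ri
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logistic_gradient(x: Matrix, y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Gradient of the summed logistic loss: X^T (sigmoid(X w) - y)."""
    margins = gemv(x, w)                                   # first gemv
    residuals = [sigmoid(m) - yi for m, yi in zip(margins, y)]
    return gemv_t(x, residuals)                            # second (transposed) gemv


def gemv_flops_per_byte(bytes_per_value: int = 8) -> float:
    """An n x d gemv does 2*n*d flops and reads n*d entries: 2 flops per entry."""
    return 2.0 / bytes_per_value
```

For example, `logistic_gradient([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [0.0, 0.0])` returns `[-0.5, 0.5]`.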

Re: RDD API patterns

2015-09-26 Thread Mike Hynes
Hello Devs, This email concerns some timing results for a treeAggregate in computing a (stochastic) gradient over an RDD of labelled points, as is currently done in the MLlib optimization routine for SGD. In SGD, the underlying RDD is downsampled by a fraction f ∈ (0, 1], and the subgradients ov
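The pattern Mike describes can be sketched without Spark: downsample the data by a fraction f, then combine per-partition gradient sums with a tree-shaped aggregation. In this sketch, partitions are plain Python lists, and each point is kept independently with probability f. Partial results are merged level by level with a fixed fan-in. The least-squares gradient, the function names and the `fan_in` parameter are assumptions for illustration. This is not Spark's treeAggregate implementation, which derives its per-level fan-in from the `depth` argument.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")

LabeledPoint = Tuple[float, List[float]]  # (label, features)
GradAcc = Tuple[List[float], int]         # (gradient sum, number of sampled points)


def tree_aggregate(partitions: Sequence[Sequence[T]], zero: Callable[[], U],
                   seq_op: Callable[[U, T], U], comb_op: Callable[[U, U], U],
                   fan_in: int = 2) -> U:
    """Fold each partition locally, then merge partials in groups of fan_in per level."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    partials: List[U] = []
    for part in partitions:
        acc = zero()  # fresh zero value per partition, so no shared mutable state
        for item in part:
            acc = seq_op(acc, item)
        partials.append(acc)
    if not partials:
        return zero()
    while len(partials) > 1:  # each pass is one level of the tree
        next_level: List[U] = []
        for i in range(0, len(partials), fan_in):
            acc = partials[i]
            for other in partials[i + 1:i + fan_in]:
                acc = comb_op(acc, other)
            next_level.append(acc)
        partials = next_level
    return partials[0]


def bernoulli_sample(part: Sequence[T], fraction: float, seed: int) -> List[T]:
    """Keep each element independently with probability `fraction`."""
    rng = random.Random(seed)
    return [p for p in part if rng.random() < fraction]


def minibatch_gradient(partitions: Sequence[Sequence[LabeledPoint]], w: List[float],
                       fraction: float, seed: int = 0) -> List[float]:
    """Average least-squares gradient (w.x - y) x over a Bernoulli(fraction) sample."""
    d = len(w)
    sampled = [bernoulli_sample(p, fraction, seed + i) for i, p in enumerate(partitions)]

    def seq_op(acc: GradAcc, point: LabeledPoint) -> GradAcc:
        grad, n = acc
        label, x = point
        err = sum(wi * xi for wi, xi in zip(w, x)) - label
        return [g + err * xi for g, xi in zip(grad, x)], n + 1

    def comb_op(a: GradAcc, b: GradAcc) -> GradAcc:
        return [ga + gb for ga, gb in zip(a[0], b[0])], a[1] + b[1]

    grad_sum, n = tree_aggregate(sampled, lambda: ([0.0] * d, 0), seq_op, comb_op)
    return [g / n for g in grad_sum] if n else [0.0] * d
```

With `fraction=1.0`, every point is kept, and the result is the full-batch average gradient.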

Re: RDD API patterns

2015-09-19 Thread sim
tation options. Best, Sim -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-API-patterns-tp14116p14222.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: RDD API patterns

2015-09-19 Thread Juan Rodríguez Hortalá
> o simulations of nested RDDs.

Re: RDD API patterns

2015-09-18 Thread sim
without resorting to simulations of nested RDDs.

Re: RDD API patterns

2015-09-18 Thread sim

Re: RDD API patterns

2015-09-18 Thread sim
e of the former to know what's worth optimizing. Thanks, Sim

Re: RDD API patterns

2015-09-18 Thread sim
Aniket, yes, I've done the separate file trick. :) Still, I think we can solve this problem without nested RDDs.

Re: RDD API patterns

2015-09-18 Thread sim
sistency always beat capability & performance in terms of how the mass of developers make technology choices. I have found no exceptions to this, which is why I wanted to bring up the issue with the RDD API here.

Re: RDD API patterns

2015-09-17 Thread Debasish Das
> h sampleByKeyExact and your problem 2 could be implemented in fewer lines of code.

Re: RDD API patterns

2015-09-16 Thread robineast
woPassPairRDD, where certain information for the key could be provided along with an Iterable, e.g., the counts for the key. Both sampleByKeyExact and your problem 2 could be implemented in fewer lines of code.
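robineast's suggestion is that grouped values come with per-key metadata such as their count. The sketch below illustrates why that helps. It assumes the pairs are held in memory rather than in an RDD. Once a group's size n is known up front, selection sampling (Knuth's Algorithm S) can draw exactly k of the n values in a single pass over the Iterable. Targeting ceil(n × fraction) items per key is meant to mirror sampleByKeyExact's documented per-key sample size, but treat that as an assumption. `group_with_counts`, `select_exact` and `sample_by_key_exact` are hypothetical helpers, not Spark APIs.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def group_with_counts(pairs: Iterable[Tuple[K, V]]) -> Dict[K, Tuple[int, List[V]]]:
    """First pass: group values by key and record each group's size."""
    groups: Dict[K, List[V]] = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: (len(vs), vs) for k, vs in groups.items()}


def select_exact(count: int, values: Iterable[V], k: int,
                 rng: random.Random) -> Iterator[V]:
    """Selection sampling (Knuth's Algorithm S): one pass over `values`, whose
    length must equal `count`, yielding exactly k of them uniformly at random."""
    needed, remaining = k, count
    for v in values:
        # Pick with probability needed / remaining; once needed == remaining,
        # every further item is picked, so exactly k items are emitted.
        if rng.random() * remaining < needed:
            yield v
            needed -= 1
        remaining -= 1


def sample_by_key_exact(pairs: Iterable[Tuple[K, V]], fractions: Dict[K, float],
                        seed: int = 0) -> Dict[K, List[V]]:
    """Draw exactly ceil(count * fraction) values per key (fractions in [0, 1])."""
    rng = random.Random(seed)
    out: Dict[K, List[V]] = {}
    for key, (count, values) in group_with_counts(pairs).items():
        k = math.ceil(count * fractions[key])
        out[key] = list(select_exact(count, values, k, rng))
    return out
```

Because the count arrives with the group, the sampling step itself needs no extra pass or buffer beyond the grouped values.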

Re: RDD API patterns

2015-09-16 Thread Juan Rodríguez Hortalá
>> le to lose all high-level RDD API abstractions the very moment we group an RDD or call mapPartitions? Does the goal of no nested RDDs mean there are absolutely no high-level abstractions that we can expose via the Iterables born of RDDs? I'd love your th

Re: RDD API patterns

2015-09-16 Thread Aniket
> oal of no nested RDDs mean there are absolutely no high-level abstractions that we can expose via the Iterables born of RDDs? I'd love your thoughts. /Sim

Re: RDD API patterns

2015-09-16 Thread Reynold Xin

RDD API patterns

2015-09-14 Thread sim
no nested RDDs mean there are absolutely no high-level abstractions that we can expose via the Iterables born of RDDs? I'd love your thoughts. /Sim http://linkedin.com/in/simeons