Mike,
I believe the reason you're seeing near-identical performance on the
gradient computations is twofold:
1) Gradient computations for GLM models are computationally pretty cheap
from a FLOPs/byte read perspective. They are essentially a BLAS "gemv" call
in the dense case, which is well known to
Hello Devs,
This email concerns some timing results for a treeAggregate in
computing a (stochastic) gradient over an RDD of labelled points, as
is currently done in the MLlib optimization routine for SGD.
In SGD, the underlying RDD is downsampled by a fraction f \in (0,1],
and the subgradients ov
tation options.
Best,
Sim
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-API-patterns-tp14116p14222.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
without resorting to simulations of nested RDDs.
e of the former to know what's
worth optimizing.
Thanks,
Sim
Aniket, yes, I've done the separate file trick. :) Still, I think we can
solve this problem without nested RDDs.
sistency always beat
capability & performance in terms of how the mass of developers make
technology choices. I have found no exceptions to this, which is why I
wanted to bring the issue with the RDD API up here.
woPassPairRDD, where certain information for the key
could be provided along with an Iterable, e.g. the counts for the key. Both
sampleByKeyExact and your problem 2 could be implemented in somewhat fewer
lines of code.
le to lose all
>> high-level RDD API abstractions the very moment we group an RDD or call
>> mapPartitions? Does the goal of no nested RDDs mean there are absolutely no
>> high-level abstractions that we can expose via the Iterables borne of RDDs?
no nested RDDs mean there are absolutely no
high-level abstractions that we can expose via the Iterables borne of RDDs?
I'd love your thoughts.
/Sim
http://linkedin.com/in/simeons <http://linkedin.com/in/simeons>