For those numbers of partitions, I don't think you'll actually use tree
aggregation.  The number of partitions needs to be over a certain threshold
(>= 7) before treeAggregate really operates on a tree structure:
https://github.com/apache/spark/blob/9808052b5adfed7dafd6c1b3971b998e45b2799a/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1100

Do you see a slower increase in running time with more partitions?  For 5
partitions, do you find things improve if you tell treeAggregate to use
depth > 2?
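
For example, a minimal sketch (assuming a SparkContext sc is in scope, as in
spark-shell; the simple Long sum here just stands in for your gradient
aggregate):

val rdd = sc.parallelize(0 until 5, 5)
val sum = rdd.treeAggregate(0L)(
  seqOp = (acc, v) => acc + v,
  combOp = (a, b) => a + b,
  depth = 3) // request an extra tree level despite the small partition count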

Joseph

On Wed, Oct 14, 2015 at 1:18 PM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

> Dear Spark developers,
>
> I have noticed that Gradient Descent in Spark MLlib takes a long time when
> the model is large. It is implemented with treeAggregate. I've extracted
> the code from GradientDescent.scala to perform the benchmark. It allocates
> an Array of a given size and then aggregates it:
>
> val dataSize = 12000000
> val n = 5
> val maxIterations = 3
> val rdd = sc.parallelize(0 until n, n).cache()
> rdd.count()
> var avgTime = 0.0
> for (i <- 1 to maxIterations) {
>   val start = System.nanoTime()
>   val result = rdd.treeAggregate((new Array[Double](dataSize), 0.0, 0L))(
>     seqOp = (c, v) => {
>       // c: (grad, loss, count)
>       val l = 0.0
>       (c._1, c._2 + l, c._3 + 1)
>     },
>     combOp = (c1, c2) => {
>       // c: (grad, loss, count)
>       (c1._1, c1._2 + c2._2, c1._3 + c2._3)
>     })
>   avgTime += (System.nanoTime() - start) / 1e9
>   assert(result._1.length == dataSize)
> }
> println("Avg time: " + avgTime / maxIterations)
>
> If I run on my cluster of 1 master and 5 workers, I get the following
> results (given the array size = 12M):
>
> n = 1: Avg time: 4.555709667333333
> n = 2: Avg time: 7.059724584666667
> n = 3: Avg time: 9.937117377666667
> n = 4: Avg time: 12.687526233
> n = 5: Avg time: 12.939526129666667
>
> Could you explain why the time grows so large? Transferring a 12M-element
> array of Double (~96 MB) should take about 1 second on a 1 Gbit network.
> There might be other overheads, but not as large as what I observe.
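>
> As a quick back-of-the-envelope check (a sketch of the arithmetic, runnable
> in the Scala REPL):
>
> val payloadBytes = 12000000L * 8        // Array[Double]: 8 bytes/element = 96 MB
> val linkBytesPerSec = 1e9 / 8           // 1 Gbit/s is roughly 125 MB/s
> println(payloadBytes / linkBytesPerSec) // ~0.77 s for one full copy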
>
> Best regards, Alexander
>
