> Sent: Saturday, October 17, 2015 2:24 PM
> To: Joseph Bradley
> Cc: Ulanov, Alexander; dev@spark.apache.org
> Subject: Re: Gradient Descent with large model size
>
> Yes, remember that your bandwidth is the maximum number of bytes per second
> that can be shipped to the driver. So if you've got 5 blocks that size, then it
> looks like you're basically saturating the network.
> Aggregation trees help [...]
>
>>
>> I also measured the bandwidth of my network with iperf. It shows 247 Mbit/s.
>> So the transfer of a 12M-element array of doubles should take
>> 64 bit * 12M / 247 Mbit/s ~ 3.1 s. Does this mean that for 5 nodes with
>> treeAggregate of depth 1 it will take 5 * 3.1 ~ 15.5 s?
>>
>> Alexander
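As a quick sanity check of the arithmetic quoted above, here is the same estimate worked out in a few lines of Scala. Every number (the 12M-double model, the 247 Mbit/s iperf figure, the 5 partitions) comes from the messages in this thread; the only assumption is the thread's own premise that all partition results cross the driver's link one after another.

// Back-of-the-envelope check of the numbers quoted above.
val modelDoubles  = 12e6    // 12M doubles in the gradient vector
val bitsPerDouble = 64.0
val linkMbps      = 247.0   // bandwidth measured with iperf
val perBlockSec   = modelDoubles * bitsPerDouble / (linkMbps * 1e6)  // ~3.1 s per partition result
val allBlocksSec  = 5 * perBlockSec                                  // ~15.5 s if the 5 results
                                                                     // arrive at the driver serially
println(f"$perBlockSec%.1f s per block, $allBlocksSec%.1f s for 5 blocks")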
>
> From: Joseph Bradley [mailto:jos...@databricks.com]
> Sent: Wednesday, October 14, 2015 11:35 PM
> To: Ulanov, Alexander
> Cc: dev@spark.apache.org
> Subject: Re: Gradient Descent with large model size
>
> For those numbers of partitions, I don't think you'll actually use tree
> aggregation. The number of partitions needs to be over a certain threshold
> (>= 7) before treeAggregate really operates on a tree structure:
> https://github.com/apache/spark/blob/9808052b5adfed7dafd6c1b3971b998e45b2799a/core/src
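For readers following along, here is a minimal, self-contained sketch (not taken from the thread) of where the partition count and the depth argument enter a treeAggregate call. SparkContext, parallelize, and treeAggregate are real Spark API; the app name, local master, and the 32-partition figure are illustrative, chosen only to sit above the >= 7 partition threshold mentioned in the message above.

import org.apache.spark.{SparkConf, SparkContext}

// Standalone illustration of treeAggregate's knobs; values are illustrative only.
val sc = new SparkContext(new SparkConf().setAppName("treeAggregate-demo").setMaster("local[4]"))
val nums = sc.parallelize(1 to 1000, numSlices = 32)  // 32 partitions: above the >= 7 threshold

val total = nums.treeAggregate(0L)(
  (acc, x) => acc + x,  // seqOp: fold each element into the per-partition accumulator
  (a, b) => a + b,      // combOp: merge partial sums pairwise up the aggregation tree
  depth = 2)            // 2 is treeAggregate's default depth

println(total)          // 500500
sc.stop()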

Dear Spark developers,

I have noticed that Gradient Descent in Spark MLlib takes a long time if the
model is large. It is implemented with treeAggregate. I've extracted the code
from GradientDescent.scala to perform the benchmark. It allocates an Array of
a given size and then aggregates it:

val [...]
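The extracted benchmark itself is cut off above. Purely as a sketch of what such a benchmark might look like (the object name, the 12M model size, the 5-partition count, and the dummy data are assumptions, not the author's code), one could time a treeAggregate over gradient-sized arrays along these lines:

import org.apache.spark.{SparkConf, SparkContext}

object TreeAggregateBench {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TreeAggregateBench"))

    val modelSize     = 12 * 1000 * 1000  // assumed: ~12M doubles, ~96 MB per partition result
    val numPartitions = 5                 // assumed: one partition per worker, as in the thread
    val dummy         = sc.parallelize(0 until numPartitions, numPartitions)

    val start = System.nanoTime()
    val aggregated = dummy.treeAggregate(new Array[Double](modelSize))(
      // seqOp: pretend each partition computed a dense, model-sized gradient
      (acc, _) => { java.util.Arrays.fill(acc, 1.0); acc },
      // combOp: element-wise sum of two partial gradients
      (a, b) => { var i = 0; while (i < a.length) { a(i) += b(i); i += 1 }; a },
      depth = 2)
    val seconds = (System.nanoTime() - start) / 1e9

    println(s"Aggregated ${aggregated.length} doubles from $numPartitions partitions in $seconds s")
    sc.stop()
  }
}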