Re: Gradient Descent with large model size

2015-10-19 Thread Mike Hynes
Sent: Saturday, October 17, 2015 2:24 PM To: Joseph Bradley Cc: Ulanov, Alexander; dev@spark.apache.org Subject: Re: Gradient Descent with large model size Yes, remember that your bandwidth is the maximum number of bytes per second that can be shipped to the d

RE: Gradient Descent with large model size

2015-10-19 Thread Ulanov, Alexander
@spark.apache.org Subject: Re: Gradient Descent with large model size Yes, remember that your bandwidth is the maximum number of bytes per second that can be shipped to the driver. So if you've got 5 blocks that size, then it looks like you're basically saturating the network. Aggregation trees hel

Re: Gradient Descent with large model size

2015-10-17 Thread Evan Sparks
I also measured the bandwidth of my network with iperf. It shows 247 Mbit/s. So the transfer of a message holding an array of 12M doubles should take 64 * 12M / 247M ~ 3.1 s. Does this mean that for 5 nodes with treeAggregate of depth 1 it will take 5*3.1~15.
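
The estimate quoted above can be written out as a rough worked calculation. This is only a sketch under assumptions the thread does not spell out: "12M" is read as 12 × 10^6 elements, each double occupies 64 bits, the 247 Mbit/s iperf figure is the sustained rate of the driver's link, the five blocks arrive one after another over that link, and serialization and protocol overhead are ignored.

```latex
\begin{align*}
t_{\text{block}} &\approx \frac{64\ \text{bit} \times 12 \times 10^{6}}{247 \times 10^{6}\ \text{bit/s}}
                  = \frac{768}{247}\ \text{s} \approx 3.1\ \text{s} \\
t_{\text{5 blocks}} &\approx 5 \times 3.1\ \text{s} \approx 15.5\ \text{s}
\end{align*}
```

If those assumptions hold, the time per aggregation is set almost entirely by the driver's link rather than by computation on the workers.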

Re: Gradient Descent with large model size

2015-10-17 Thread Joseph Bradley
exander From: Joseph Bradley [mailto:jos...@databricks.com] Sent: Wednesday, October 14, 2015 11:35 PM To: Ulanov, Alexander Cc: dev@spark.apache.org Subject: Re: Gradient Descent with large model size For those numbers of partitions, I don'

RE: Gradient Descent with large model size

2015-10-15 Thread Ulanov, Alexander
Bradley [mailto:jos...@databricks.com] Sent: Wednesday, October 14, 2015 11:35 PM To: Ulanov, Alexander Cc: dev@spark.apache.org Subject: Re: Gradient Descent with large model size For those numbers of partitions, I don't think you'll actually use tree aggregation. The number of partition

Re: Gradient Descent with large model size

2015-10-14 Thread Joseph Bradley
For those numbers of partitions, I don't think you'll actually use tree aggregation. The number of partitions needs to be over a certain threshold (>= 7) before treeAggregate really operates on a tree structure: https://github.com/apache/spark/blob/9808052b5adfed7dafd6c1b3971b998e45b2799a/core/src

Gradient Descent with large model size

2015-10-14 Thread Ulanov, Alexander
Dear Spark developers, I have noticed that Gradient Descent in Spark MLlib takes a long time if the model is large. It is implemented with treeAggregate. I've extracted the code from GradientDescent.scala to perform the benchmark. It allocates an Array of a given size and then aggregates it: val