Re: Stochastic gradient descent performance

2015-04-06 Thread Reynold Xin
(quoting Shivaram Venkataraman, Sunday, April 05, 2015) > Yeah, a simple way to estimate the time…

Re: Stochastic gradient descent performance

2015-04-06 Thread Xiangrui Meng
(quoting Shivaram Venkataraman, Sunday, April 05, 2015) > Yeah, a simple way to estimate the time…
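
The preview cuts off mid-sentence, but the general idea of estimating per-iteration time can be illustrated. A minimal sketch (my own, not code from the thread; assumes a spark-shell session where sc is in scope): time a trivial aggregation over a tiny cached RDD, which exposes the fixed per-job cost that every SGD iteration pays regardless of the gradient math.

  // Rough estimate of Spark's fixed per-job overhead: aggregate a tiny
  // cached RDD so that nearly all measured time is scheduling, not compute.
  val tiny = sc.parallelize(1 to 100, numSlices = 4).cache()
  tiny.count()  // materialize the cache before timing

  val t0 = System.nanoTime()
  tiny.treeAggregate(0L)((acc, x) => acc + x, _ + _)
  println(f"per-job overhead: ${(System.nanoTime() - t0) / 1e6}%.1f ms")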

RE: Stochastic gradient descent performance

2015-04-06 Thread Ulanov, Alexander
(quoting Shivaram Venkataraman, April 02, 2015) > I haven't looked closely at the sampling issues, but regarding the aggregation latency, there are fixed overheads (in local and distributed mode) with the way aggregation is done in Spark. Launching a s…
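
To make the quoted point concrete: each SGD iteration runs one distributed aggregation, and every stage of that aggregation carries fixed scheduling cost. A hedged sketch of the treeAggregate pattern, in the spirit of MLlib's implementation but not a verbatim copy (assumes data: RDD[(Double, Array[Double])] of (label, features) pairs; the gradient update is a toy placeholder):

  val n = 784  // feature dimension (e.g. MNIST)
  val (gradSum, count) = data.treeAggregate((new Array[Double](n), 0L))(
    seqOp = { case ((g, c), (label, x)) =>
      var i = 0
      while (i < n) { g(i) += label * x(i); i += 1 }  // toy gradient update
      (g, c + 1)
    },
    combOp = { case ((g1, c1), (g2, c2)) =>
      var i = 0
      while (i < n) { g1(i) += g2(i); i += 1 }
      (g1, c1 + c2)
    },
    depth = 2  // each extra tree level adds one more stage of fixed overhead
  )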

Re: Stochastic gradient descent performance

2015-04-05 Thread Shivaram Venkataraman
(quoting Ulanov, Alexander) > …wondering why it works so slow in local mode? Could you elaborate on this? I do understand that in cluster mode the network speed will kick in and then one can blame it. Best regards, Alexander

RE: Stochastic gradient descent performance

2015-04-02 Thread Ulanov, Alexander
(quoting Shivaram Venkataraman, April 02, 2015) > I haven't looked closely at the sampling issues, but regarding the aggregation latency, there are fixed overheads (in local and distributed mode) with the way aggregation is done in Spark…

Re: Stochastic gradient descent performance

2015-04-02 Thread Shivaram Venkataraman
(quoting Ulanov, Alexander) > …wondering why it works so slow in local mode? Could you elaborate on this? I do understand that in cluster mode the network speed will kick in and then one can blame it.

Re: Stochastic gradient descent performance

2015-04-02 Thread Joseph Bradley
(quoted reply of April 02, 2015, 10:51 AM) > It looks like SPARK-3250 was applied to the sample() which GradientDescent uses, and that should kick in for your minibatch…
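
For context on where that sampling happens, here is a paraphrased sketch of runMiniBatchSGD's per-iteration structure (shape only, not MLlib's exact code; numIterations and miniBatchFraction are illustrative values). The sample() call below is where the SPARK-3250 optimization mentioned above applies.

  val numIterations = 100
  val miniBatchFraction = 0.01  // ~600 of MNIST's 60K instances per batch
  for (i <- 1 to numIterations) {
    // Bernoulli sample without replacement; the optimized sampler from
    // SPARK-3250 runs inside this call.
    val batch = data.sample(withReplacement = false,
                            fraction = miniBatchFraction, seed = 42 + i)
    // ... treeAggregate over `batch` computes the minibatch gradient ...
  }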

RE: Stochastic gradient descent performance

2015-04-02 Thread Ulanov, Alexander
…important issue for the applicability of SGD in Spark MLlib. Could Spark developers please comment on it? -Original Message- From: Ulanov, Alexander; Sent: Monday, March 30, 2015 5:00 PM; Subject: Stochastic gradient descent performance

Re: Stochastic gradient descent performance

2015-04-02 Thread Joseph Bradley
(quoting Ulanov, Alexander) > …Could Spark developers please comment on it? -Original Message- (Monday, March 30, 2015) > Hi, It seems to me that there…

RE: Stochastic gradient descent performance

2015-04-01 Thread Ulanov, Alexander
(quoting the original post of March 30, 2015) Hi, It seems to me that there is an overhead in the "runMiniBatchSGD" function of MLlib's "GradientDescent". In particular, "sample" and "treeAggregate" might take time that is an order of magnitude greater than the actual gradient computation…

Stochastic gradient descent performance

2015-03-30 Thread Ulanov, Alexander
Hi, It seems to me that there is an overhead in the "runMiniBatchSGD" function of MLlib's "GradientDescent". In particular, "sample" and "treeAggregate" might take time that is an order of magnitude greater than the actual gradient computation. For the MNIST dataset of 60K instances, the minibatch…
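
A minimal timing sketch to reproduce that breakdown (my own, not from the thread; assumes a spark-shell session with MNIST cached as an RDD[LabeledPoint] named mnist, 784 features per point). It times the three pieces the post names separately: the sample, the aggregation, and gradient-sized local work.

  import org.apache.spark.mllib.regression.LabeledPoint

  def timeMs[A](body: => A): (A, Double) = {
    val t0 = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - t0) / 1e6)
  }

  // 1) sampling: force evaluation so the sample itself is what we time
  val (batch, tSample) = timeMs {
    val b = mnist.sample(withReplacement = false, fraction = 0.01, seed = 42L)
    b.cache(); b.count(); b
  }

  // 2) aggregation: a no-op reduction isolates treeAggregate's overhead
  val (_, tAgg) = timeMs { batch.treeAggregate(0L)((c, _) => c + 1, _ + _) }

  // 3) gradient-sized local work: one dense dot product per point
  val (_, tGrad) = timeMs {
    val w = new Array[Double](784)
    batch.map { p =>
      val x = p.features.toArray
      var s = 0.0; var i = 0
      while (i < x.length) { s += w(i) * x(i); i += 1 }
      s
    }.sum()
  }

  println(f"sample: $tSample%.1f ms, treeAggregate: $tAgg%.1f ms, " +
          f"gradient pass: $tGrad%.1f ms")

If the fixed sample/treeAggregate costs dominate tGrad by the reported order of magnitude, that supports the post's conclusion that per-iteration overhead, not gradient math, bounds minibatch SGD throughput at this scale.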