Yeah, a simple way to estimate the time for an iterative algorithm is (number of iterations required) * (time per iteration). The time per iteration depends on the batch size, the computation required, and the fixed overheads I mentioned before. The number of iterations, of course, depends on the convergence rate for the problem being solved.
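As a back-of-the-envelope sketch in Scala (all numbers below are illustrative assumptions, not measurements), the estimate looks something like this:

    object SgdTimeEstimate {
      def main(args: Array[String]): Unit = {
        // Assumed fixed per-iteration overhead (task launch, result fetch, etc.)
        val fixedOverheadSec = 0.5
        // Assumed gradient computation time for one mini-batch
        val computePerIterSec = 0.1
        // Assumed number of iterations to converge; depends on the problem
        val iterations = 10000
        val totalHours = iterations * (fixedOverheadSec + computePerIterSec) / 3600.0
        println(f"Estimated wall-clock time: $totalHours%.1f hours")
      }
    }

Plugging in your own measured per-iteration compute time and overhead gives a quick feasibility check before running the full job.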
Thanks
Shivaram

On Thu, Apr 2, 2015 at 2:19 PM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:

> Hi Shivaram,
>
> It sounds really interesting! With this time we can estimate whether it is
> worth running an iterative algorithm on Spark. For example, for SGD on
> ImageNet (450K samples) we will spend 450K * 500 ms = 62.5 hours to traverse
> all of the data one example at a time, not counting the data loading,
> computation and update times. One may need to traverse all of the data a
> number of times to converge. Let's say this number is equal to the batch
> size. So we remain with a 62.5-hour overhead. Is that reasonable?
>
> Best regards, Alexander
>
> *From:* Shivaram Venkataraman [mailto:shiva...@eecs.berkeley.edu]
> *Sent:* Thursday, April 02, 2015 1:26 PM
> *To:* Joseph Bradley
> *Cc:* Ulanov, Alexander; dev@spark.apache.org
> *Subject:* Re: Stochastic gradient descent performance
>
> I haven't looked closely at the sampling issues, but regarding the
> aggregation latency, there are fixed overheads (in local and distributed
> mode) with the way aggregation is done in Spark. Launching a stage of
> tasks, fetching outputs from the previous stage, etc. all have overhead, so
> I would say it's not efficient / recommended to run stages where the
> computation is less than 500 ms or so. You could increase your batch size
> based on this, and hopefully that will help.
>
> Reducing these overheads by an order of magnitude is a challenging problem
> given Spark's architecture -- I have some ideas for this, but they are very
> much at a research stage.
>
> Thanks
> Shivaram
>
> On Thu, Apr 2, 2015 at 12:00 PM, Joseph Bradley <jos...@databricks.com> wrote:
>
> When you say "It seems that instead of sample it is better to shuffle data
> and then access it sequentially by mini-batches," are you sure that holds
> true for a big dataset in a cluster? As far as implementing it, I haven't
> looked carefully at GapSamplingIterator (in RandomSampler.scala) myself,
> but it looks like it could be modified to be deterministic.
>
> Hopefully someone else can comment on aggregation in local mode. I'm not
> sure how much effort has gone into optimizing for local mode.
>
> Joseph
>
> On Thu, Apr 2, 2015 at 11:33 AM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
>
> > Hi Joseph,
> >
> > Thank you for the suggestion!
> >
> > It seems that instead of sample it is better to shuffle the data and
> > then access it sequentially by mini-batches. Could you suggest how to
> > implement this?
> >
> > With regard to aggregate (reduce), I am wondering why it works so slowly
> > in local mode? Could you elaborate on this? I do understand that in
> > cluster mode the network speed kicks in and can then be blamed.
> >
> > Best regards, Alexander
> >
> > *From:* Joseph Bradley [mailto:jos...@databricks.com]
> > *Sent:* Thursday, April 02, 2015 10:51 AM
> > *To:* Ulanov, Alexander
> > *Cc:* dev@spark.apache.org
> > *Subject:* Re: Stochastic gradient descent performance
> >
> > It looks like SPARK-3250 was applied to the sample() which GradientDescent
> > uses, and that should kick in for your minibatchFraction <= 0.4. Based on
> > your numbers, aggregation seems like the main issue, though I hesitate to
> > optimize aggregation based on local tests for data sizes that small.
> >
> > The first thing I'd check for is unnecessary object creation; I'd also
> > profile in a cluster or larger-data setting.
> >
> > On Wed, Apr 1, 2015 at 10:09 AM, Ulanov, Alexander <alexander.ula...@hp.com> wrote:
> >
> > Sorry for bothering you again, but I think this is an important issue for
> > the applicability of SGD in Spark MLlib. Could the Spark developers please
> > comment on it?
> >
> > -----Original Message-----
> > From: Ulanov, Alexander
> > Sent: Monday, March 30, 2015 5:00 PM
> > To: dev@spark.apache.org
> > Subject: Stochastic gradient descent performance
> >
> > Hi,
> >
> > It seems to me that there is an overhead in the "runMiniBatchSGD" function
> > of MLlib's "GradientDescent". In particular, "sample" and "treeAggregate"
> > might take time that is an order of magnitude greater than the actual
> > gradient computation. For the MNIST dataset of 60K instances with minibatch
> > fraction = 0.001 (i.e. 60 samples), it takes 0.15 s to sample and 0.3 s to
> > aggregate in local mode with 1 data partition on a Core i5 processor. The
> > actual gradient computation takes 0.002 s. I searched through the Spark JIRA
> > and found that there was recently an update for more efficient sampling
> > (SPARK-3250) that is already included in the Spark codebase. Is there a way
> > to reduce the sampling time and the local treeAggregate by an order of
> > magnitude?
> >
> > Best regards, Alexander
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
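P.S. In case it helps with the timing question above, here is a minimal local-mode probe for the fixed per-stage aggregation overhead being discussed. It is only a sketch; the object name, data size, and partition count are arbitrary assumptions:

    import org.apache.spark.{SparkConf, SparkContext}

    object AggregationOverheadProbe {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("AggregationOverheadProbe").setMaster("local[4]")
        val sc = new SparkContext(conf)

        // Cache a small dataset so later timings exclude data loading.
        val data = sc.parallelize(1 to 1000000, 4).cache()
        data.count()

        // The work per record is negligible, so the measured time is dominated
        // by the fixed per-stage cost (task launch, result collection, ...).
        for (i <- 1 to 5) {
          val start = System.nanoTime()
          data.treeAggregate(0L)((acc, _) => acc + 1, _ + _)
          val elapsedMs = (System.nanoTime() - start) / 1e6
          println(f"run $i: $elapsedMs%.1f ms")
        }

        sc.stop()
      }
    }

Comparing the printed times against the actual per-minibatch gradient computation time gives a rough idea of how large the batch needs to be before the fixed overhead stops dominating.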