I'm adding the timer in runMiniBatchSGD right after val numExamples = data.count()

See the code below. I'm running the rcv1 dataset now and will update soon.

    val startTime = System.nanoTime()
    for (i <- 1 to numIterations) {
      // Sample a subset (fraction miniBatchFraction) of the total data
      // compute and sum up the subgradients on this subset (this is one map-reduce)
      val (gradientSum, lossSum) = data.sample(false, miniBatchFraction, 42 + i)
        .aggregate((BDV.zeros[Double](weights.size), 0.0))(
          seqOp = (c, v) => (c, v) match { case ((grad, loss), (label, features)) =>
            val l = gradient.compute(features, label, weights, Vectors.fromBreeze(grad))
            (grad, loss + l)
          },
          combOp = (c1, c2) => (c1, c2) match { case ((grad1, loss1), (grad2, loss2)) =>
            (grad1 += grad2, loss1 + loss2)
          })

      /**
       * NOTE(Xinghao): lossSum is computed using the weights from the
       * previous iteration and regVal is the regularization value computed
       * in the previous iteration as well.
       */
      stochasticLossHistory.append(lossSum / miniBatchSize + regVal)
      val update = updater.compute(
        weights, Vectors.fromBreeze(gradientSum / miniBatchSize), stepSize, i, regParam)
      weights = update._1
      regVal = update._2
      timeStamp.append(System.nanoTime() - startTime)
    }
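The seqOp/combOp contract in the aggregate call above can be sketched on plain Scala collections; the "gradient" and "loss" below are made-up stand-ins for illustration, not MLlib's logistic gradient:

```scala
// Sketch of the seqOp/combOp pattern used by RDD.aggregate above.
// seqOp folds one (label, features) example into a running
// (gradientSum, lossSum) accumulator; combOp merges two partial
// accumulators from different "partitions".
val dim = 2
val examples = Seq((1.0, Array(1.0, 0.0)), (0.5, Array(0.0, 2.0)))

def seqOp(c: (Array[Double], Double), v: (Double, Array[Double])): (Array[Double], Double) = {
  val (grad, loss) = c
  val (label, features) = v
  // Illustrative only: "gradient" = label * features, "loss" = label.
  for (j <- 0 until dim) grad(j) += label * features(j)
  (grad, loss + label)
}

def combOp(c1: (Array[Double], Double), c2: (Array[Double], Double)): (Array[Double], Double) = {
  for (j <- 0 until dim) c1._1(j) += c2._1(j)
  (c1._1, c1._2 + c2._2)
}

// Simulate two partitions: fold each with seqOp, then merge with combOp.
val (part1, part2) = examples.splitAt(1)
val p1 = part1.foldLeft((Array.fill(dim)(0.0), 0.0))(seqOp)
val p2 = part2.foldLeft((Array.fill(dim)(0.0), 0.0))(seqOp)
val (gradientSum, lossSum) = combOp(p1, p2)
```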

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Thu, Apr 24, 2014 at 1:44 PM, Xiangrui Meng <men...@gmail.com> wrote:

> I don't understand why sparse falls behind dense so much at the very
> first iteration. I didn't see count() called in
>
> https://github.com/dbtsai/spark-lbfgs-benchmark/blob/master/src/main/scala/org/apache/spark/mllib/benchmark/BinaryLogisticRegression.scala
> . Maybe you have local uncommitted changes.
>
> Best,
> Xiangrui
>
> On Thu, Apr 24, 2014 at 11:26 AM, DB Tsai <dbt...@stanford.edu> wrote:
> > Hi Xiangrui,
> >
> > Yes, I'm using yarn-cluster mode, and I did check that the number of
> > executors I specified matches the number actually running.
> >
> > For caching and materialization, I have the timer in the optimizer after
> > calling count(), so the time spent materializing the cache isn't included
> > in the benchmark.
> >
> > The difference you saw is actually between dense and sparse feature
> > vectors. With dense features, the first iteration takes the same time for
> > both LBFGS and GD.
> >
> > I'm going to run rcv1.binary which only has 0.15% non-zero elements to
> > verify the hypothesis.
> >
> >
> >
> >
> > On Thu, Apr 24, 2014 at 1:09 AM, Xiangrui Meng <men...@gmail.com> wrote:
> >>
> >> Hi DB,
> >>
> >> I saw you are using yarn-cluster mode for the benchmark. I tested the
> >> yarn-cluster mode and found that YARN does not always give you the
> >> exact number of executors requested. Just want to confirm that you've
> >> checked the number of executors.
> >>
> >> The second thing to check is that in the benchmark code, after you
> >> call cache(), you should also call count() to materialize the RDD. From
> >> the results, the real difference is actually at the first step.
> >> Adding an intercept is not a cheap operation for sparse vectors.
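A sketch of why appending the intercept is costly for sparse rows: in (indices, values) form, adding a constant 1.0 entry forces a full copy of both arrays for every example (hypothetical helper below, not MLlib's actual appendBias):

```scala
// Hypothetical sketch: appending a 1.0 intercept entry to a sparse row
// stored as (indices, values) reallocates and copies both arrays per
// example, while a dense row needs only a single array copy.
def appendIntercept(indices: Array[Int], values: Array[Double], size: Int): (Array[Int], Array[Double], Int) = {
  val newIndices = java.util.Arrays.copyOf(indices, indices.length + 1)
  val newValues  = java.util.Arrays.copyOf(values, values.length + 1)
  newIndices(indices.length) = size // intercept occupies a new last column
  newValues(values.length)   = 1.0
  (newIndices, newValues, size + 1)
}

// Row (2.0, 0, 0, 5.0) in a 4-feature space gains an intercept column.
val (idx, vals, n) = appendIntercept(Array(0, 3), Array(2.0, 5.0), 4)
```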
> >>
> >> Best,
> >> Xiangrui
> >>
> >> On Thu, Apr 24, 2014 at 12:53 AM, Xiangrui Meng <men...@gmail.com>
> wrote:
> >> > I don't think it is easy to make sparse faster than dense with this
> >> > sparsity and feature dimension. You can try rcv1.binary, which should
> >> > show the difference easily.
> >> >
> >> > David, the breeze operators used here are
> >> >
> >> > 1. DenseVector dot SparseVector
> >> > 2. axpy DenseVector SparseVector
> >> >
> >> > However, the SparseVector is passed in as Vector[Double] instead of
> >> > SparseVector[Double]. It might use the axpy impl of [DenseVector,
> >> > Vector] and call activeIterator. I didn't check whether you used
> >> > multimethods on axpy.
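The static-type point can be illustrated with plain Scala overloading. These types are stand-ins, not Breeze's (Breeze's real dispatch goes through implicit UFunc instances), but the effect is similar: losing the static SparseVector type can route axpy through a generic path:

```scala
// Sketch of static overload resolution: when a value is statically typed
// as the base trait, the compiler picks the generic overload even though
// the runtime object is sparse. Analogously, passing a sparse vector as
// Vector[Double] may select a generic axpy (e.g. one using activeIterator)
// instead of the sparse-specialized one.
sealed trait Vec
final class DenseVec extends Vec
final class SparseVec extends Vec

object Ops {
  def axpy(y: DenseVec, x: SparseVec): String = "specialized sparse axpy"
  def axpy(y: DenseVec, x: Vec): String       = "generic axpy"
}

val x: Vec = new SparseVec             // static type is Vec, not SparseVec
val chosen = Ops.axpy(new DenseVec, x) // resolves against the static type
```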
> >> >
> >> > Best,
> >> > Xiangrui
> >> >
> >> > On Wed, Apr 23, 2014 at 10:35 PM, DB Tsai <dbt...@stanford.edu>
> wrote:
> >> >> The figure showing the Log-Likelihood vs Time can be found here.
> >> >>
> >> >>
> >> >>
> https://github.com/dbtsai/spark-lbfgs-benchmark/raw/fd703303fb1c16ef5714901739154728550becf4/result/a9a11M.pdf
> >> >>
> >> >> Let me know if you cannot open it. Thanks.
> >> >>
> >> >>
> >> >>
> >> >> On Wed, Apr 23, 2014 at 9:34 PM, Shivaram Venkataraman
> >> >> <shiva...@eecs.berkeley.edu> wrote:
> >> >>> I don't think the attachment came through in the list. Could you
> >> >>> upload the
> >> >>> results somewhere and link to them ?
> >> >>>
> >> >>>
> >> >>> On Wed, Apr 23, 2014 at 9:32 PM, DB Tsai <dbt...@dbtsai.com> wrote:
> >> >>>>
> >> >>>> 123 features per row, and on average, 89% are zeros.
> >> >>>> On Apr 23, 2014 9:31 PM, "Evan Sparks" <evan.spa...@gmail.com>
> wrote:
> >> >>>>
> >> >>>> > What is the number of non-zeros per row (and number of features)
> >> >>>> > in the
> >> >>>> > sparse case? We've hit some issues with breeze sparse support in
> >> >>>> > the past,
> >> >>>> > but for sufficiently sparse data it's still pretty good.
> >> >>>> >
> >> >>>> > > On Apr 23, 2014, at 9:21 PM, DB Tsai <dbt...@stanford.edu>
> wrote:
> >> >>>> > >
> >> >>>> > > Hi all,
> >> >>>> > >
> >> >>>> > > I'm benchmarking Logistic Regression in MLlib using the newly
> >> >>>> > > added
> >> >>>> > optimizer LBFGS and GD. I'm using the same dataset and the same
> >> >>>> > methodology
> >> >>>> > in this paper, http://www.csie.ntu.edu.tw/~cjlin/papers/l1.pdf
> >> >>>> > >
> >> >>>> > > I want to know how Spark scales as workers are added, and how the
> >> >>>> > > optimizers and input format (sparse or dense) impact performance.
> >> >>>> > >
> >> >>>> > > The benchmark code can be found here,
> >> >>>> > https://github.com/dbtsai/spark-lbfgs-benchmark
> >> >>>> > >
> >> >>>> > > The first dataset I benchmarked is a9a, which is only 2.2MB. I
> >> >>>> > > duplicated the dataset to 762MB so it has 11M rows. This dataset
> >> >>>> > > has 123 features, and 11% of the entries are non-zero.
> >> >>>> > >
> >> >>>> > > In this benchmark, the entire dataset is cached in memory.
> >> >>>> > >
> >> >>>> > > As expected, LBFGS converges faster than GD, and past some point,
> >> >>>> > > no matter how we tune GD, it converges more and more slowly.
> >> >>>> > >
> >> >>>> > > However, it's surprising that the sparse format runs slower than
> >> >>>> > > the dense format. I did see that the sparse format takes
> >> >>>> > > significantly less memory when caching the RDD, but sparse is 40%
> >> >>>> > > slower than dense. I think sparse should be faster: when we
> >> >>>> > > compute x wT, the sparsity of x lets us skip the zero entries. I
> >> >>>> > > wonder if there is anything I'm doing wrong.
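The intuition about x wT can be sketched as a sparse dot product that visits only the stored non-zeros (hypothetical helper, not MLlib's or Breeze's implementation):

```scala
// Sketch: a sparse dot product touches only the non-zero entries of x, so
// at ~11% density it does roughly 11% of the multiply-adds of a dense dot.
def sparseDot(indices: Array[Int], values: Array[Double], w: Array[Double]): Double = {
  var sum = 0.0
  var i = 0
  while (i < indices.length) {
    sum += values(i) * w(indices(i))
    i += 1
  }
  sum
}

// x = (1.0, 0, 3.0) stored sparsely; w = (2.0, 5.0, 4.0)
val d = sparseDot(Array(0, 2), Array(1.0, 3.0), Array(2.0, 5.0, 4.0))
// d == 1.0 * 2.0 + 3.0 * 4.0 == 14.0
```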
> >> >>>> > >
> >> >>>> > > The attachment is the benchmark result.
> >> >>>> > >
> >> >>>> > > Thanks.
> >> >>>> > >
> >> >>>> >
> >> >>>
> >> >>>
> >
> >
>
