Factorization problems are non-convex, so both ALS and DSGD will converge
to local minima, and it is not clear which minimum will be better until we
run both algorithms and compare.

So I would still suggest getting a DSGD version running in the test setup
while you experiment with the Spark ALS, so that you can see whether DSGD
converges to a better minimum on your particular dataset.
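
A minimal sketch of such a side-by-side test is below. It is not the Jellyfish
or Spark ALS code: the synthetic rank-2 data, the function names and the
hyperparameters are all assumptions made for illustration, and plain
sequential SGD stands in for DSGD. Both solvers minimize the same
L2-regularized quadratic loss over observed entries, so the held-out RMSEs
are comparable.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rmse(triples, U, V):
    # Root-mean-square error over (user, item, rating) triples.
    return math.sqrt(sum((r - dot(U[i], V[j])) ** 2 for i, j, r in triples) / len(triples))

def solve(A, b):
    # Gaussian elimination with partial pivoting; adequate for tiny k x k systems.
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for t in range(c, n + 1):
                M[r][t] -= f * M[c][t]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][t] * x[t] for t in range(r + 1, n))) / M[r][r]
    return x

def init(n, k, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]

def group(triples, n_users, n_items):
    # Index the observed ratings by user and by item.
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for i, j, r in triples:
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    return by_user, by_item

def als_half_step(obs_per_row, other, k, lam):
    # For each row solve (sum_j v_j v_j^T + lam * I) x = sum_j r_j v_j.
    out = []
    for obs in obs_per_row:
        A = [[lam if a == b else 0.0 for b in range(k)] for a in range(k)]
        rhs = [0.0] * k
        for j, r in obs:
            v = other[j]
            for a in range(k):
                rhs[a] += r * v[a]
                for b in range(k):
                    A[a][b] += v[a] * v[b]
        out.append(solve(A, rhs))
    return out

def als(train, n_users, n_items, k, lam, iters, rng):
    by_user, by_item = group(train, n_users, n_items)
    U, V = init(n_users, k, rng), init(n_items, k, rng)
    for _ in range(iters):
        U = als_half_step(by_user, V, k, lam)
        V = als_half_step(by_item, U, k, lam)
    return U, V

def sgd(train, n_users, n_items, k, lam, epochs, step, rng):
    by_user, by_item = group(train, n_users, n_items)
    U, V = init(n_users, k, rng), init(n_items, k, rng)
    order = list(train)
    for _ in range(epochs):
        rng.shuffle(order)
        for i, j, r in order:
            err = r - dot(U[i], V[j])
            # Spread lam over each row's observations so that one epoch applies
            # the same total penalty as the ALS objective above.
            lu, lv = lam / len(by_user[i]), lam / len(by_item[j])
            for a in range(k):
                u, v = U[i][a], V[j][a]
                U[i][a] += step * (err * v - lu * u)
                V[j][a] += step * (err * u - lv * v)
    return U, V

def compare(seed=0, n_users=20, n_items=15, k=2, lam=0.05):
    # Synthetic noiseless rank-k ratings; 70% train, 30% held out.
    rng = random.Random(seed)
    P = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(n_items)]
    cells = [(i, j, dot(P[i], Q[j])) for i in range(n_users) for j in range(n_items)]
    rng.shuffle(cells)
    cut = int(0.7 * len(cells))
    train, test = cells[:cut], cells[cut:]
    # Baseline: RMSE of always predicting zero on the held-out entries.
    baseline = math.sqrt(sum(r * r for _, _, r in test) / len(test))
    Ua, Va = als(train, n_users, n_items, k, lam, 30, random.Random(seed + 1))
    Us, Vs = sgd(train, n_users, n_items, k, lam, 200, 0.05, random.Random(seed + 2))
    return {"baseline": baseline, "als": rmse(test, Ua, Va), "sgd": rmse(test, Us, Vs)}

if __name__ == "__main__":
    print(compare())
```

Changing the seeds changes the starting point of each solver, which is one
way to see how much the final RMSE depends on which local minimum is reached.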

If you want, I can put the DSGD code base that I used for experimentation on
GitHub. I am not sure whether Professor Re has already put it on GitHub.


On Sat, Jun 28, 2014 at 2:46 AM, Krakna H <shankark+...@gmail.com> wrote:

> Hi Deb,
>
> Thanks so much for your response! At this point, we haven't determined
> which of DSGD/ALS to go with and were waiting on guidance like yours to
> tell us what the right option would be. It looks like ALS is good enough
> for our purposes.
>
> Regards.
>
>
> On Fri, Jun 27, 2014 at 12:47 PM, Debasish Das [via Apache Spark Developers
> List] <ml-node+s1001551n7098...@n3.nabble.com> wrote:
>
> > Hi,
> >
> > In my experiments with Jellyfish I did not see any substantial RMSE loss
> > over DSGD for the Netflix dataset.
> >
> > So we decided to stick with ALS and implemented a family of Quadratic
> > Minimization solvers that stay in the ALS realm but can handle interesting
> > constraints (positivity, bounds, L1, equality-constrained bounds, etc.). We
> > are going to show it at the Spark Summit. Also, the ALS structure is
> > favorable to matrix factorization use cases where missing entries mean zero
> > and you want to compute a global Gram matrix using broadcast and use that
> > for each Quadratic Minimization for all users/products.
> >
> > Implementing DSGD in the data partitioning that Spark ALS uses will be
> > straightforward, but I would be more keen to see a dataset where DSGD is
> > showing you better RMSEs than ALS.
> >
> > If you have a dataset where DSGD produces much better results, could you
> > please point us to it?
> >
> > Also, you can use Jellyfish to run DSGD benchmarks to compare against
> > ALS. It is multithreaded, and if you have enough RAM, you should be able to
> > run fairly large datasets.
> >
> > Be careful about the default Jellyfish: it has been tuned for the Netflix
> > dataset (regularization, rating normalization, etc.). So before you compare
> > RMSE, make sure ALS and Jellyfish are running the same algorithm
> > (L2-regularized quadratic loss).
> >
> > Thanks.
> > Deb
> >
> >
> > On Fri, Jun 27, 2014 at 3:40 AM, Krakna H <[hidden email]> wrote:
> >
> > > Hi all,
> > >
> > > Just found this thread -- is there an update on including DSGD in Spark?
> > > We have a project that entails topic modeling on a document-term matrix
> > > using matrix factorization, and were wondering if we should use ALS or
> > > attempt writing our own matrix factorization implementation on top of
> > > Spark.
> > >
> > > Thanks.
> > >
> > >
> > >
> > > --
> > > View this message in context:
> > > http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Matrix-Factorization-tp55p7097.html
> > > Sent from the Apache Spark Developers List mailing list archive at
> > > Nabble.com.
