Re: Removing MAHOUT_LOCAL option

Dmitriy Lyubimov Mon, 21 Mar 2016 13:48:18 -0700

Stochastic SVD in DSSVD.scala is identical to the MR version, with the
exception that MR uses a more numerically stable reordered Givens QR, while
DSSVD.scala uses a less numerically stable Cholesky QR.
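
For readers unfamiliar with the term, here is a minimal plain-Scala sketch of
what a Cholesky QR does for a tall, thin matrix A: form the Gram matrix
G = A^T A, factor G = L L^T, take R = L^T, and recover Q = A R^-1 by forward
substitution. It does not use Mahout and is not the DSSVD.scala code; the
object and method names are illustrative assumptions only. Forming A^T A
squares the condition number of A, which is why this route is less stable
than a Givens-based QR.

```scala
// Plain-Scala sketch of Cholesky QR for a tall, thin matrix A (m x n, m >= n).
// Illustrative only: not Mahout code, names are hypothetical.
object CholeskyQR {
  type Matrix = Array[Array[Double]]

  // Gram matrix G = A^T A (n x n).
  def gram(a: Matrix): Matrix = {
    val n = a.head.length
    Array.tabulate(n, n)((i, j) => a.map(row => row(i) * row(j)).sum)
  }

  // Lower-triangular Cholesky factor L with G = L L^T.
  def cholesky(g: Matrix): Matrix = {
    val n = g.length
    val l = Array.ofDim[Double](n, n)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
      if (i == j) {
        val d = g(i)(i) - s
        // Fails when A is too ill-conditioned for this approach.
        require(d > 0.0, "Gram matrix is not numerically positive definite")
        l(i)(i) = math.sqrt(d)
      } else {
        l(i)(j) = (g(i)(j) - s) / l(j)(j)
      }
    }
    l
  }

  // Returns (Q, R) with A = Q R, where R = L^T is upper triangular.
  def qr(a: Matrix): (Matrix, Matrix) = {
    val n = a.head.length
    val l = cholesky(gram(a))
    // Each row q of Q satisfies q R = a, i.e. L q^T = a^T (forward substitution).
    val q = a.map { row =>
      val x = new Array[Double](n)
      for (i <- 0 until n) {
        val s = (0 until i).map(k => l(i)(k) * x(k)).sum
        x(i) = (row(i) - s) / l(i)(i)
      }
      x
    }
    val r = Array.tabulate(n, n)((i, j) => l(j)(i))
    (q, r)
  }
}
```

Calling CholeskyQR.qr on a small well-conditioned matrix gives a Q with
orthonormal columns up to rounding; on a badly conditioned one the require
check can fail, which is the instability in question.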


Aside from that, the DrmLike input parameter is fully compatible with the
HDFS sequence file input of the MR version.

In Samsara the code would be (I am writing from memory and hopefully spelling
everything right):

<imports, implicits omitted...>

val drmX = drmDfsRead(path=<hdfs-path>)
// use whatever parameters you normally use here
val (drmU, drmV, s) = dssvd(drmX, k=..., q=..., ...)

This should do it.
Of course, you'd run into a significant infrastructure migration if you
do not currently have H2O or Spark available and running somewhere already.

-d

On Mon, Mar 21, 2016 at 12:57 PM, Mihai Dascalu <[email protected]>
wrote:

> We still have legacy code that runs Stochastic SVD directly on the local
> Hadoop instance in a Java desktop application. But if the desire is to
> eliminate it, we’ve been inclined for a while to migrate everything to
> Spark.
>
> Sorry, I’m old school and use MR, plus I’m new to Spark :) Is there an
> easy way to migrate your Spark example into the Java source code so that we
> do not disrupt the overall flow?
>
>
> Have a great evening!
> Mihai
>
> > On 21 Mar 2016, at 19:31, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > my 1 cent (since it is less than 2) is that MAHOUT_LOCAL is part of MR legacy
> > packaging. As long as MR is still here (and I would say it needs to be
> > still here, unless it falls into complete disrepair and totally out of sync
> > with even dated mapreduce apis), MAHOUT_LOCAL needs to stay. As soon as MR
> > goes, it goes too.
> >
> > maybe we just simply need a separate mahout script for non-legacy things,
> > or factor out legacy related shell things into another script (something
> > like mahout-mr.sh instead of mahout.sh)
> >
> > On Mon, Mar 21, 2016 at 8:45 AM, Suneel Marthi <[email protected]> wrote:
> >
> >> Some background on this issue:
> >>
> >> 1.  Now that we support Spark and H2O as back ends since 0.10.0, with
> >> Flink coming soon in 0.12.0, this has been bloating the size of our
> >> release artifacts when pushing releases to Apache mirrors. Hence we were
> >> looking at pruning some of the components that have not been used or have
> >> long been marked deprecated and are not being worked on.
> >>
> >> 2.  Since the Mahout 0.7 release in June 2012, the project has diverged
> >> from the MiA book even for legacy MapReduce.  Not sure if that's indeed
> >> helping onboard new users.
> >>
> >> 3.  Seems like the consensus so far, based on the user responses, is to
> >> retain the MAHOUT_LOCAL option. Thanks all for your responses.
> >>
> >>
> >> On Mon, Mar 21, 2016 at 11:38 AM, scott cote <[email protected]> wrote:
> >>
> >>> one more comment - I understand that it only works for the legacy code.
> >>> Kill it when the legacy code is no longer deprecated, but gone ….
> >>>
> >>> Otherwise - you will shut out people who buy the older mahout books (such
> >>> as MIA) which are still good reads, even though the tech is dated.
> >>>
> >>> Scott
> >>>
> >>>> On Mar 21, 2016, at 2:24 AM, David Starina <[email protected]> wrote:
> >>>>
> >>>> Anyhow, I'm +1 for removing MAHOUT_LOCAL, but I believe the deprecated
> >>>> MapReduce-based code still makes sense if it is running well on Ignite.
> >>>>
> >>>> On Mon, Mar 21, 2016 at 8:20 AM, David Starina <[email protected]> wrote:
> >>>>
> >>>>> Has anyone tried to run the deprecated MapReduce code on Ignite? Is the
> >>>>> performance improvement good enough to reconsider leaving those
> >>>>> algorithms in Mahout?
> >>>>>
> >>>>> On Mon, Mar 21, 2016 at 12:45 AM, Andrew Musselman <
> >>>>> [email protected]> wrote:
> >>>>>
> >>>>>> Yes I agree; will leave the question open a couple days.
> >>>>>>
> >>>>>> On Sunday, March 20, 2016, Pat Ferrel <[email protected]> wrote:
> >>>>>>
> >>>>>>> Maybe a better user question is: How many people are still using the
> >>>>>>> deprecated Hadoop code?
> >>>>>>>
> >>>>>>> If the number is small +1 for removal.
> >>>>>>>
> >>>>>>> On Mar 20, 2016, at 11:04 AM, Andrew Musselman <[email protected]> wrote:
> >>>>>>>
> >>>>>>> To clarify, the MAHOUT_LOCAL option only works for legacy Hadoop
> >>>>>>> MapReduce-based jobs which officially became deprecated in 0.10.0.
> >>>>>>>
> >>>>>>> On Sun, Mar 20, 2016 at 10:25 AM, Andrew Musselman <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Yes as I understand it.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sunday, March 20, 2016, Pat Ferrel <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Are we just talking about Hadoop MapReduce? I thought it was ignored
> >>>>>>>>> when using Spark.
> >>>>>>>>>
> >>>>>>>>> On Mar 20, 2016, at 8:20 AM, alok tanna <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> -1. MAHOUT_LOCAL is very useful for a quick POC.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Alok Tanna
> >>>>>>>>> Sent from my iPhone
> >>>>>>>>>
> >>>>>>>>>> On Mar 20, 2016, at 5:01 AM, Mihai Dascalu <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> -1 I still use it for fast deployment and it’s really helpful for
> >>>>>>>>>> small local processing
> >>>>>>>>>>
> >>>>>>>>>> Have a great weekend!
> >>>>>>>>>> Mihai
> >>>>>>>>>>
> >>>>>>>>>>> On 20 Mar 2016, at 06:13, Suneel Marthi <[email protected]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> +1 to remove this
> >>>>>>>>>>>
> >>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>
> >>>>>>>>>>>> On Mar 20, 2016, at 12:01 AM, Andrew Musselman <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> We're discussing removing the MAHOUT_LOCAL option in order to trim
> >>>>>>>>>>>> artifact sizes.
> >>>>>>>>>>>>
> >>>>>>>>>>>> If you think keeping the option to use MAHOUT_LOCAL for testing with
> >>>>>>>>>>>> the single-node mode of Hadoop is important, please let us know. It
> >>>>>>>>>>>> can be handy for trying things out, but it would be nice to ditch the
> >>>>>>>>>>>> effort required to maintain it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> See https://issues.apache.org/jira/browse/MAHOUT-1705 for more
> >>>>>>>>>>>> context.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>>
> >>
>
>
