Can you give me some more details on the Hadoop MapReduce item-based
cooccurrence recommender?
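
For context, the general idea behind item-based cooccurrence can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Mahout's implementation: the sample purchases and the function names `cooccurrence_counts` and `recommend` are hypothetical, and the actual MapReduce job runs distributed and, as far as I know, can weight item pairs with a similarity measure instead of raw counts.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (customer id, item id) purchase pairs, for illustration only.
purchases = [
    (1, "A"), (1, "B"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "B"), (3, "C"),
]

def cooccurrence_counts(pairs):
    """Count, for every pair of items, how many users have both."""
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)
    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return items_by_user, counts

def recommend(user, items_by_user, counts, top_n=5):
    """Score unseen items by summed cooccurrence with the user's history."""
    seen = items_by_user[user]
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

items_by_user, counts = cooccurrence_counts(purchases)
print(recommend(1, items_by_user, counts))
```

Here customer 1 bought A and B, and item C was bought alongside both by other customers, so C is the item suggested for customer 1.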


Best Regards,
Yash Patel

On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <[email protected]> wrote:

> I built this app with it: https://guide.finderbots.com
>
> The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
> out of the job it is CSV text, and therefore language and architecture
> neutral. I load the data from spark-itemsimilarity into MongoDB using Java.
> Solr is set up for full-text indexing and queries using data from MongoDB.
> The queries are made to Solr through REST from Ruby UX code. You can replace
> any component in this stack with whatever you wish and use whatever
> language you are comfortable with.
>
> Alternatively you could modify the UI of Solr or Elasticsearch; both are in
> Java.
>
> If you use any of the other Mahout recommenders, they create all recs for
> all known users, so you’ll still need to build a way to serve those results.
> People often use DBs for this and integrate with their web app framework.
>
> On Nov 28, 2014, at 10:03 AM, Yash Patel <[email protected]> wrote:
>
> I looked up Spark row similarity but I am not sure if it will suit my needs,
> as I want to build my recommender as a Java application, possibly with an
> interface.
>
>
> On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <[email protected]> wrote:
>
> > Some references:
> >
> > small free book here, which talks about the general idea:
> > https://www.mapr.com/practical-machine-learning
> > preso, which talks about mixing actions or other indicators:
> > http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> > two blog posts:
> > http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
> > http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
> > mahout docs:
> > http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
> >
> > Build Mahout from this source: https://github.com/apache/mahout
> > This will run stand-alone on a dev machine, then if your data is too big
> > for a single machine you can run it on a Spark + Hadoop cluster. The data
> > this creates can be put into a DB or indexed directly by a search engine
> > (Solr or Elasticsearch). Choose the search engine you want; queries made
> > of a user’s item id history will go there, and the results will be an
> > ordered list of item ids to recommend.
> >
> > The core piece is the command line job: “mahout spark-itemsimilarity”,
> > which can parse csv data. The options specify what columns are used for
> > ids.
> >
> > Start out simple by looking only at user and item IDs. Then you can add
> > other cross-cooccurrence indicators for multiple actions later pretty
> > easily.
> >
> >
> > On Nov 28, 2014, at 12:14 AM, Yash Patel <[email protected]> wrote:
> >
> > The Mahout + search engine recommender seems like it would be the best
> > fit for the data I have.
> >
> > Kindly get back to me at your earliest convenience.
> >
> >
> >
> > Best Regards,
> > Yash Patel
> >
> > On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <[email protected]> wrote:
> >
> >> Mahout has several recommenders so no need to create one from components.
> >> They all make use of the similarity of preferences between users; that’s
> >> why they are in the category of collaborative filtering.
> >>
> >> Primary Mahout Recommenders:
> >> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all recs
> >> for all users. Uses “Mahout IDs”
> >> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise in
> >> the data. Sometimes better for small data sets than #1. Uses “Mahout IDs”
> >> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
> >> with multiple actions (multi-modal), works for new users that have some
> >> history, has a scalable server (from the search engine) but is more
> >> difficult to integrate than #1 or #2. Uses your own ids and reads csv
> >> files.
> >>
> >> The rest of the data seems to apply either to the user or the item and so
> >> would be used in different ways. #1 and #2 can only use user id and item
> >> id, but some post-recommendation weighting or filtering can be applied.
> >> #3 can use multiple attributes in different ways. For instance, if
> >> category is an item attribute you can create two actions,
> >> user-pref-for-an-item and user-pref-for-a-category. Assuming you want to
> >> recommend an item (not a category) you can create a cross-cooccurrence
> >> indicator for the second action and use the data to make your item recs
> >> better. #3 is the only method that supports this.
> >>
> >> Pick a recommender and we can help more with data prep.
> >>
> >>
> >> On Nov 26, 2014, at 1:34 PM, Yash Patel <[email protected]> wrote:
> >>
> >> Hello everyone,
> >>
> >> Wow, I am quite happy to see so many inputs from people.
> >>
> >> I apologize for not providing more details.
> >>
> >> Although this is not my complete dataset, the fields I have chosen to use
> >> are:
> >>
> >> customer id - numeric
> >> item id - text
> >> postal code - text
> >> item category - text
> >> potential growth - text
> >> territory - text
> >>
> >>
> >> Basically I was thinking of finding similar users and recommending them
> >> items that users like them have bought but they haven't.
> >>
> >> I would very much like to hear your opinions, as I am not so familiar
> >> with clustering, classifiers, etc.
> >>
> >> I found that Mahout takes sequence files converted into vectors, but I
> >> couldn't understand how I would do that with my data specifically and,
> >> more importantly, make a recommender system out of it.
> >>
> >> I am also wondering how to factor in the importance of a specific customer
> >> through the potential growth attribute.
> >>
> >>
> >>
> >>
> >>
> >>
> >> Best Regards,
> >> Yash Patel
> >>
> >> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <[email protected]> wrote:
> >>
> >>> All very good points but note that spark-itemsimilarity may take the
> >>> input directly since you specify column numbers for
> >>> <UID><ITEMID><PREF_VALUE>
> >>>
> >>> On Nov 26, 2014, at 11:43 AM, parnab kumar <[email protected]> wrote:
> >>>
> >>> Kindly elaborate on your requirements, your dataset fields, and what you
> >>> want to recommend to a user. Usually a set of items is recommended to a
> >>> user. In your case, what are your items?
> >>>
> >>> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is not
> >>> in this format, which would let you use the algorithms in Mahout directly.
> >>>
> >>> A little more info from your side will help us to give you the right
> >>> pointers.
> >>>
> >>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <[email protected]>
> >>> wrote:
> >>>
> >>>> Dear Mahout Team,
> >>>>
> >>>> I am a student new to machine learning and I am trying to build a
> >>>> user-based recommender using Mahout.
> >>>>
> >>>> My dataset is a CSV file, but it has many text fields and I understand
> >>>> Mahout needs numeric values.
> >>>>
> >>>> Can you give me a head start as to where I should begin and what kind of
> >>>> tools I need to parse the text columns?
> >>>>
> >>>> Also, an idea of which classifiers or clustering methods I should use
> >>>> would be highly appreciated.
> >>>>
> >>>>
> >>>> Best Regards,
> >>>> Yash Patel
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
>
