The Mahout site is a good starting point for using any of the recommenders.
http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html

On Nov 29, 2014, at 1:33 PM, Yash Patel wrote:

Can you give me some more details on the Hadoop mapreduce item-based
cooccurrence recommender?

Best Regards,
Yash Patel

On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel wrote:

> I built this app with it: https://guide.finderbots.com
>
> The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
> out of the job it is csv text, and therefore language and architecture
> neutral. I load the data from spark-itemsimilarity into MongoDB using
> Java. Solr is set up for full-text indexing and queries using data from
> MongoDB. The queries are made to Solr through REST from Ruby UX code.
> You can replace any component in this stack with whatever you wish and
> use whatever language you are comfortable with.
>
> Alternatively you could modify the UI of Solr or Elasticsearch; both
> are in Java.
>
> If you use any of the other Mahout recommenders they create all recs
> for all known users, so you’ll still need to build a way to serve those
> results. People often use DBs for this and integrate with their web app
> framework.
>
> On Nov 28, 2014, at 10:03 AM, Yash Patel wrote:
>
> I looked up spark row similarity but I am not sure if it will suit my
> needs, as I want to build my recommender as a Java application,
> possibly with an interface.
>
> On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel wrote:
>
>> Some references:
>>
>> small free book here, which talks about the general idea:
>> https://www.mapr.com/practical-machine-learning
>> preso, which talks about mixing actions or other indicators:
>> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> two blog posts:
>> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> mahout docs:
>> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>>
>> Build Mahout from this source: https://github.com/apache/mahout
>> This will run stand-alone on a dev machine, then if your data is too
>> big for a single machine you can run it on a Spark + Hadoop cluster.
>> The data this creates can be put into a DB or indexed directly by a
>> search engine (Solr or Elasticsearch). Choose the search engine you
>> want; queries made up of a user’s item id history will then go there,
>> and the results will be an ordered list of item ids to recommend.
>>
>> The core piece is the command line job "mahout spark-itemsimilarity",
>> which can parse csv data. The options specify what columns are used
>> for ids.
>>
>> Start out simple by looking only at user and item IDs. Then you can
>> add other cross-cooccurrence indicators for multiple actions later
>> pretty easily.
>>
>> On Nov 28, 2014, at 12:14 AM, Yash Patel wrote:
>>
>> The mahout + search engine recommender seems to be what would be best
>> for the data I have.
>>
>> Kindly get back to me at your earliest convenience.
>>
>> Best Regards,
>> Yash Patel
>>
>> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel wrote:
>>
>>> Mahout has several recommenders so no need to create one from components.
>>> They all make use of the similarity of preferences between users;
>>> that’s why they are in the category of collaborative filtering.
>>>
>>> Primary Mahout Recommenders:
>>> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all
>>>    recs for all users. Uses "Mahout IDs".
>>> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise
>>>    in the data. Sometimes better for small data sets than #1. Uses
>>>    "Mahout IDs".
>>> 3) Mahout + search engine: cooccurrence type. Extremely flexible,
>>>    works with multiple actions (multi-modal), works for new users
>>>    that have some history, has a scalable server (from the search
>>>    engine) but is more difficult to integrate than #1 or #2. Uses
>>>    your own ids and reads csv files.
>>>
>>> The rest of the data seems to apply either to the user or the item
>>> and so would be used in different ways. #1 and #2 can only use user
>>> id and item id, but some post-recommendation weighting or filtering
>>> can be applied. #3 can use multiple attributes in different ways. For
>>> instance, if category is an item attribute you can create two
>>> actions, user-pref-for-an-item and user-pref-for-a-category. Assuming
>>> you want to recommend an item (not a category) you can create a
>>> cross-cooccurrence indicator for the second action and use the data
>>> to make your item recs better. #3 is the only method that supports
>>> this.
>>>
>>> Pick a recommender and we can help more with data prep.
>>>
>>> On Nov 26, 2014, at 1:34 PM, Yash Patel wrote:
>>>
>>> Hello everyone,
>>>
>>> Wow, I am quite happy to see so many inputs from people.
>>>
>>> I apologize for not providing more details.
>>>
>>> Although this is not my complete dataset, the fields I have chosen to
>>> use are:
>>>
>>> customer id - numeric
>>> item id - text
>>> postal code - text
>>> item category - text
>>> potential growth - text
>>> territory - text
>>>
>>> Basically I was thinking of finding similar users and recommending
>>> them items that users like them have bought but they haven't.
>>>
>>> Although I would very much like to hear your opinions, as I am not so
>>> familiar with clustering, classifiers etc.
>>>
>>> I found that Mahout takes sequence files converted into vectors, but
>>> I couldn't understand how I would do it on my data specifically and,
>>> more importantly, make a recommender system out of it.
>>>
>>> Also I am wondering how to take into account the importance of a
>>> specific customer through the potential growth attribute.
>>>
>>> Best Regards,
>>> Yash Patel
>>>
>>> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel wrote:
>>>
>>>> All very good points but note that spark-itemsimilarity may take the
>>>> input directly since you specify column numbers for
>>>> <UID><ITEMID><PREF_VALUE>
>>>>
>>>> On Nov 26, 2014, at 11:43 AM, parnab kumar wrote:
>>>>
>>>> Kindly elaborate on your requirements, your dataset fields, and what
>>>> you want to recommend to a user. Usually a set of items is
>>>> recommended to a user. In your case, what are your items?
>>>>
>>>> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data
>>>> is not in this format, which is what would let you use the
>>>> algorithms in Mahout directly.
>>>>
>>>> A little more info from your side will help us to give you the right
>>>> pointers.
>>>>
>>>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel wrote:
>>>>
>>>>> Dear Mahout Team,
>>>>>
>>>>> I am a student new to machine learning and I am trying to build a
>>>>> user-based recommender using Mahout.
>>>>>
>>>>> My dataset is a csv file as an input but it has many fields as text
>>>>> and I understand Mahout needs numeric values.
>>>>>
>>>>> Can you give me a headstart as to where I should start and what
>>>>> kind of tools I need to parse the text columns?
>>>>>
>>>>> Also an idea on which classifiers or clustering methods I should
>>>>> use would be highly appreciated.
>>>>>
>>>>> Best Regards,
>>>>> Yash Patel
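
For the two Hadoop mapreduce recommenders, which use numeric "Mahout IDs" and the <UID><ITEMID><PREF_VALUE> input described above, text item ids have to be mapped to integers before the job runs. The following is a minimal Python sketch of that data prep step, not code from Mahout itself. The column names, the sample rows, and the helper names (build_id_map, to_mahout_rows) are hypothetical, and using a preference value of 1.0 for every purchase is an assumption.

```python
import csv
import io


def build_id_map(values):
    """Assign an integer to each distinct text value, in first-seen order."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def to_mahout_rows(records, item_ids):
    """Turn (customer_id, item_text_id) pairs into (user, item, pref) triples.

    Assumption: every purchase counts as a preference of 1.0.
    """
    return [(int(customer), item_ids[item], 1.0) for customer, item in records]


# Hypothetical sample data; a real file would be opened with open(...) instead.
sample = "customer_id,item_id\n101,widget-a\n101,widget-b\n202,widget-a\n"
reader = csv.DictReader(io.StringIO(sample))
records = [(row["customer_id"], row["item_id"]) for row in reader]

# Map text item ids to integers; customer ids are already numeric.
item_ids = build_id_map(item for _, item in records)
rows = to_mahout_rows(records, item_ids)

# Write the triples as comma-separated user,item,pref lines.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
mahout_csv = out.getvalue()
print(mahout_csv)
```

The item_ids dictionary needs to be kept (or inverted) so that the numeric item ids in the recommender output can be translated back to the original text ids. This step is not needed for the spark-itemsimilarity route, which uses your own ids.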
