Mahout already ships with several recommenders, so there is no need to build one from components. They all rely on the similarity of preferences between users, which is why they fall under collaborative filtering.
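To make "similarity of preferences" a bit more concrete, here is a toy Python sketch of item cooccurrence counting: for each pair of items, how many users have both. This is only an illustration of the idea, not Mahout's implementation (the real jobs are distributed and score pairs with more than raw counts). The function name and the sample purchases are made up for the example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(purchases):
    """Count, for each unordered pair of items, how many users have both."""
    items_by_user = defaultdict(set)
    for user, item in purchases:
        items_by_user[user].add(item)

    counts = defaultdict(int)
    for items in items_by_user.values():
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Made-up (user id, item id) purchases, purely for illustration.
purchases = [
    (1, "bolt"), (1, "nut"),
    (2, "bolt"), (2, "nut"), (2, "washer"),
    (3, "washer"),
]

counts = cooccurrence_counts(purchases)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Items that are frequently bought by the same users end up with high counts, and those pairs are the raw material a cooccurrence recommender works from.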
Primary Mahout recommenders:

1) Hadoop MapReduce item-based cooccurrence recommender. Creates all recs for all users. Uses "Mahout IDs".
2) ALS-WR on Hadoop MapReduce. Uses matrix factorization to reduce noise in the data. Sometimes better than #1 for small data sets. Uses "Mahout IDs".
3) Mahout + search engine, a cooccurrence-type recommender. Extremely flexible, works with multiple actions (multi-modal), works for new users that have some history, and has a scalable server (from the search engine), but is more difficult to integrate than #1 or #2. Uses your own IDs and reads CSV files.

The rest of your data seems to apply either to the user or to the item, and so would be used in different ways. #1 and #2 can only use user ID and item ID, though some post-recommendation weighting or filtering can be applied. #3 can use multiple attributes in different ways. For instance, if category is an item attribute you can create two actions: user-pref-for-an-item and user-pref-for-a-category. Assuming you want to recommend an item (not a category), you can create a cross-cooccurrence indicator for the second action and use that data to make your item recs better. #3 is the only method that supports this.

Pick a recommender and we can help more with data prep.

On Nov 26, 2014, at 1:34 PM, Yash Patel <[email protected]> wrote:

Hello everyone,

Wow, I am quite happy to see so many inputs from people. I apologize for not providing more details.

Although this is not my complete dataset, the fields I have chosen to use are:

customer id - numeric
item id - text
postal code - text
item category - text
potential growth - text
territory - text

Basically I was thinking of finding similar users and recommending them items that users like them have bought but they haven't. I would very much like to hear your opinions, though, as I am not so familiar with clustering, classifiers, etc.
I found that Mahout takes sequence files converted into vectors, but I couldn't understand how I would do that on my data specifically and, more importantly, make a recommender system out of it.

Also, I am wondering how to factor in the importance of a specific customer through the potential growth attribute.

Best Regards,
Yash Patel

On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <[email protected]> wrote:

> All very good points but note that spark-itemsimilarity may take the input
> directly since you specify column numbers for <UID><ITEMID><PREF_VALUE>
>
> On Nov 26, 2014, at 11:43 AM, parnab kumar <[email protected]> wrote:
>
> Kindly elaborate... your requirements... your dataset fields... and what
> you want to recommend to a user... Usually a set of items is recommended to
> a user. In your case what are your items?
>
> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is not
> in this format, which would let you use the algorithms in Mahout directly.
>
> A little more info from your side will help us to give you the right
> pointers.
>
> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <[email protected]>
> wrote:
>
>> Dear Mahout Team,
>>
>> I am a student new to machine learning and I am trying to build a
>> user-based recommender using Mahout.
>>
>> My dataset is a CSV file as an input but it has many fields as text and I
>> understand Mahout needs numeric values.
>>
>> Can you give me a head start as to where I should start and what kind of
>> tools I need to parse the text columns?
>>
>> Also an idea on which classifiers or clustering methods I should use would
>> be highly appreciated.
>>
>>
>> Best Regards,
>> Yash Patel
>>
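As a starting point for the data prep discussed in this thread, here is a minimal Python sketch, not a definitive recipe. It shows two things: mapping text item IDs to integers so rows can be written as <UID><ITEMID><PREF_VALUE> triples, and splitting the same rows into the two action lists (user-pref-for-an-item and user-pref-for-a-category) mentioned for #3. The column names, the sample rows, and the choice to treat every purchase row as a preference of 1.0 are all assumptions made for illustration; check the documentation of whichever Mahout job you pick for its exact input requirements.

```python
import csv
import io


def to_triples(rows, user_col, item_col, pref=1.0):
    """Return (user, item_index, pref) triples plus the text-to-integer item map."""
    item_index = {}
    triples = []
    for row in rows:
        user = int(row[user_col])
        # Assign the next free integer the first time a text item id is seen.
        item = item_index.setdefault(row[item_col], len(item_index))
        triples.append((user, item, pref))
    return triples, item_index


def split_actions(rows, user_col, item_col, category_col):
    """Build one (user, item) list and one (user, category) list from the same rows."""
    item_action = [(r[user_col], r[item_col]) for r in rows]
    category_action = [(r[user_col], r[category_col]) for r in rows]
    return item_action, category_action


# Made-up sample rows; the column names are hypothetical.
SAMPLE = """customer_id,item_id,item_category
101,A-17,tools
102,B-03,paint
101,B-03,paint
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
triples, item_index = to_triples(rows, "customer_id", "item_id")
item_action, category_action = split_actions(
    rows, "customer_id", "item_id", "item_category"
)

# Print the triples as comma-separated lines: user, item index, preference.
for user, item, pref in triples:
    print(f"{user},{item},{pref}")
```

Keep the `item_index` map around: it is what lets you translate the integer item IDs in the recommender's output back to your original text IDs.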
