I figured out how to parse CSV files, build a map of user ID to item ID, and build a basic recommender that gives each user a recommendation of some items.
However, this method can't make use of all my data, since it only uses two columns.

I have several other columns, such as category, shipping location, item price, online user, etc.

How can I use these additional columns to improve recommendation quality (i.e., calculate a more precise similarity between users by using location and item price)?

Best Regards,
Yash Patel

On Sat, Nov 29, 2014 at 10:47 PM, Yash Patel <[email protected]> wrote:

> Thank you for the guidance.
>
> I will try building something rough and ask questions if I run into any
> errors.
>
> On Sat, Nov 29, 2014 at 10:38 PM, Pat Ferrel <[email protected]> wrote:
>
>> The Mahout site is a good starting point for using any of the
>> recommenders.
>>
>> http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
>>
>> On Nov 29, 2014, at 1:33 PM, Yash Patel <[email protected]> wrote:
>>
>> Can you give me some more details on the Hadoop mapreduce item-based
>> cooccurrence recommender?
>>
>> Best Regards,
>> Yash Patel
>>
>> On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <[email protected]> wrote:
>>
>> > I built this app with it: https://guide.finderbots.com
>> >
>> > The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
>> > out of the job it is csv text, and therefore language and architecture
>> > neutral. I load the data from spark-itemsimilarity into MongoDB using java.
>> > Solr is set up for full-text indexing and queries using data from MongoDB.
>> > The queries are made to Solr through REST from Ruby UX code. You can replace
>> > any component in this stack with whatever you wish and use whatever
>> > language you are comfortable with.
>> >
>> > Alternatively you could modify the UI of Solr or Elasticsearch; both are in
>> > Java.
>> >
>> > If you use any of the other Mahout recommenders they create all recs for
>> > all known users so you’ll still need to build a way to serve those results.
>> > People often use DBs for this and integrate with their web app framework.
>> >
>> > On Nov 28, 2014, at 10:03 AM, Yash Patel <[email protected]> wrote:
>> >
>> > I looked up spark row similarity but I am not sure if it will suit my needs,
>> > as I want to build my recommender as a java application, possibly with an
>> > interface.
>> >
>> > On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <[email protected]> wrote:
>> >
>> >> Some references:
>> >>
>> >> small free book here, which talks about the general idea:
>> >> https://www.mapr.com/practical-machine-learning
>> >> preso, which talks about mixing actions or other indicators:
>> >> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> >> two blog posts:
>> >> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> >> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> >> mahout docs:
>> >> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>> >>
>> >> Build Mahout from this source: https://github.com/apache/mahout This will
>> >> run stand-alone on a dev machine, then if your data is too big for a single
>> >> machine you can run it on a Spark + Hadoop cluster. The data this creates
>> >> can be put into a DB or indexed directly by a search engine (Solr or
>> >> Elasticsearch). Choose the search engine you want, then queries of a user’s
>> >> item id history will go there; results will be an ordered list of item ids
>> >> to recommend.
>> >>
>> >> The core piece is the command line job: “mahout spark-itemsimilarity”,
>> >> which can parse csv data. The options specify what columns are used for ids.
>> >>
>> >> Start out simple by looking only at user and item IDs. Then you can add
>> >> other cross-cooccurrence indicators for multiple actions later pretty
>> >> easily.
>> >>
>> >> On Nov 28, 2014, at 12:14 AM, Yash Patel <[email protected]> wrote:
>> >>
>> >> The mahout + search engine recommender seems to be what would be best for
>> >> the data I have.
>> >>
>> >> Kindly get back to me at your earliest convenience.
>> >>
>> >> Best Regards,
>> >> Yash Patel
>> >>
>> >> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <[email protected]> wrote:
>> >>
>> >>> Mahout has several recommenders so no need to create one from components.
>> >>> They all make use of the similarity of preferences between users, which is
>> >>> why they are in the category of collaborative filtering.
>> >>>
>> >>> Primary Mahout Recommenders:
>> >>> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all recs
>> >>> for all users. Uses “Mahout IDs”
>> >>> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise in
>> >>> the data. Sometimes better for small data sets than #1. Uses “Mahout IDs”
>> >>> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
>> >>> with multiple actions (multi-modal), works for new users that have some
>> >>> history, has a scalable server (from the search engine) but is more
>> >>> difficult to integrate than #1 or #2. Uses your own ids and reads csv files.
>> >>>
>> >>> The rest of the data seems to apply either to the user or the item and so
>> >>> would be used in different ways. #1 and #2 can only use user id and item id
>> >>> but some post recommendation weighting or filtering can be applied. #3 can
>> >>> use multiple attributes in different ways. For instance if category is an
>> >>> item attribute you can create two actions, user-pref-for-an-item, and
>> >>> user-pref-for-a-category. Assuming you want to recommend an item (not
>> >>> category) you can create a cross-cooccurrence indicator for the second
>> >>> action and use the data to make your item recs better.
>> >>> #3 is the only method that supports this.
>> >>>
>> >>> Pick a recommender and we can help more with data prep.
>> >>>
>> >>> On Nov 26, 2014, at 1:34 PM, Yash Patel <[email protected]> wrote:
>> >>>
>> >>> Hello everyone,
>> >>>
>> >>> Wow, I am quite happy to see so many inputs from people.
>> >>>
>> >>> I apologize for not providing more details.
>> >>>
>> >>> Although this is not my complete dataset, the fields I have chosen to use
>> >>> are:
>> >>>
>> >>> customer id - numeric
>> >>> item id - text
>> >>> postal code - text
>> >>> item category - text
>> >>> potential growth - text
>> >>> territory - text
>> >>>
>> >>> Basically I was thinking of finding similar users and recommending them
>> >>> items that users like them have bought but they haven't.
>> >>>
>> >>> Although I would very much like to hear your opinions as I am not so
>> >>> familiar with clustering, classifiers etc.
>> >>>
>> >>> I found that mahout takes sequence files converted into vectors but I
>> >>> couldn't understand how I would do it on my data specifically and, more
>> >>> importantly, make a recommender system out of it.
>> >>>
>> >>> Also I am wondering how to combine the importance of a specific customer
>> >>> through the potential growth attribute.
>> >>>
>> >>> Best Regards,
>> >>> Yash Patel
>> >>>
>> >>> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <[email protected]> wrote:
>> >>>
>> >>>> All very good points but note that spark-itemsimilarity may take the input
>> >>>> directly since you specify column numbers for <UID><ITEMID><PREF_VALUE>
>> >>>>
>> >>>> On Nov 26, 2014, at 11:43 AM, parnab kumar <[email protected]> wrote:
>> >>>>
>> >>>> Kindly elaborate... your requirements... your dataset fields... and what
>> >>>> you want to recommend to a user... Usually a set of items is recommended to
>> >>>> a user. In your case what are your items?
>> >>>>
>> >>>> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is not
>> >>>> in this format, which would let you use the algorithms in Mahout directly.
>> >>>>
>> >>>> A little more info from your side will help us to give you the right
>> >>>> pointers.
>> >>>>
>> >>>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <[email protected]> wrote:
>> >>>>
>> >>>>> Dear Mahout Team,
>> >>>>>
>> >>>>> I am a student new to machine learning and I am trying to build a user
>> >>>>> based recommender using mahout.
>> >>>>>
>> >>>>> My dataset is a csv file as an input but it has many fields as text and I
>> >>>>> understand mahout needs numeric values.
>> >>>>>
>> >>>>> Can you give me a headstart as to where I should start and what kind of
>> >>>>> tools I need to parse the text columns?
>> >>>>>
>> >>>>> Also an idea on which classifiers or clustering methods I should use would
>> >>>>> be highly appreciated.
>> >>>>>
>> >>>>> Best Regards,
>> >>>>> Yash Patel
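
For reference, the two-action idea quoted in the thread above (user-pref-for-an-item and user-pref-for-a-category) implies a small data prep step before any similarity job is run. The following is only a rough sketch of that step in plain Java, not Mahout code. The column positions, the action labels `item-pref` and `category-pref`, and the `user,action,target` output layout are all assumptions made for illustration; the input layout that spark-itemsimilarity actually expects should be checked against the Mahout docs linked in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ActionSplitter {
    // Assumed column order (hypothetical): customer id, item id, postal code, item category
    static final int USER_COL = 0;
    static final int ITEM_COL = 1;
    static final int CATEGORY_COL = 3;

    /**
     * Turns one CSV row into two "user,action,target" rows:
     * one for the item the user interacted with, one for that item's category.
     * Uses a naive comma split, so quoted fields containing commas are not handled.
     */
    static List<String> toActionRows(String csvLine) {
        String[] fields = csvLine.split(",", -1);
        if (fields.length <= CATEGORY_COL) {
            throw new IllegalArgumentException("Too few columns: " + csvLine);
        }
        String user = fields[USER_COL].trim();
        List<String> rows = new ArrayList<>();
        // Action labels are made up for this sketch.
        rows.add(user + ",item-pref," + fields[ITEM_COL].trim());
        rows.add(user + ",category-pref," + fields[CATEGORY_COL].trim());
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the shape of the output.
        List<String> sample = Arrays.asList(
                "42,itemA,10115,tools",
                "43,itemB,80331,garden");
        for (String line : sample) {
            for (String row : toActionRows(line)) {
                System.out.println(row);
            }
        }
    }
}
```

Each input row yields one row per action, so the same file can carry both the primary action (item) and the secondary one (category). Further attributes such as postal code could be added as extra actions in the same way, if they turn out to be useful indicators.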
