Can you give me some more details on the Hadoop MapReduce item-based cooccurrence recommender?

Best Regards,
Yash Patel

On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <[email protected]> wrote:

> I built this app with it: https://guide.finderbots.com
>
> The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
> out of the job it is csv text, and therefore language and architecture
> neutral. I load the data from spark-itemsimilarity into MongoDB using Java.
> Solr is set up for full-text indexing and queries using data from MongoDB.
> The queries are made to Solr through REST from Ruby UX code. You can replace
> any component in this stack with whatever you wish and use whatever
> language you are comfortable with.
>
> Alternatively you could modify the UI of Solr or Elasticsearch; both are in
> Java.
>
> If you use any of the other Mahout recommenders they create all recs for
> all known users so you’ll still need to build a way to serve those results.
> People often use DBs for this and integrate with their web app framework.
>
> On Nov 28, 2014, at 10:03 AM, Yash Patel <[email protected]> wrote:
>
> I looked up spark row similarity but I am not sure if it will suit my needs
> as I want to build my recommender as a Java application, possibly with an
> interface.
>
>
> On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <[email protected]> wrote:
>
> > Some references:
> >
> > small free book here, which talks about the general idea:
> > https://www.mapr.com/practical-machine-learning
> > preso, which talks about mixing actions or other indicators:
> > http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> > two blog posts:
> > http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
> > http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
> > mahout docs:
> > http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
> >
> > Build Mahout from this source: https://github.com/apache/mahout
> > This will run stand-alone on a dev machine, then if your data is too big
> > for a single machine you can run it on a Spark + Hadoop cluster. The data
> > this creates can be put into a DB or indexed directly by a search engine
> > (Solr or Elasticsearch). Choose the search engine you want, then queries
> > of a user’s item id history will go there--results will be an ordered
> > list of item ids to recommend.
> >
> > The core piece is the command line job: “mahout spark-itemsimilarity”,
> > which can parse csv data. The options specify what columns are used for
> > ids.
> >
> > Start out simple by looking only at user and item IDs. Then you can add
> > other cross-cooccurrence indicators for multiple actions later pretty
> > easily.
> >
> >
> > On Nov 28, 2014, at 12:14 AM, Yash Patel <[email protected]> wrote:
> >
> > The mahout + search engine recommender seems to be what would be best
> > for the data I have.
> >
> > Kindly get back to me at your earliest convenience.
> >
> > Best Regards,
> > Yash Patel
> >
> > On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <[email protected]> wrote:
> >
> >> Mahout has several recommenders so no need to create one from components.
> >> They all make use of the similarity of preferences between users, which
> >> is why they are in the category of collaborative filtering.
> >>
> >> Primary Mahout Recommenders:
> >> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all
> >> recs for all users. Uses "Mahout IDs"
> >> 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise in
> >> the data. Sometimes better for small data sets than #1. Uses "Mahout IDs"
> >> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
> >> with multiple actions (multi-modal), works for new users that have some
> >> history, has a scalable server (from the search engine) but is more
> >> difficult to integrate than #1 or #2. Uses your own ids and reads csv
> >> files.
> >>
> >> The rest of the data seems to apply either to the user or the item and
> >> so would be used in different ways. #1 and #2 can only use user id and
> >> item id but some post recommendation weighting or filtering can be
> >> applied. #3 can use multiple attributes in different ways. For instance
> >> if category is an item attribute you can create two actions,
> >> user-pref-for-an-item, and user-pref-for-a-category. Assuming you want
> >> to recommend an item (not category) you can create a cross-cooccurrence
> >> indicator for the second action and use the data to make your item recs
> >> better. #3 is the only method that supports this.
> >>
> >> Pick a recommender and we can help more with data prep.
> >>
> >>
> >> On Nov 26, 2014, at 1:34 PM, Yash Patel <[email protected]> wrote:
> >>
> >> Hello everyone,
> >>
> >> Wow, I am quite happy to see so many inputs from people.
> >>
> >> I apologize for not providing more details.
> >>
> >> Although this is not my complete dataset, the fields I have chosen to
> >> use are:
> >>
> >> customer id - numeric
> >> item id - text
> >> postal code - text
> >> item category - text
> >> potential growth - text
> >> territory - text
> >>
> >> Basically I was thinking of finding similar users and recommending them
> >> items that users like them have bought but they haven't.
> >>
> >> Although I would very much like to hear your opinions as I am not so
> >> familiar with clustering, classifiers, etc.
> >>
> >> I found that Mahout takes sequence files converted into vectors, but I
> >> couldn't understand how I would do it on my data specifically and, more
> >> importantly, make a recommender system out of it.
> >>
> >> Also I am wondering how to combine the importance of a specific customer
> >> through the potential growth attribute.
> >>
> >> Best Regards,
> >> Yash Patel
> >>
> >> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <[email protected]> wrote:
> >>
> >>> All very good points but note that spark-itemsimilarity may take the
> >>> input directly since you specify column numbers for
> >>> <UID><ITEMID><PREF_VALUE>
> >>>
> >>> On Nov 26, 2014, at 11:43 AM, parnab kumar <[email protected]> wrote:
> >>>
> >>> Kindly elaborate... your requirements... your dataset fields... and
> >>> what you want to recommend to a user... Usually a set of items is
> >>> recommended to a user. In your case what are your items?
> >>>
> >>> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is
> >>> not in this format, which would let you use the algorithms in Mahout
> >>> directly.
> >>>
> >>> A little more info from your side will help us to give you the right
> >>> pointers.
> >>>
> >>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <[email protected]> wrote:
> >>>
> >>>> Dear Mahout Team,
> >>>>
> >>>> I am a student new to machine learning and I am trying to build a
> >>>> user based recommender using Mahout.
> >>>>
> >>>> My dataset is a csv file as an input but it has many fields as text
> >>>> and I understand Mahout needs numeric values.
> >>>>
> >>>> Can you give me a head start as to where I should start and what kind
> >>>> of tools I need to parse the text columns?
> >>>>
> >>>> Also an idea on which classifiers or clustering methods I should use
> >>>> would be highly appreciated.
> >>>>
> >>>> Best Regards,
> >>>> Yash Patel
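
As a starting point for the data prep discussed in the thread, here is a minimal Python sketch that turns a CSV with a numeric customer id and a text item id into lines in the <UID><ITEMID><PREF_VALUE> layout mentioned above. The column names (customer_id, item_id), the sample rows, and the implicit preference value of 1.0 are illustrative assumptions, not something the thread or Mahout prescribes. Text item ids are mapped to integer indices on the assumption that the "Mahout IDs" used by recommenders #1 and #2 must be numeric.

```python
import csv
import io

# Hypothetical sample data; a real file would also carry the other fields
# (postal code, territory, ...), which are simply ignored here.
SAMPLE_CSV = """customer_id,item_id,item_category
101,widget-a,tools
101,widget-b,tools
102,widget-a,tools
101,widget-a,tools
"""


def read_pairs(text):
    """Read (customer_id, item_id) pairs from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["customer_id"], row["item_id"]) for row in reader]


def to_preference_triples(rows):
    """Turn (customer_id, item_id) pairs into "user,item,pref" lines.

    Text item ids are replaced by integer indices assigned in order of
    first appearance. Repeated (customer, item) pairs are emitted once.
    Returns the output lines and an index -> item id lookup.
    """
    item_index = {}  # text item id -> integer index
    seen = set()     # (customer id, item index) pairs already emitted
    lines = []
    for customer_id, item_id in rows:
        idx = item_index.setdefault(item_id, len(item_index))
        key = (int(customer_id), idx)
        if key in seen:
            continue
        seen.add(key)
        # 1.0 is an assumed implicit preference: "this customer bought it".
        lines.append(f"{key[0]},{idx},1.0")
    index_to_item = {idx: item for item, idx in item_index.items()}
    return lines, index_to_item


lines, index_to_item = to_preference_triples(read_pairs(SAMPLE_CSV))
for line in lines:
    print(line)
```

The index_to_item lookup is what lets you translate recommended indices back into the original text item ids. Per the thread, spark-itemsimilarity reads your own ids from csv, so this mapping step may be unnecessary if you go with option #3.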
