I looked up Spark row similarity, but I am not sure it will suit my needs, as I want to build my recommender as a Java application, possibly with an interface.
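
A minimal sketch of the kind of Java preprocessing this would involve, assuming a hypothetical CSV layout with the customer id in column 0 and the text item id in column 1. It assigns each distinct text item id a numeric index and emits "user,item,1.0" lines, i.e. the <UID><ITEMID><PREF_VALUE> layout mentioned further down the thread. The class and method names are made up for illustration and are not part of Mahout, and the constant preference value 1.0 is an assumption (a purchase is treated as a plain "bought it" signal).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not a Mahout class.
class IdEncoder {

    // Converts CSV lines of the assumed form "customerId,itemId,..." into
    // "customerId,numericItemId,1.0" triples. Numeric item ids are assigned
    // in order of first appearance, starting at 0.
    static List<String> toTriples(List<String> csvLines) {
        Map<String, Integer> itemIndex = new LinkedHashMap<>();
        List<String> triples = new ArrayList<>();
        for (String line : csvLines) {
            String[] cols = line.split(",", -1);
            if (cols.length < 2) {
                continue; // skip malformed rows
            }
            String user = cols[0].trim();
            String item = cols[1].trim();
            Integer idx = itemIndex.get(item);
            if (idx == null) {
                idx = itemIndex.size();
                itemIndex.put(item, idx);
            }
            // Assumption: every purchase counts as preference 1.0.
            triples.add(user + "," + idx + ",1.0");
        }
        return triples;
    }
}
```

Calling IdEncoder.toTriples on the lines of the file gives a list that can be written out as a new CSV. The item-to-index map would also need to be kept so that recommended numeric ids can be translated back to the original text ids.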

On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <[email protected]> wrote:

> Some references:
>
> small free book here, which talks about the general idea:
> https://www.mapr.com/practical-machine-learning
> preso, which talks about mixing actions or other indicators:
> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> two blog posts:
> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
> mahout docs:
> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>
> Build Mahout from this source: https://github.com/apache/mahout This will
> run stand-alone on a dev machine, then if your data is too big for a single
> machine you can run it on a Spark + Hadoop cluster. The data this creates
> can be put into a DB or indexed directly by a search engine (Solr or
> Elasticsearch). Choose the search engine you want, then queries of a user’s
> item id history will go there; the results will be an ordered list of item
> ids to recommend.
>
> The core piece is the command line job: “mahout spark-itemsimilarity”,
> which can parse csv data. The options specify what columns are used for ids.
>
> Start out simple by looking only at user and item IDs. Then you can add
> other cross-cooccurrence indicators for multiple actions later pretty
> easily.
>
>
> On Nov 28, 2014, at 12:14 AM, Yash Patel <[email protected]> wrote:
>
> The mahout + search engine recommender seems to be what would be best for
> the data I have.
>
> Kindly get back to me at your earliest convenience.
>
>
> Best Regards,
> Yash Patel
>
> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <[email protected]> wrote:
>
> > Mahout has several recommenders so no need to create one from components.
> > They all make use of the similarity of preferences between users; that’s
> > why they are in the category of collaborative filtering.
> >
> > Primary Mahout Recommenders:
> > 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all recs
> > for all users. Uses “Mahout IDs”
> > 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise in
> > the data. Sometimes better for small data sets than #1. Uses “Mahout IDs”
> > 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
> > with multiple actions (multi-modal), works for new users that have some
> > history, has a scalable server (from the search engine) but is more
> > difficult to integrate than #1 or #2. Uses your own ids and reads csv
> > files.
> >
> > The rest of the data seems to apply either to the user or the item and so
> > would be used in different ways. #1 and #2 can only use user id and item
> > id but some post recommendation weighting or filtering can be applied. #3
> > can use multiple attributes in different ways. For instance if category
> > is an item attribute you can create two actions, user-pref-for-an-item,
> > and user-pref-for-a-category. Assuming you want to recommend an item (not
> > category) you can create a cross-cooccurrence indicator for the second
> > action and use the data to make your item recs better. #3 is the only
> > method that supports this.
> >
> > Pick a recommender and we can help more with data prep.
> >
> >
> > On Nov 26, 2014, at 1:34 PM, Yash Patel <[email protected]> wrote:
> >
> > Hello everyone,
> >
> > Wow, I am quite happy to see so many inputs from people.
> >
> > I apologize for not providing more details.
> >
> > Although this is not my complete dataset, the fields I have chosen to use
> > are:
> >
> > customer id - numeric
> > item id - text
> > postal code - text
> > item category - text
> > potential growth - text
> > territory - text
> >
> >
> > Basically I was thinking of finding similar users and recommending them
> > items that users like them have bought but they haven't.
> >
> > Although I would very much like to hear your opinions as I am not so
> > familiar with clustering, classifiers etc.
> >
> > I found that mahout takes sequence files converted into vectors but I
> > couldn't understand how I would do it on my data specifically and, more
> > importantly, make a recommender system out of it.
> >
> > Also I am wondering how to combine the importance of a specific customer
> > through the potential growth attribute.
> >
> >
> > Best Regards,
> > Yash Patel
> >
> > On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <[email protected]> wrote:
> >
> >> All very good points but note that spark-itemsimilarity may take the
> >> input directly since you specify column numbers for
> >> <UID><ITEMID><PREF_VALUE>
> >>
> >> On Nov 26, 2014, at 11:43 AM, parnab kumar <[email protected]> wrote:
> >>
> >> kindly elaborate... your requirements... your dataset fields ...and what
> >> you want to recommend to a user... Usually a set of items is recommended
> >> to a user. In your case what are your items ?
> >>
> >> The standard input is <UID><ITEMID><PREF_VALUE> . Clearly your data is
> >> not in this format which will let you use directly the algorithms in
> >> Mahout.
> >>
> >> A little more info from your side will help us to give you the right
> >> pointers.
> >>
> >> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <[email protected]>
> >> wrote:
> >>
> >>> Dear Mahout Team,
> >>>
> >>> I am a student new to machine learning and I am trying to build a user
> >>> based recommender using mahout.
> >>>
> >>> My dataset is a csv file as an input but it has many fields as text and
> >>> I understand mahout needs numeric values.
> >>>
> >>> Can you give me a headstart as to where I should start and what kind of
> >>> tools I need to parse the text columns?
> >>>
> >>> Also an idea on which classifiers or clustering methods I should use
> >>> would be highly appreciated.
> >>>
> >>>
> >>> Best Regards,
> >>> Yash Patel
