I looked up Spark row similarity, but I am not sure it will suit my needs,
as I want to build my recommender as a Java application, possibly with an
interface.


On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <[email protected]> wrote:

> Some references:
>
> small free book here, which talks about the general idea:
> https://www.mapr.com/practical-machine-learning
> preso, which talks about mixing actions or other indicators:
> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
> two blog posts:
> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
> mahout docs:
> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>
> Build Mahout from this source: https://github.com/apache/mahout This will
> run stand-alone on a dev machine, then if your data is too big for a single
> machine you can run it on a Spark + Hadoop cluster. The data this creates
> can be put into a DB or indexed directly by a search engine (Solr or
> Elasticsearch). Choose the search engine you want then queries of a user’s
> item id history will go there--results will be an ordered list of item ids
> to recommend.
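
As a rough illustration of the query step described above, the sketch below matches a user's item history against per-item lists of similar item ids, which is in spirit what the search engine query does. It is plain Python, not Mahout, Solr, or Elasticsearch code; the `indicators` data, the `recommend` helper, and the simple match-count scoring are all assumptions made for illustration (a real engine would apply its own relevance ranking).

```python
from collections import Counter

# Hypothetical indicator data: for each item, the ids of items found to be
# similar to it. In the setup described above this would come from the
# similarity job and be indexed as a field of the item's document.
indicators = {
    "ipad":   ["iphone", "macbook"],
    "iphone": ["ipad", "macbook"],
    "nexus":  ["galaxy"],
    "galaxy": ["nexus", "iphone"],
}


def recommend(history, indicators, top_n=3):
    """Rank items by how many of their indicator ids appear in the history.

    This is a stand-in for the search engine query: the user's history is the
    query, and items whose indicator field matches more terms rank higher.
    """
    seen = set(history)
    scores = Counter()
    for item, similar in indicators.items():
        if item in seen:
            continue  # do not recommend what the user already has
        hits = sum(1 for s in similar if s in seen)
        if hits:
            scores[item] = hits
    # Sort by score (descending), then by id so the order is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked][:top_n]


print(recommend(["iphone", "macbook"], indicators))
```

The result is an ordered list of item ids, with items the user already has filtered out.
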
>
> The core piece is the command line job: “mahout spark-itemsimilarity”,
> which can parse csv data. The options specify what columns are used for ids.
>
> Start out simple by looking only at user and item IDs. Then you can add
> other cross-cooccurrence indicators for multiple actions later pretty
> easily.
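
To make the "user and item IDs only" starting point concrete, here is a small sketch that reads the user id and item id from chosen csv columns and counts how often two items appear in the same user's history. It is plain Python for illustration only and is not what spark-itemsimilarity runs: the column positions, the sample rows, and the helper names are assumptions, and raw counts stand in for whatever similarity weighting the real job applies.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations

# Hypothetical sample in the shape of the dataset discussed in this thread:
# customer id, item id, postal code, item category.
SAMPLE = """\
1,itemA,10115,tools
1,itemB,10115,paint
2,itemA,20095,tools
2,itemB,20095,paint
2,itemC,20095,tools
3,itemC,80331,tools
"""


def read_interactions(lines, user_col=0, item_col=1):
    """Collect the set of items per user from the chosen csv columns."""
    items_by_user = defaultdict(set)
    for row in csv.reader(lines):
        if row:  # skip blank lines
            items_by_user[row[user_col]].add(row[item_col])
    return items_by_user


def cooccurrence(items_by_user):
    """Count, for each unordered item pair, the users who have both items."""
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


interactions = read_interactions(io.StringIO(SAMPLE))
pair_counts = cooccurrence(interactions)
print(pair_counts)
```

The other columns (postal code, category) are simply ignored here, which matches the advice to start with user and item ids and add further indicators later.
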
>
>
> On Nov 28, 2014, at 12:14 AM, Yash Patel <[email protected]> wrote:
>
> The Mahout + search engine recommender seems like it would be best for the
> data I have.
>
> Kindly get back to me at your earliest convenience.
>
>
>
> Best Regards,
> Yash Patel
>
> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <[email protected]> wrote:
>
> > Mahout has several recommenders so no need to create one from components.
> > They all make use of the similarity of preferences between users; that’s
> > why they are in the category of collaborative filtering.
> >
> > Primary Mahout Recommenders:
> > 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all recs
> > for all users. Uses “Mahout IDs”
> > 2) ALS-WR hadoop mapreduce, uses matrix factorization to reduce noise in
> > the data. Sometimes better for small data sets than #1. Uses “Mahout IDs”
> > 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
> > with multiple actions (multi-modal), works for new users that have some
> > history, has a scalable server (from the search engine) but is more
> > difficult to integrate than #1 or #2. Uses your own ids and reads csv
> > files.
> >
> > The rest of the data seems to apply either to the user or the item and so
> > would be used in different ways. #1 and #2 can only use user id and item
> > id but some post-recommendation weighting or filtering can be applied. #3
> > can use multiple attributes in different ways. For instance if category is
> > an item attribute you can create two actions, user-pref-for-an-item, and
> > user-pref-for-a-category. Assuming you want to recommend an item (not
> > category) you can create a cross-cooccurrence indicator for the second
> > action and use the data to make your item recs better. #3 is the only
> > method that supports this.
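
A toy sketch of the two-action idea described above: for each item, count how many of its buyers also showed a preference for each category. It is plain Python for illustration and not Mahout code; the sample data and the `cross_cooccurrence` helper are assumptions, and raw counts are used where the real job would apply its own similarity weighting.

```python
from collections import defaultdict

# Hypothetical data: the primary action (user bought item) and a secondary
# action (user showed a preference for a category).
bought = {
    "u1": {"drill", "saw"},
    "u2": {"drill"},
    "u3": {"brush"},
}
prefers_category = {
    "u1": {"tools"},
    "u2": {"tools", "paint"},
    "u3": {"paint"},
}


def cross_cooccurrence(primary, secondary):
    """For each item, count how many of its buyers also did each secondary action."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, items in primary.items():
        for item in items:
            for other in secondary.get(user, ()):
                counts[item][other] += 1
    return {item: dict(row) for item, row in counts.items()}


indicators = cross_cooccurrence(bought, prefers_category)
print(indicators["drill"])
```

Each row relates an item to categories rather than to other items, so a user's category preferences can be used as a second signal when ranking items.
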
> >
> > Pick a recommender and we can help more with data prep.
> >
> >
> > On Nov 26, 2014, at 1:34 PM, Yash Patel <[email protected]> wrote:
> >
> > Hello everyone,
> >
> > Wow, I am quite happy to see so much input from people.
> >
> > I apologize for not providing more details.
> >
> > Although this is not my complete dataset, the fields I have chosen to use
> > are:
> >
> > customer id - numeric
> > item id - text
> > postal code - text
> > item category - text
> > potential growth - text
> > territory - text
> >
> >
> > Basically I was thinking of finding similar users and recommending them
> > items that users like them have bought but they haven't.
> >
> > I would very much like to hear your opinions, though, as I am not so
> > familiar with clustering, classifiers, etc.
> >
> > I found that Mahout takes sequence files converted into vectors but I
> > couldn't understand how I would do that with my data specifically and,
> > more importantly, make a recommender system out of it.
> >
> > Also I am wondering how to factor in the importance of a specific customer
> > through the potential growth attribute.
> >
> >
> >
> >
> >
> >
> > Best Regards,
> > Yash Patel
> >
> > On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <[email protected]> wrote:
> >
> >> All very good points but note that spark-itemsimilarity may take the
> >> input directly since you specify column numbers for
> >> <UID><ITEMID><PREF_VALUE>
> >>
> >> On Nov 26, 2014, at 11:43 AM, parnab kumar <[email protected]> wrote:
> >>
> >> Kindly elaborate on your requirements, your dataset fields, and what
> >> you want to recommend to a user. Usually a set of items is recommended
> >> to a user. In your case, what are your items?
> >>
> >> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is
> >> not in this format, which would let you use the algorithms in Mahout
> >> directly.
> >>
> >> A little more info from your side will help us give you the right
> >> pointers.
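
One way to get text item ids into the <UID><ITEMID><PREF_VALUE> shape is to assign each distinct text id an integer and keep the mapping for translating results back. The sketch below is plain Python for illustration; the sample rows, the helper name `build_index`, and the choice of 1.0 as the preference value for a purchase are assumptions, not something Mahout prescribes.

```python
def build_index(values):
    """Assign consecutive integer ids to distinct values, in first-seen order."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index


# Hypothetical rows: (customer id, item id as text).
rows = [(101, "itemA"), (102, "itemB"), (101, "itemB"), (103, "itemA")]

item_index = build_index(item for _, item in rows)

# Emit UID, ITEMID, PREF_VALUE triples; a purchase is treated as preference 1.0.
triples = [(user, item_index[item], 1.0) for user, item in rows]

# Keep the reverse mapping so numeric results can be translated back to text.
item_by_id = {num: text for text, num in item_index.items()}

for user, item, pref in triples:
    print(f"{user},{item},{pref}")
```
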
> >>
> >> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <[email protected]> wrote:
> >>
> >>> Dear Mahout Team,
> >>>
> >>> I am a student new to machine learning and I am trying to build a
> >>> user-based recommender using Mahout.
> >>>
> >>> My dataset is a csv file as an input but it has many fields as text and
> >>> I understand Mahout needs numeric values.
> >>>
> >>> Can you give me a head start as to where I should begin and what kind of
> >>> tools I need to parse the text columns?
> >>>
> >>> Also, an idea of which classifiers or clustering methods I should use
> >>> would be highly appreciated.
> >>>
> >>>
> >>> Best Regards,
> >>> Yash Patel
> >>>
> >>
> >>
> >
> >
>
>
