You will always have a "cold start" problem for a subset of users: the ones new to the site. Popularity doesn't always work either; sometimes the purchase frequency distribution is flat, as I've seen. In those cases a metadata- or content-based recommender is a nice way to fill in. And even if you have no metadata, you still have item similarities (based on older users' purchases and views).
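For the item-page case, something as simple as cosine similarity between item columns of the interaction matrix already works with zero user history. A minimal sketch (illustrative numpy, not Mahout code; all names are made up):

    # Item-item cosine similarity from a binary user-by-item matrix
    # (views or purchases); usable on an item page with no user history.
    import numpy as np

    def item_similarities(interactions):
        """interactions: users x items matrix, e.g. 1 = viewed/purchased."""
        norms = np.linalg.norm(interactions, axis=0)
        norms[norms == 0] = 1.0                  # avoid divide-by-zero
        normalized = interactions / norms
        return normalized.T @ normalized         # items x items

    def similar_items(sim, item_id, n=5):
        scores = sim[item_id].copy()
        scores[item_id] = -np.inf                # drop the item itself
        return np.argsort(scores)[::-1][:n]

    # toy example: 4 users x 5 items
    interactions = np.array([[1, 1, 0, 0, 1],
                             [1, 0, 1, 0, 0],
                             [0, 1, 0, 1, 1],
                             [1, 1, 1, 0, 0]], dtype=float)
    print(similar_items(item_similarities(interactions), item_id=0))

In practice you would sparsify the similarity matrix (keep only the top-k or statistically significant entries per item), but the shape of the computation is the same.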
I think one important thing to keep in mind is that you don't always need recommendations based on the user's history. You may find that you get better results with item-similarity-based recommendations, so on an item page you can show recommendations with the above techniques in a wide variety of situations.

On another subject: looking at the predictive power of views (for purchases) versus purchases (for purchases), you will likely find views a weak predictor. I think what Ted is talking about below is a technique that uses a co-occurrence matrix to find views that lead to purchases. To use this you would build two models, one from purchases and one from the co-occurrence of views with purchases, then combine the recommendation weights from both models for a given user history, OR the similarities for a given item (a rough sketch of that combination step is at the bottom of this mail). The conversation Johannes cites below has some details: http://markmail.org/message/5cfewal3oyt6vw2k

I have a working cross-recommender built for views and purchases. The next question is how to measure its performance. There are ways to simulate view-purchase data, and there are other uses for the cross-recommender technique, but having a real view and purchase dataset would be incredibly useful! I keep begging people on this list... Can you share your data? If so, I'd be happy to share the code (actually I'll put it on github eventually).

On May 6, 2013, at 9:40 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

Hi!

As a starting point I remember this conversation containing both elements
(although the reconstruction part is rather small, hint!)
http://markmail.org/message/5cfewal3oyt6vw2k

On Tue, May 7, 2013 at 1:00 AM, Dominik Hübner <cont...@dhuebner.com> wrote:

> One more thing for now @Ted:
> What do you refer to with sparsification and reconstruction?
>
> On May 7, 2013, at 12:19 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> Truly cold start is best handled by recommending the most popular items.
>>
>> If you know *anything* at all, such as geo or browser or OS, then you can
>> use that to recommend using conventional techniques (that is, you can
>> recommend for the characteristics rather than for the person).
>>
>> Within a very few interactions, however, real recommendations will kick in.
>>
>> My lately preferred approach is to derive indicators using sparsification
>> or ALS+reconstruction. These indicators can be historical items or static
>> items such as geo information. These indicators can be combined in a
>> single step using a search engine.
>>
>> On Mon, May 6, 2013 at 2:58 PM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>
>>> The cluster was mostly intended for tackling the cold start problem for
>>> new users. I want to build a recommender based on existing components,
>>> or to be precise a combination of them.
>>>
>>> Unfortunately, the only product meta-data I currently have is the product
>>> price. Furthermore, this is a project I am working on alone. As a
>>> consequence, the approaches I can examine in the given time are limited.
>>>
>>> Would using ALS and ranking its outcome by e.g. frequent item set
>>> algorithms be something worth looking into? Or did you mean something
>>> different?
>>>
>>> My personal goal is to build a recommender providing acceptable results
>>> using the data I currently have available. Of course, this will only
>>> serve as a basis for further improvements where necessary or if further
>>> information can be obtained.
>>>
>>> On May 6, 2013, at 11:21 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>>> Are you looking to build a product recommender based on your own design?
>>>> Or do you want to build one based on existing methods?
>>>>
>>>> If you want to use existing methods, clustering has essentially no role.
>>>>
>>>> I think that composite approaches that use item meta-data and different
>>>> kinds of behavioral cues are important to best performance.
>>>>
>>>> On Mon, May 6, 2013 at 12:35 PM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>>>
>>>>> Well, as you already might have guessed, I am building a product
>>>>> recommender system for my thesis.
>>>>>
>>>>> I am planning to evaluate ALS (both implicit and explicit) as well as
>>>>> item-similarity recommendation for users with at least a few known
>>>>> products. Nevertheless, the majority of users have only seen a single
>>>>> (or 2-3) product(s). I want to recommend them the most popular items
>>>>> from the clusters their only product comes from (as a workaround for
>>>>> the cold-start problem). Furthermore, I expect to be able to see which
>>>>> "kind" of products users like. This might give me some information
>>>>> about how well ALS and similarity recommenders fit the user's area of
>>>>> interest (an early evaluation), or at least let me estimate whether
>>>>> the chosen approach will work in some way.
>>>>>
>>>>> On May 6, 2013, at 9:09 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>
>>>>>> I don't even think that clustering is all that necessary.
>>>>>>
>>>>>> The reduced cooccurrence matrix will give you items related to each item.
>>>>>>
>>>>>> You can use something like PCA, but SVD is just as good here due to
>>>>>> near zero mean. You could use SSVD or ALS from Mahout to do this
>>>>>> analysis and then use k-means on the right singular vectors (aka the
>>>>>> item representation).
>>>>>>
>>>>>> What is the high-level goal that you are trying to solve with this
>>>>>> clustering?
>>>>>>
>>>>>> On Mon, May 6, 2013 at 12:01 PM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>>>>>
>>>>>>> And running the clustering on the cooccurrence matrix, or doing PCA
>>>>>>> by removing eigenvalues/vectors?
>>>>>>>
>>>>>>> On May 6, 2013, at 8:52 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On Mon, May 6, 2013 at 11:29 AM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>>>>>>>
>>>>>>>>> Oh, and I forgot how the views and sales are used to build product
>>>>>>>>> vectors. As of now, I have implemented binary vectors, vectors
>>>>>>>>> counting the number of views and sales (e.g. 1 view = 1 count,
>>>>>>>>> 1 sale = 10 counts), and ordinary vectors (view => 1, sale => 5).
>>>>>>>>
>>>>>>>> I would recommend just putting the view and sale in different columns
>>>>>>>> and doing cooccurrence analysis on this.
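P.S. For anyone curious, the combination step I describe near the top of this mail looks roughly like the following. This is only an illustrative numpy sketch under simplifying assumptions (dense matrices, raw co-occurrence counts rather than sparsified indicators, made-up weights), not the actual cross-recommender code:

    import numpy as np

    def cross_recommend(purchases, views, user_purchases, user_views,
                        w_purchase=1.0, w_view=0.5, n=5):
        """purchases, views: users x items matrices over the same item catalog.
        user_purchases, user_views: the target user's item vectors."""
        pp = purchases.T @ purchases   # purchase-with-purchase co-occurrence
        pv = purchases.T @ views       # views that co-occur with purchases
        scores = w_purchase * (pp @ user_purchases) + w_view * (pv @ user_views)
        scores = np.asarray(scores, dtype=float)
        scores[user_purchases > 0] = -np.inf     # don't re-recommend purchases
        return np.argsort(scores)[::-1][:n]      # top-n item ids

The two models are the two matrices: [purchase x purchase] scored against the user's purchase history, and [purchase x view] scored against the user's view history. The weights are where the tuning, and the evaluation question above, comes in.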