You will always have a "cold start" problem for a subset of users: the ones new to the site. Popularity doesn't always work either; sometimes the purchase frequency distribution is flat, as I've seen. In those cases a metadata- or content-based recommender is a nice way to fill in. And even if you have no metadata, you still have item similarities (based on older users' purchases and views).
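For the item-page case, something as simple as cosine similarity between item columns of the interaction matrix already works with zero user history. A minimal sketch (illustrative numpy, not Mahout code; all names are made up):

    # Item-item cosine similarity from a binary user-by-item matrix
    # (views or purchases); usable on an item page with no user history.
    import numpy as np

    def item_similarities(interactions):
        """interactions: users x items matrix, e.g. 1 = viewed/purchased."""
        norms = np.linalg.norm(interactions, axis=0)
        norms[norms == 0] = 1.0                  # avoid divide-by-zero
        normalized = interactions / norms
        return normalized.T @ normalized         # items x items

    def similar_items(sim, item_id, n=5):
        scores = sim[item_id].copy()
        scores[item_id] = -np.inf                # drop the item itself
        return np.argsort(scores)[::-1][:n]

    # toy example: 4 users x 5 items
    interactions = np.array([[1, 1, 0, 0, 1],
                             [1, 0, 1, 0, 0],
                             [0, 1, 0, 1, 1],
                             [1, 1, 1, 0, 0]], dtype=float)
    print(similar_items(item_similarities(interactions), item_id=0))

In practice you would sparsify the similarity matrix (keep only the top-k or statistically significant entries per item), but the shape of the computation is the same.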
I think one important thing to keep in mind is that you don't always need recommendations based on the user's history. You may find that you get better results with item-similarity-based recommendations, so on an item page you can show recommendations with the above techniques in a wide variety of situations.

On another subject: looking at the predictive power of views (for purchases) versus purchases (for purchases), you will likely find views a weak predictor. I think what Ted is talking about below is a technique that uses a co-occurrence matrix to find views that lead to purchases. To use this you would build two models, one from purchases and one from the co-occurrence of views with purchases, then combine the recommendation weights from both models for a given user history, OR the similarities for a given item (a rough sketch of that combination step is at the bottom of this mail). The conversation Johannes cites below has some details: http://markmail.org/message/5cfewal3oyt6vw2k

I have a working cross-recommender built for views and purchases. The next question is how to measure its performance. There are ways to simulate view-purchase data, and there are other uses for the cross-recommender technique, but having a real view and purchase dataset would be incredibly useful! I keep begging people on this list... Can you share your data? If so, I'd be happy to share the code (actually I'll put it on github eventually).

On May 6, 2013, at 9:40 PM, Johannes Schulte <johannes.schu...@gmail.com> wrote:

Hi!

As a starting point I remember this conversation containing both elements
(although the reconstruction part is rather small, hint!)
http://markmail.org/message/5cfewal3oyt6vw2k

On Tue, May 7, 2013 at 1:00 AM, Dominik Hübner <cont...@dhuebner.com> wrote:

> One more thing for now @Ted:
> What do you refer to with sparsification and reconstruction?
>
> On May 7, 2013, at 12:19 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
>> Truly cold start is best handled by recommending the most popular items.
>>
>> If you know *anything* at all, such as geo or browser or OS, then you can
>> use that to recommend using conventional techniques (that is, you can
>> recommend for the characteristics rather than for the person).
>>
>> Within a very few interactions, however, real recommendations will kick in.
>>
>> My lately preferred approach is to derive indicators using sparsification
>> or ALS+reconstruction. These indicators can be historical items or static
>> items such as geo information. These indicators can be combined in a
>> single step using a search engine.
>>
>> On Mon, May 6, 2013 at 2:58 PM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>
>>> The cluster was mostly intended for tackling the cold start problem for
>>> new users. I want to build a recommender based on existing components,
>>> or to be precise a combination of them.
>>>
>>> Unfortunately, the only product meta-data I currently have is the product
>>> price. Furthermore, this is a project I am working on alone. As a
>>> consequence, the approaches I can examine in the given time are limited.
>>>
>>> Would using ALS and ranking its outcome by e.g. frequent item set
>>> algorithms be something worth looking into? Or did you mean something
>>> different?
>>>
>>> My personal goal is to build a recommender providing acceptable results
>>> using the data I currently have available. Of course, this will only
>>> serve as a basis for further improvements where necessary or if further
>>> information can be obtained.
>>>
>>> On May 6, 2013, at 11:21 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>
>>>> Are you looking to build a product recommender based on your own design?
>>>> Or do you want to build one based on existing methods?
>>>>
>>>> If you want to use existing methods, clustering has essentially no role.
>>>>
>>>> I think that composite approaches that use item meta-data and different
>>>> kinds of behavioral cues are important to best performance.
>>>>
>>>> On Mon, May 6, 2013 at 12:35 PM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>>>
>>>>> Well, as you already might have guessed, I am building a product
>>>>> recommender system for my thesis.
>>>>>
>>>>> I am planning to evaluate ALS (both implicit and explicit) as well as
>>>>> item-similarity recommendation for users with at least a few known
>>>>> products. Nevertheless, the majority of users have only seen a single
>>>>> (or 2-3) product(s). I want to recommend them the most popular items
>>>>> from the clusters their only product comes from (as a workaround for
>>>>> the cold-start problem). Furthermore, I expect to be able to see which
>>>>> "kind" of products users like. This might give me some information
>>>>> about how well ALS and similarity recommenders fit the user's area of
>>>>> interest (an early evaluation), or at least let me estimate whether
>>>>> the chosen approach will work in some way.
>>>>>
>>>>> On May 6, 2013, at 9:09 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>
>>>>>> I don't even think that clustering is all that necessary.
>>>>>>
>>>>>> The reduced cooccurrence matrix will give you items related to each item.
>>>>>>
>>>>>> You can use something like PCA, but SVD is just as good here due to
>>>>>> near zero mean. You could use SSVD or ALS from Mahout to do this
>>>>>> analysis and then use k-means on the right singular vectors (aka the
>>>>>> item representation).
>>>>>>
>>>>>> What is the high-level goal that you are trying to solve with this
>>>>>> clustering?
>>>>>>
>>>>>> On Mon, May 6, 2013 at 12:01 PM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>>>>>
>>>>>>> And running the clustering on the cooccurrence matrix, or doing PCA
>>>>>>> by removing eigenvalues/vectors?
>>>>>>>
>>>>>>> On May 6, 2013, at 8:52 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On Mon, May 6, 2013 at 11:29 AM, Dominik Hübner <cont...@dhuebner.com> wrote:
>>>>>>>>
>>>>>>>>> Oh, and I forgot how the views and sales are used to build product
>>>>>>>>> vectors. As of now, I have implemented binary vectors, vectors
>>>>>>>>> counting the number of views and sales (e.g. 1 view = 1 count,
>>>>>>>>> 1 sale = 10 counts), and ordinary vectors (view => 1, sale => 5).
>>>>>>>>
>>>>>>>> I would recommend just putting the view and sale in different columns
>>>>>>>> and doing cooccurrence analysis on this.
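P.S. For anyone curious, the combination step I describe near the top of this mail looks roughly like the following. This is only an illustrative numpy sketch under simplifying assumptions (dense matrices, raw co-occurrence counts rather than sparsified indicators, made-up weights), not the actual cross-recommender code:

    import numpy as np

    def cross_recommend(purchases, views, user_purchases, user_views,
                        w_purchase=1.0, w_view=0.5, n=5):
        """purchases, views: users x items matrices over the same item catalog.
        user_purchases, user_views: the target user's item vectors."""
        pp = purchases.T @ purchases   # purchase-with-purchase co-occurrence
        pv = purchases.T @ views       # views that co-occur with purchases
        scores = w_purchase * (pp @ user_purchases) + w_view * (pv @ user_views)
        scores = np.asarray(scores, dtype=float)
        scores[user_purchases > 0] = -np.inf     # don't re-recommend purchases
        return np.argsort(scores)[::-1][:n]      # top-n item ids

The two models are the two matrices: [purchase x purchase] scored against the user's purchase history, and [purchase x view] scored against the user's view history. The weights are where the tuning, and the evaluation question above, comes in.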