Careful: "dot products are sometimes called 'cosine'" is misleading. Cosine similarity = x.dot(y) / (norm(x) * norm(y)). That equals x.dot(y) only when norm(x) * norm(y) = 1, e.g. when both vectors are unit-normalized.
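A quick numeric check of the distinction (a small sketch using NumPy; the x.dot(y) / norm notation above is NumPy-style):

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([4.0, 3.0])

dot = x.dot(y)                                           # 24.0
cosine = dot / (np.linalg.norm(x) * np.linalg.norm(y))   # 24 / (5 * 5) = 0.96

# The two agree only when the product of the norms is 1,
# e.g. after unit-normalizing both vectors:
xn = x / np.linalg.norm(x)
yn = y / np.linalg.norm(y)
assert np.isclose(xn.dot(yn), cosine)
```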
On Sun, Feb 5, 2017 at 10:36 AM, Pat Ferrel <[email protected]> wrote:

> Nice, someone does read the math :-)
>
> Content: The type of personalized "content" indicators talked about in the
> slides are not supported by the Universal Recommender and have little value
> unless you have no collaborative filtering data. They can theoretically be
> mixed with other indicators, but you have to have a history of the content a
> user has preferred in some way, and that can also be seen as CF data, so that
> part of the theory has value only in very specific edge cases like
> personalized news, where stories mostly do not get enough events to use for
> CF. If this is your case we can talk more. Most people have CF data, so
> content cannot be used in this way, but it can be used as "intrinsic".
>
> Intrinsic: These are things like categories, tags, subjects, even derived
> indicators like LDA topics, or popularity. They are attached to items as
> metadata. These are supported by the UR in several ways, including boosts
> and filters. Imagine an ecom use case where a user is looking at a piece of
> "clothing"; at the bottom of the page you show "people who bought this also
> bought these", but you want only clothing, not the occasional video or
> electronics item. The things at the bottom of the page are "item-based"
> recommendations, not personalized, but they could also be personalized; no
> matter. The point is that of all recommendations you want to show only items
> that have "category": ["clothing"]. So if you have attached this "intrinsic"
> indicator to items, you can query for item- or user-based recs with
> category: clothing. You can filter out all recommendations that do not have
> the category, or you can boost items that have the category; both are done
> by changing the "bias" value in the query. See this page:
> http://actionml.com/docs/ur_queries
>
> Collaborative Filtering based indicators: These are based on any action,
> bit of context, or profile info that you think may relate to the user's
> taste or preferences. They are more correctly called indicators when they
> are gathered, but they go through a correlation test that checks whether the
> individual events appear to correlate with the conversion/primary event. So
> after the test we call them correlators, and they are attached to items. CF
> correlators of several types may be attached to each item along with the
> intrinsic correlators.
>
> The Universal Recommender creates a model of all items, with all CF and
> intrinsic correlators attached, in a Lucene index. The index allows very
> fast, scalable KNN queries (using cosine similarity). So when you ask the
> UR for user-based recommendations for user-1, we look up the recent events
> of user-1 and use these to make a KNN query to Lucene (inside of
> Elasticsearch) for items that have similar correlators. If you ask for
> user-based recommendations but bias or boost clothing by 10, the UR will
> internally multiply the hit score for "clothing" by 10 and re-rank all
> results. This means that "clothing" will be favored in results, but if
> there are no recs for clothing, other types of recs may be returned.
>
> Scores: These are literally the sum of "dot products" of all indicators,
> with boosts accounted for. Dot products are sometimes called "cosine" since
> the cosine of the angle between two vectors is the dot product of the
> normalized vectors. Each indicator is a vector (if you refer back to the
> slides), and the total score is the sum of one vector times the entire
> matrix. If you then sum the dot products, that is the score for all items.
> Lucene actually does this but makes use of special indexing and the
> sparseness of the data and query. So the result from Lucene is the items
> that are the K nearest neighbors to the indicator vectors in the query.
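The scoring described in the quoted message (sum of dot products over correlator vectors, with boosts multiplied in, then re-ranked to keep the top k) can be sketched roughly as follows. This is a simplified NumPy illustration, not the actual Lucene implementation; the item vectors, the correlator vocabulary, and the boost value are made up:

```python
import numpy as np

# Dense toy correlator vectors for three items over a tiny made-up vocabulary:
# [buy-A, buy-B, view-C, category:clothing]
items = {
    "item-1": np.array([1.0, 1.0, 0.0, 1.0]),
    "item-2": np.array([1.0, 1.0, 0.0, 0.0]),
    "item-3": np.array([0.0, 0.0, 1.0, 1.0]),
}

# The user's recent history, expressed over the same correlator space
user_query = np.array([1.0, 1.0, 0.0, 1.0])

# Per-correlator boost: multiply the "category:clothing" hit score by 10,
# as in the boost-by-10 example in the message
boost = np.array([1.0, 1.0, 1.0, 10.0])

# Score = sum of (boosted) dot products; rank by score and keep top k (KNN)
scores = {name: float((vec * boost).dot(user_query)) for name, vec in items.items()}
top_k = sorted(scores, key=scores.get, reverse=True)[:2]
```

With the boost applied, item-3 (clothing, little shared history) outranks item-2 (shared history, no clothing), which mirrors the "favored but not exclusive" behavior described above: a boost re-ranks rather than filters.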
> Conceptually Lucene does this for all items in the index, but it skips 99%
> of them and distributes queries to produce the answer very quickly. The
> math in the slides shows what you would get if you did the matrix math for
> all data, and if you paginated and returned all recommendations you would
> get exactly the results in the slides, but all you care about are the top
> k; therefore KNN.
>
> TL;DR: After the model is created with Mahout, the last phase of the matrix
> math, finding the most similar items, is done inside Elasticsearch, so one
> query returns the top-ranked results. The scores can be explained (by the
> math you read) but are of no real use; only the rank matters.
>
> BTW the CCO algorithm is partly implemented in Mahout, with the last phase
> in Elasticsearch, and you can get community support for the Universal
> Recommender here: https://groups.google.com/forum/#!forum/actionml-user
>
>
> On Feb 5, 2017, at 12:42 AM, Peng Zhang <[email protected]> wrote:
>
> Hi,
>
> Suppose we have created three types of indicators (cooccurrence, content,
> and intrinsic) and indexed them into Elasticsearch (ES). Then we query on
> these three types of indicators of a user to get recommended items. How
> does the Universal Recommender rank the items recommended based on these
> three types of indicators?
>
> I have gone through the slides on the Universal Recommender created by Pat.
> It's very informative. Here is the link:
> https://www.slideshare.net/mobile/pferrel/unified-recommender-39986309
>
> Thanks
> -Peng
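For reference, the boost-vs-filter query discussed in the thread ("bias" on an intrinsic field) looks roughly like the sketch below. The field names follow the actionml.com/docs/ur_queries page linked above, but treat the exact shape, user id, and bias semantics as assumptions and check that page before relying on it:

```python
import json

# Hypothetical user-based UR query: favor items whose "category" includes
# "clothing". A bias > 1 boosts matching items; by convention in the UR
# docs a negative bias acts as a hard filter (assumption: verify in docs).
query = {
    "user": "user-1",
    "fields": [
        {"name": "category", "values": ["clothing"], "bias": 10.0},
    ],
}
print(json.dumps(query, indent=2))
```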
