On Mon, Mar 25, 2013 at 11:25 AM, Sebastian Schelter <[email protected]> wrote:
> Well in LSI it is ok to do that, as a missing entry means that the
> document contains zero occurrences of a given term, which is totally fine.
>
> In Collaborative Filtering with explicit feedback, a missing rating is
> not automatically a rating of zero; it is simply unknown what the user
> would give as a rating.
>
> For implicit data (number of interactions), a missing entry is indeed
> zero, but in most cases you might not have the same confidence in that
> observation as if you had observed an interaction. Koren's ALS paper
> discusses this and introduces constructs to handle it, by putting more
> weight on minimizing the loss over observed interactions.
>
> In matrix factorization for CF, the factorization usually has to
> minimize the regularized loss over the known entries only. If all
> unknown entries were simply considered zero, I'd assume that the
> resulting factorization would not generalize very well to unseen
> data. Some researchers refer to matrix factorization for CF as matrix
> completion, which IMHO better describes the problem.
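To make the "loss over the known entries only" point concrete, here is a small numpy sketch of the objective (my own illustrative code, not from Mahout or any paper; `masked_mf_loss` and its parameters are names I made up). Unobserved cells are masked out, so they contribute nothing to the loss no matter what value sits in them:

```python
import numpy as np

def masked_mf_loss(R, mask, U, V, lam):
    """Regularized squared loss over observed entries only.

    R:    (m, n) rating matrix (value in unobserved cells is irrelevant)
    mask: (m, n) matrix, 1.0 where a rating was observed, 0.0 elsewhere
    U, V: (m, k) user factors and (n, k) item factors
    lam:  L2 regularization strength
    """
    err = mask * (R - U @ V.T)   # zero out residuals at unobserved cells
    return np.sum(err ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))
```

One way to see the difference from "treat missing as zero": changing the value stored in an unobserved cell of `R` leaves this loss unchanged, whereas an unmasked loss would chase that value.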
Yes, it's just that you "shouldn't" if the inputs are rating-like, not that you literally couldn't. If your input is ratings on a scale of 1-5, then reconstructing a 0 everywhere else means you assume everything not viewed is hated, which doesn't work at all. You can subtract the mean from observed ratings, and then you're assuming everything unobserved has an average rating. The reconstruct-a-zero assumption does work nicely for click-like data, though. Better still is when you can "weakly" prefer to reconstruct the 0 for missing observations and much more strongly prefer to reconstruct the "1" for observed data.
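That "weak preference for 0, strong preference for 1" idea is exactly the confidence weighting from Koren's implicit-feedback ALS paper. A minimal numpy sketch of the weighted objective (illustrative only; the function name and the alpha/lam defaults are my own, not any library's API):

```python
import numpy as np

def weighted_implicit_loss(counts, U, V, alpha=40.0, lam=0.1):
    """Confidence-weighted loss in the style of implicit-feedback ALS.

    counts: (m, n) interaction counts; 0 means "no interaction observed"
    The preference p is 1 where any interaction occurred and 0 elsewhere.
    Every cell is reconstructed, but the confidence c = 1 + alpha * count
    makes observed cells weigh far more than the zeros.
    """
    P = (counts > 0).astype(float)   # binary preference target
    C = 1.0 + alpha * counts         # confidence: weakly 1 for missing cells
    err = C * (P - U @ V.T) ** 2
    return np.sum(err) + lam * (np.sum(U ** 2) + np.sum(V ** 2))
```

With a large `alpha`, missing an observed "1" costs far more than missing an unobserved "0", which is the asymmetry described above.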
