Re: Confidence in implicit factorization

2015-07-26 Thread Sean Owen
You can tune alpha like any other hyperparameter, measuring whatever metric makes the most sense -- AUC, etc. I don't think there's a general guideline that's more specific than that. I also have not applied this to document retrieval / recommendation before. I don't think you need to modify counts or
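
Sketching what that tuning loop might look like -- a minimal grid search over alpha against a held-out set with MLlib's implicit ALS. This is illustrative only: train and holdout are assumed to be RDD[Rating]s you already have, and MSE against binarized preferences is just a stand-in for whatever metric (AUC, MAP, ...) actually fits your task:

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Candidate alpha values; the useful range is data-dependent.
val alphas = Seq(0.01, 0.1, 1.0, 10.0, 40.0)

val scored = alphas.map { alpha =>
  val model = ALS.trainImplicit(train, 20, 10, 0.01, alpha)
  // Score the held-out (user, product) pairs.
  val preds = model.predict(holdout.map(r => (r.user, r.product)))
    .map(p => ((p.user, p.product), p.rating))
  // Compare predictions against binarized preferences (1 if rating > 0).
  val mse = holdout
    .map(r => ((r.user, r.product), if (r.rating > 0) 1.0 else 0.0))
    .join(preds)
    .map { case (_, (expected, predicted)) =>
      (expected - predicted) * (expected - predicted) }
    .mean()
  (alpha, mse)
}

println(scored.minBy(_._2))  // (best alpha, its held-out error)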

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
In your experience using implicit factorization for document clustering, how did you tune alpha? Using perplexity measures, or just something simple like 1 + rating, since the ratings are always positive in this case?

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
We got good clustering results from implicit factorization using alpha = 1.0, since I wanted a confidence of 1 + rating on observed entries and 1 on unobserved entries. I used positivity / sparse coding basically to force sparsity on the document / topic matrix... But then I got confused because

Re: Confidence in implicit factorization

2015-07-26 Thread Sean Owen
It sounds like you're describing the explicit case, or any matrix decomposition. Are you sure that's best for count-like data? "It depends," but my experience is that the implicit formulation is better. In a way, the difference between a count of 10,000 and a count of 1,000 is less significant than the difference
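
A tiny worked example of that point, assuming the confidence = 1 + alpha * |rating| weighting discussed elsewhere in this thread, with alpha = 1:

// Counts of 10,000 and 1,000 both binarize to preference p = 1; only their
// confidence weights differ. A 0 count becomes p = 0 with confidence 1.
val alpha = 1.0
for (r <- Seq(10000.0, 1000.0, 0.0)) {
  val confidence = 1.0 + alpha * math.abs(r)
  val preference = if (r > 0) 1.0 else 0.0
  println(s"count=$r confidence=$confidence preference=$preference")
}

So the model never tries to reconstruct 10,000 vs 1,000 directly; both are "seen" entries, just with different weight on the error term.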

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
I will think further, but in the current implicit formulation with confidence, it looks like I am factorizing a 0/1 matrix with weights 1 + alpha*rating for observed (1) values and 1 for unobserved (0) values. It's a bit different from the LSA model.

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
Yeah, I think the idea of confidence is a bit different from what I am looking for in using implicit factorization to do document clustering. I basically need (r_ij - w_i h_j)^2 for all observed ratings and (0 - w_i h_j)^2 for all the unobserved ratings... Think about the document x word matrix where r_
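
Spelling out the two objectives side by side in the notation above (my reading of the thread; W and H are the document and word factor matrices):

What I need (plain least squares over the full matrix, LSA / NMF style):

  min_{W,H}  sum over observed (i,j) of (r_ij - w_i h_j)^2
           + sum over unobserved (i,j) of (0 - w_i h_j)^2

What the implicit formulation minimizes instead (confidence-weighted, over
binarized preferences p_ij = 1 if r_ij > 0, else 0):

  min_{W,H}  sum over all (i,j) of (1 + alpha * |r_ij|) * (p_ij - w_i h_j)^2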

Re: Confidence in implicit factorization

2015-07-26 Thread Sean Owen
confidence = 1 + alpha * |rating| here (so, c1 means confidence - 1), so alpha = 1 doesn't specifically mean high confidence. The loss function is computed over the whole input matrix, including all missing "0" entries. These have a minimal confidence of 1 according to this formula. alpha controls how
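
For reference, this is the objective from the Hu / Koren / Volinsky implicit-feedback paper that the implementation follows (with x_u, y_i the user and item factors, and |r_ui| rather than r_ui as the extension for negative ratings):

  min  sum over all (u,i) of c_ui * (p_ui - x_u . y_i)^2
     + lambda * (sum_u ||x_u||^2 + sum_i ||y_i||^2)

  where c_ui = 1 + alpha * |r_ui|  and  p_ui = 1 if r_ui > 0, else 0.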

Confidence in implicit factorization

2015-07-25 Thread Debasish Das
Hi, Implicit factorization is important for us, since it drives recommendation when modeling user click/no-click data, and also topic modeling, where it handles 0 counts in document x word matrices through NMF and sparse coding. I am a bit confused by this code: val c1 = alpha * math.abs(rating) if (rating > 0
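
My reading of what that snippet computes per entry -- a self-contained sketch, not the actual MLlib solver code, just the per-cell loss term it corresponds to:

// c1 = confidence - 1, with confidence = 1 + alpha * |rating|.
def c1(alpha: Double, rating: Double): Double = alpha * math.abs(rating)

// Implicit loss term for a single cell: the model fits the binarized
// preference (1 if rating > 0, else 0), weighted by the confidence.
def lossTerm(rating: Double, prediction: Double, alpha: Double): Double = {
  val confidence = 1.0 + c1(alpha, rating)
  val preference = if (rating > 0) 1.0 else 0.0
  confidence * (preference - prediction) * (preference - prediction)
}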