In the implicit feedback model, the coefficients were already penalized
(towards zero) by the number of unobserved ratings, so I think it is
fair to keep the 1.3.0 weighting (by the total number of users/items).
Again, I don't think we have a clear answer. It would be nice to run
some experiments and see.
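For what it's worth, the "already penalized" point can be made concrete. Assuming the usual confidence weighting where an unobserved entry keeps a baseline weight of c_ij = 1 (an assumption about the standard implicit formulation, not anything Spark-specific), each unobserved pair still contributes
  c_ij * (0 - u_i^T v_j)^2 = (u_i^T v_j)^2
to the objective, which behaves like a data-dependent ridge penalty pulling v_j towards zero, one such term per unobserved user.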
After thinking about it more, I do think weighting lambda by sum_i c_ij is
the equivalent of the ALS-WR paper's approach for the implicit case. This
provides scale-invariance for varying products/users and for varying ratings,
and should behave well for all alphas. What do you guys think?
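One way to see the scale-invariance, assuming the usual weights c_ij = 1 + alpha * r_ij with r_ij = 0 for unobserved pairs (again an assumption about the formulation, not a statement about the Spark code): for a fixed item j,
  sum_i c_ij = (number of users) + alpha * sum_{i: r_ij > 0} r_ij
so the penalty grows both with the size of the user base and with the total amount of feedback on the item, and for small alpha it reduces to weighting by the total number of users.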
Whoops, I just saw this thread; it got caught in my spam filter. Thanks for
looking into this, Xiangrui and Sean.
The implicit situation does seem fairly complicated to me. The cost
function (not including the regularization term) is affected both by the
number of ratings and by the number of users/products.
Ravi, we just merged https://issues.apache.org/jira/browse/SPARK-6642
and used the same lambda scaling as in 1.2. The change will be
included in Spark 1.3.1, which will be released soon. Thanks for
reporting this issue! -Xiangrui
On Tue, Mar 31, 2015 at 8:53 PM, Xiangrui Meng wrote:
I created a JIRA for this:
https://issues.apache.org/jira/browse/SPARK-6637. Since we don't have
a clear answer about how the scaling should be handled, maybe the best
solution for now is to switch back to the 1.2 scaling. -Xiangrui
On Tue, Mar 31, 2015 at 2:50 PM, Sean Owen wrote:
Ah yeah I take your point. The squared error term is over the whole
user-item matrix, technically, in the implicit case. I suppose I am
used to assuming that the 0 terms in this matrix are weighted so much
less (because alpha is usually large-ish) that they're almost not
there, but they are.
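Concretely, with the usual p_ij = 1 and c_ij = 1 + alpha * r_ij for observed pairs (assuming the standard implicit formulation), the squared error splits into
  sum_{observed (i,j)} (1 + alpha * r_ij) (1 - u_i^T v_j)^2  +  sum_{unobserved (i,j)} (u_i^T v_j)^2
so for large alpha the observed terms dominate, but the unobserved terms never actually drop out.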
Hey Sean,
That is true for the explicit model, but not for the implicit one. The
ALS-WR paper doesn't cover the implicit model. In the implicit
formulation, the sub-problem (for v_j) is:
  min_{v_j} \sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2
This is a sum over all users i, not just the users who rated item j.
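To make the role of X concrete, here is a rough sketch of solving that sub-problem via its normal equations. This is not the actual Spark implementation; the function name, parameters, and the use of Breeze are just for illustration:

  import breeze.linalg.{DenseMatrix, DenseVector}

  // Solve min_{v_j} sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * ||v_j||^2
  // via (sum_i c_ij u_i u_i^T + lambda * X * I) v_j = sum_i c_ij p_ij u_i.
  // userFactors holds u_i for every user i; c and p hold c_ij and p_ij for item j.
  def solveItem(userFactors: Array[Array[Double]],
                c: Array[Double],
                p: Array[Double],
                lambda: Double,
                scaleX: Double): DenseVector[Double] = {
    val rank = userFactors.head.length
    val A = DenseMatrix.zeros[Double](rank, rank)
    val b = DenseVector.zeros[Double](rank)
    for (i <- userFactors.indices) {
      val u = DenseVector(userFactors(i))
      A += (u * u.t) * c(i)        // accumulate c_ij * u_i u_i^T
      b += u * (c(i) * p(i))       // accumulate c_ij * p_ij * u_i
    }
    for (k <- 0 until rank)
      A(k, k) = A(k, k) + lambda * scaleX   // regularization, scaled by X
    A \ b                                    // v_j
  }

Plugging in X = (number of users who rated item j) gives the 1.2-style scaling, while X = sum_i c_ij gives the alternative weighting discussed elsewhere in this thread.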
I had always understood the formulation to be the first option you
describe. Lambda is scaled by the number of items the user has rated /
interacted with. I think the goal is to avoid fitting the tastes of
prolific users disproportionately just because they have many ratings
to fit.
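For reference, the penalty I have in mind is the "weighted-lambda-regularization" from the ALS-WR paper (quoting from memory, so treat the exact form as an approximation):
  lambda * ( sum_i n_i \|u_i\|^2 + sum_j m_j \|v_j\|^2 )
where n_i is the number of items user i has rated and m_j is the number of users who have rated item j.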
Okay, I didn't realize that I changed the behavior of lambda in 1.3
to make it "scale-invariant", but it is worth discussing whether this
is a good change. In 1.2, we multiply lambda by the number of ratings
in each sub-problem. This makes it "scale-invariant" for explicit
feedback. However, in the implicit case it is less clear.
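As I understand the 1.2 behavior, the explicit per-user sub-problem is
  min_{u_i} sum_{j in R(i)} (r_ij - u_i^T v_j)^2 + lambda * n_i * \|u_i\|^2
with n_i = |R(i)|. Dividing through by n_i shows that the trade-off between average squared error and lambda * \|u_i\|^2 is the same regardless of how many ratings the user has, which is the sense in which it is scale-invariant.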
This sounds like a bug ... Did you try a different lambda? It would be
great if you could share your dataset or reproduce this issue on a
public dataset. Thanks! -Xiangrui
On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody wrote:
After upgrading to 1.3.0, ALS.trainImplicit() has been returning vastly
smaller factors (and hence scores). For example, the first product's first
few factor values in 1.2.0 are (0.04821, -0.00674, -0.0325). In 1.3.0, the
first few factor values are (2.535456E-8, 1.690301E-8, 6.99245E-8).
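For anyone who wants to reproduce this, a minimal sketch would look something like the following (the path, rank, lambda, and alpha are placeholders, not the values from my actual run); running the same code on 1.2.0 and 1.3.0 and comparing the printed factors shows the difference:

  import org.apache.spark.SparkContext
  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  val sc = new SparkContext("local[*]", "als-implicit-lambda-check")

  // user,product,count triples; counts are implicit feedback, not explicit ratings
  val ratings = sc.textFile("implicit_ratings.csv").map { line =>
    val Array(user, product, count) = line.split(',')
    Rating(user.toInt, product.toInt, count.toDouble)
  }

  // Same call in both versions; only the Spark version changes between runs.
  val model = ALS.trainImplicit(ratings, 10, 10, 0.01, 40.0)  // rank, iterations, lambda, alpha

  // Print the first few factor values for a few products to compare magnitudes.
  model.productFeatures.take(3).foreach { case (id, factors) =>
    println(s"product $id -> ${factors.take(3).mkString(", ")}")
  }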