You managed to ask the question in a convoluted way, so I am not sure whether you are missing the principles or the intricacies. The high-level answer is the following.

ALS is just like SVD, only it is not SVD: it produces a low-rank approximation of the user-item matrix. I.e., it represents the user-item matrix as a product of two smaller matrices, item-feature times feature-user. It does that by initializing them to some random junk and then repeatedly solving the least-squares problem: fix one matrix and solve for the other, then fix the other and solve for the first. It converges rapidly. For the MovieLens data sets, it's like three iterations and you're done. Six to be on the safe side. Ten is overkill.

I don't see why you would expect it to be done in one iteration, though. You are starting with random matrices; you have to iterate.

The way I see it, regularization is just good practice. It's there to prevent over-fitting, which I don't think is an issue unless you are using a very high number of features, basically approaching the number of users or items. Apply the textbook regularization with lambda=1 and the problem is gone: you can have more features than users or items, and your user-item matrix will still be reconstructed correctly, and you can keep iterating as much as you want. At least that held for the matrices I tried - MovieLens (implicit feedback - no ratings).

Not sure my answer is helpful, but just giving it a shot. I am sure others will chip in.

Koobas
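For concreteness, the alternating solve I described can be sketched in a few lines of NumPy. This is just my own illustrative sketch, not Mahout's implementation: the rank k, lambda, and iteration count are arbitrary choices, and for simplicity it fits every entry of R rather than only the observed ratings.

```python
import numpy as np

def als(R, k=2, lam=1.0, iters=6, seed=0):
    """Factor R (users x items) into U (users x k) times M.T (k x items)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.random((n_users, k))   # start from random junk
    M = rng.random((n_items, k))
    I = lam * np.eye(k)            # lambda * identity, the regularization term
    for _ in range(iters):
        # Fix M, solve the regularized least-squares problem for U ...
        U = np.linalg.solve(M.T @ M + I, M.T @ R.T).T
        # ... then fix U and solve for M.
        M = np.linalg.solve(U.T @ U + I, U.T @ R).T
    return U, M

# Tiny made-up ratings matrix, purely for illustration.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [1., 0., 0., 4.]])
U, M = als(R, k=2)
approx = U @ M.T   # low-rank reconstruction of R
```

Each half-step is an ordinary ridge-regression solve, which is why a handful of iterations is enough in practice.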
On Sun, Mar 24, 2013 at 10:19 PM, Dominik Huebner <[email protected]> wrote:

> It's quite hard for me to get the mathematical concepts of the ALS
> recommenders. It would be great if someone could help me to figure out
> the details. This is my current status:
>
> 1. The item-feature (M) matrix is initialized using the average ratings
> and random values (explicit case)
>
> 2. The user-feature (U) matrix is solved using the partial derivative of
> the error function with respect to u_i (the columns of row-vectors of U)
>
> Supposed we use as many features as items are known and the error
> function does not use any regularization. Would U be solved within the
> first iteration? If not, I do not understand why more than one iteration
> is needed.
> Furthermore, I believe to have understood that using fewer features than
> items and also applying regularization, does not allow to solve U in a
> way that the stopping criterion can be met after only one iteration.
> Thus, iteration is required to gradually converge to the stopping
> criterion.
>
> I hope I have pointed out my problems clearly enough.
