Re: QR decomposition in Spark ALS

2014-03-06 Thread Sean Owen
Yes in this case you end up with the least-squares solution. I don't see a problem with that; it's a corner case anyway and the best you can do. The QR decomposition will handle it either way, finding the exact solution when it exists. I think it's slower than Cholesky? (yes I am guessing that is

Re: QR decomposition in Spark ALS

2014-03-06 Thread Matei Zaharia
Yup, this would definitely be fine. I’d like to understand when this happens though, I imagine it might be if a user / product has no ratings (though we should certainly try to run well in that case). Matei On Mar 6, 2014, at 10:00 AM, Debasish Das wrote: > Matei, > > If the data has linearl

Re: QR decomposition in Spark ALS

2014-03-06 Thread Matei Zaharia
But Sean, because that matrix is not invertible, you can’t solve it. That’s why I’m saying, as long as it is solvable, it will be positive definite too, and in that case solvePositive is optimized for this use case (I believe it does Cholesky decomposition). Matei On Mar 6, 2014, at 9:58 AM,

Re: QR decomposition in Spark ALS

2014-03-06 Thread Debasish Das
Matei, If the data has linearly dependent rows ALS should have a failback mechanism. Either remove the rows and then call BLAS posv or call BLAS gesv or Breeze QR decomposition. I can share the analysis over email. Thanks. Deb On Thu, Mar 6, 2014 at 9:39 AM, Matei Zaharia wrote: > Xt*X should

Re: QR decomposition in Spark ALS

2014-03-06 Thread Sean Owen
Agree. For example you could have a user-feature matrix X = 1 0 1 0 and X' * X = 2 0 0 0 is not positive definite but is semidefinite. So I think the code should be calling solveSymmetric not solvePositive? the latter requires positive definite. -- Sean Owen | Director, Data Science | London

Re: QR decomposition in Spark ALS

2014-03-06 Thread Matei Zaharia
Xt*X should mathematically always be positive semi-definite, so the only way this might be bad is if it’s not invertible due to linearly dependent rows. This might happen due to the initialization or possibly due to numerical issues, though it seems unlikely. Maybe it also happens if some users

Re: QR decomposition in Spark ALS

2014-03-06 Thread Sean Owen
I do not think there is an advantage in projecting into only the nonnegative quadrant (meaning all of X and Y are nonnegative right?) The argument I have seen is simply interpretability but this doesn't matter here. I think it would be a great exercise to see if the QR decomposition is as fast, an

Re: QR decomposition in Spark ALS

2014-03-06 Thread Debasish Das
Bound constraints in QR decomposition / BLAS posv other than projecting to positive space at each iteration ? Common usecases are feature generation from photos/videos etc... I saw a paper on projecting to positive space from 70s...there are some improvements later using projected gradients but t

Re: QR decomposition in Spark ALS

2014-03-06 Thread Debasish Das
Yes that will be really cool if the data has linearly independent rows ! I have to debug it more but I got it running with jblas Solve.solve.. I will try breeze QR decomposition next. Have you guys tried adding bound constraints in QR decomposition / BLAS posv other than projecting to positive sp

Re: QR decomposition in Spark ALS

2014-03-06 Thread Sean Owen
Hmm, Will Xt*X be positive definite in all cases? For example it's not if X has linearly independent rows? (I'm not going to guarantee 100% that I haven't missed something there.) Even though your data is huge, if it was generated by some synthetic process, maybe it is very low rank? QR decomposi

QR decomposition in Spark ALS

2014-03-06 Thread Debasish Das
Hi Sebastian, Yes Mahout ALS and Oryx runs fine on the same matrix because Sean calls QR decomposition. But the ALS objective should give us strictly positive definite matrix..I am thinking more on it.. There are some random factor assignment step but that also initializes factors with normal(0,