let me check in the morning... btw Dmitriy, we are now trying to use the new Spark version of ssvd (from git). i see that you are still the author, so i'll be coming here again with more questions :)
we are also exploring using pLSA directly instead of matrix factorization; that could possibly be faster. again, some new implementations are available in JIRA.

On Nov 3, 2014 2:20 PM, "Dmitriy Lyubimov" <[email protected]> wrote:

> Ok. so that's what i suspected.
>
> The method generally is not intended to run on inputs with ranks smaller
> than k+p parameters. MR version doesn't even check for it.
>
> However as i mentioned in manual, i did run tests with -q=0 in which case
> corresponding singular vectors on the right should be reset to 0.0, not
> NaNs. It is possible that with -q=1 power iterations do something
> inadmissible in that situation.
>
> just for the record, what -q setting have you used?
>
> On Mon, Nov 3, 2014 at 2:00 PM, Yang <[email protected]> wrote:
>
> > it does have something to do with K. previously I used a formula to
> > determine my rank to use by
> >
> > rank = N - p - 1 = 64 - 5 - 1 = 58, where N is the number of columns of
> > the original matrix.
> >
> > then I tried using rank = 50, it worked.
> >
> > well.... as I write this email, I realized that the reason might be that
> > the actual rank R of the original matrix may be much smaller than N.
> > but it is a bit difficult to figure out that R beforehand.
> >
> > thanks
> > Yang
> >
> > On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > > is the matrix by any chance constructed so that it may have rank < k? I
> > > think MR code is not checking for that.
> > >
> > > In spark shell i have :
> > >
> > > mahout> val a = dense( (0,0),(0,0) )
> > > a: org.apache.mahout.math.DenseMatrix =
> > > {
> > >   0 => {}
> > >   1 => {}
> > > }
> > > mahout> svd(a)
> > > res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
> > > org.apache.mahout.math.DenseVector) =
> > > ({
> > >   0 => {0:1.0}
> > >   1 => {1:1.0}
> > > },{
> > >   0 => {0:-1.0}
> > >   1 => {1:-1.0}
> > > },{})
> > >
> > > But :
> > >
> > > mahout> ssvd(a,2,0)
> > >
> > > java.lang.AssertionError: assertion failed: Rank-deficiency detected during s-SVD
> > >
> > > or
> > > mahout> val drmA = drmParallelize(a,2)
> > > mahout> dssvd(drmA, k=2)
> > > java.lang.IllegalArgumentException: R is rank-deficient.
> > >
> > > the MR version doesn't check for these effects and it may create some
> > > degenerate results, although i thought those should be 0s, at least when
> > > -q=0. I am not sure for -q=1,2...
> > >
> > > On Thu, Oct 30, 2014 at 10:35 PM, Yang <[email protected]> wrote:
> > >
> > > > i am talking about the MR one.
> > > >
> > > > thanks
> > > > yang
> > > > On Oct 30, 2014 8:16 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> > > >
> > > > > This is not a known problem...
> > > > >
> > > > > there are few ssvd here, sequential, MR and spark one. for the record,
> > > > > which one are you running?
> > > > >
> > > > > On Thu, Oct 30, 2014 at 4:37 PM, Yang <[email protected]> wrote:
> > > > >
> > > > > > we are running ssvd on a dataset (this one is relatively small, with 8000
> > > > > > rows, number of columns is 64 ), we ran it with rank = 58, since sampling
> > > > > > p=5.
> > > > > >
> > > > > > the result had NaN on multiple columns.
> > > > > >
> > > > > > why would this appear ?
> > > > > >
> > > > > > I am now running with lower rank=20 , to see if it goes away.
> > > > > >
> > > > > > Thanks
> > > > > > Yang
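
Since figuring out R beforehand was the sticking point in this thread, here is a rough sketch of one way to sanity-check it for a matrix small enough to hold in memory (64 columns is fine). It is plain Scala and makes no Mahout calls; `RankCheck`, `numericalRank`, `maxK` and the `1e-9` tolerance are made-up names and values for illustration, not anything from Mahout. It estimates the rank by Gaussian elimination with partial pivoting and then caps k at R - p, on the reading of Dmitriy's comment that ssvd wants k + p <= R. The absolute tolerance is crude and would probably need scaling to the magnitude of the data.

```scala
object RankCheck {

  /** Rough numerical rank of a small dense matrix, via Gaussian elimination
    * with partial pivoting. Pivots with magnitude <= tol are treated as zero.
    * (Hypothetical helper, not a Mahout API.) */
  def numericalRank(a: Array[Array[Double]], tol: Double = 1e-9): Int = {
    if (a.isEmpty || a.head.isEmpty) return 0
    val m = a.map(_.clone())        // work on a copy, leave the input intact
    val rows = m.length
    val cols = m.head.length
    var rank = 0
    var col = 0
    while (col < cols && rank < rows) {
      // among the rows not yet used as pivots, take the largest entry in this column
      val pivot = (rank until rows).maxBy(r => math.abs(m(r)(col)))
      if (math.abs(m(pivot)(col)) > tol) {
        val tmp = m(rank); m(rank) = m(pivot); m(pivot) = tmp
        // eliminate this column from all rows below the pivot row
        for (r <- rank + 1 until rows) {
          val f = m(r)(col) / m(rank)(col)
          for (c <- col until cols) m(r)(c) -= f * m(rank)(c)
        }
        rank += 1
      }
      col += 1
    }
    rank
  }

  /** Largest k with k + p <= rank, floored at 0 (assumes the k + p <= R reading). */
  def maxK(rank: Int, p: Int): Int = math.max(0, rank - p)
}
```

For the all-zero 2x2 matrix from the shell session above, `RankCheck.numericalRank(Array(Array(0.0, 0.0), Array(0.0, 0.0)))` gives 0, so `maxK` returns 0 for any p and no k is safe. In the Mahout shell the same count could presumably be taken from the singular values that `svd(a)` returns, but that is not tested here.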
