Let me check in the morning.

BTW Dmitriy, we are now trying the new Spark version of ssvd (from git).
I see that you are still the author, so I'll be coming back here with
more questions :)

We are also exploring using pLSA directly instead of matrix
factorization, which could possibly be faster. Again, some new
implementations are available in JIRA issues.
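
On the rank question quoted below: a rough way to check the rank before
picking k might look like the sketch here. It is plain Scala with no
Mahout calls; RankCheck, numericalRank and maxSafeK are made-up names,
and Gaussian elimination with a relative tolerance is only a stand-in
for a proper SVD-based rank estimate, so treat the result as a hint.

```scala
object RankCheck {

  /** Rough numerical rank via Gaussian elimination with partial pivoting.
    * Hypothetical helper; entries below relTol * (largest |entry|) count as zero.
    */
  def numericalRank(a: Array[Array[Double]], relTol: Double = 1e-10): Int = {
    if (a.isEmpty || a(0).isEmpty) 0
    else {
      val m = a.map(_.clone)          // work on a copy, leave the input untouched
      val rows = m.length
      val cols = m(0).length
      val scale = m.iterator.flatMap(_.iterator).map(x => math.abs(x)).max
      if (scale == 0.0) 0             // all-zero matrix, like dense((0,0),(0,0))
      else {
        val tol = relTol * scale
        var rank = 0
        var c = 0
        while (c < cols && rank < rows) {
          // pick the largest remaining entry in column c as the pivot
          val p = (rank until rows).maxBy(r => math.abs(m(r)(c)))
          if (math.abs(m(p)(c)) > tol) {
            val tmp = m(p); m(p) = m(rank); m(rank) = tmp
            // eliminate column c from the rows below the pivot row
            for (r <- rank + 1 until rows) {
              val f = m(r)(c) / m(rank)(c)
              for (j <- c until cols) m(r)(j) -= f * m(rank)(j)
            }
            rank += 1
          }
          c += 1
        }
        rank
      }
    }
  }

  /** Largest k that keeps k + p <= rank (0 if there is none). */
  def maxSafeK(rank: Int, p: Int): Int = math.max(0, rank - p)
}
```

For example, RankCheck.numericalRank(Array(Array(1.0, 2.0), Array(2.0, 4.0)))
gives 1, and with p = 5 a matrix of rank 20 would allow k up to
RankCheck.maxSafeK(20, 5), i.e. 15.
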
On Nov 3, 2014 2:20 PM, "Dmitriy Lyubimov" <[email protected]> wrote:

> Ok. so that's what i suspected.
>
> The method is generally not intended to run on inputs whose rank is
> smaller than k+p. The MR version doesn't even check for it.
>
> However, as I mentioned in the manual, I did run tests with -q=0, in which
> case the corresponding singular vectors on the right should be reset to
> 0.0, not NaNs. It is possible that with -q=1 the power iterations do
> something inadmissible in that situation.
>
> just for the record, what -q setting have you used?
>
> On Mon, Nov 3, 2014 at 2:00 PM, Yang <[email protected]> wrote:
>
> > It does have something to do with k. Previously I used a formula to
> > determine the rank to use:
> >
> > rank = N - p - 1 = 64 - 5 - 1 = 58, where N is the number of columns of
> > the original matrix.
> >
> > Then I tried rank = 50, and it worked.
> >
> > Well, as I write this email, I realize that the reason might be that
> > the actual rank R of the original matrix is much smaller than N. But it
> > is a bit difficult to figure out R beforehand.
> >
> >
> > thanks
> > Yang
> >
> > On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > is the matrix by any chance constructed so that it may have rank < k? I
> > > think MR code is not checking for that.
> > >
> > > In spark shell i have :
> > >
> > > mahout> val a = dense( (0,0),(0,0) )
> > > a: org.apache.mahout.math.DenseMatrix =
> > > {
> > >   0  => {}
> > >   1  => {}
> > > }
> > > mahout> svd(a)
> > > res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
> > > org.apache.mahout.math.DenseVector) =
> > > ({
> > >   0  => {0:1.0}
> > >   1  => {1:1.0}
> > > },{
> > >   0  => {0:-1.0}
> > >   1  => {1:-1.0}
> > > },{})
> > >
> > > But :
> > >
> > > mahout> ssvd(a,2,0)
> > >
> > > java.lang.AssertionError: assertion failed: Rank-deficiency detected during s-SVD
> > >
> > > or
> > > mahout> val drmA = drmParallelize(a,2)
> > > mahout> dssvd(drmA, k=2)
> > > java.lang.IllegalArgumentException: R is rank-deficient.
> > >
> > >
> > > The MR version doesn't check for these cases and may produce some
> > > degenerate results, although I thought those should be 0s, at least
> > > when -q=0. I am not sure about -q=1, 2, ...
> > >
> > >
> > >
> > >
> > > On Thu, Oct 30, 2014 at 10:35 PM, Yang <[email protected]> wrote:
> > >
> > > > i am talking about the MR one.
> > > >
> > > > thanks
> > > > yang
> > > > On Oct 30, 2014 8:16 PM, "Dmitriy Lyubimov" <[email protected]>
> > > > wrote:
> > > >
> > > > > This is not a known problem.
> > > > >
> > > > > There are a few ssvd implementations here: sequential, MR, and
> > > > > Spark. For the record, which one are you running?
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Oct 30, 2014 at 4:37 PM, Yang <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > We are running ssvd on a dataset (a relatively small one, with
> > > > > > 8000 rows and 64 columns). We ran it with rank = 58, since the
> > > > > > oversampling parameter is p = 5.
> > > > > >
> > > > > > The result had NaNs in multiple columns.
> > > > > >
> > > > > > Why would this happen?
> > > > > >
> > > > > > I am now running with a lower rank = 20 to see if it goes away.
> > > > > >
> > > > > >
> > > > > > Thanks
> > > > > > Yang
> > > > > >
> > > > >
> > > >
> > >
> >
>
