Re: calculating the mean of SparseVector RDD

2015-01-12 Thread Xiangrui Meng
> egate_combined_vectors(vec1, vec2):
>     if all(vec1 == vec2):
>         # then the vector came from only one partition
>         return vec1
>     else:
>         return vec1 + vec2
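The combine step quoted above tells "one partition" apart from "two partitions" by comparing the vectors. A minimal sketch of one alternative, assuming the goal is a per-index mean: carry a (sums, count) pair through both the per-vector step and the merge step, so no equality check is needed. This is plain Python, not Spark code. Sparse vectors are modelled as `{index: value}` dicts, and the names `seq_op`, `comb_op` and `sparse_mean` are hypothetical. The two functions have the shape expected by PySpark's `RDD.aggregate(zeroValue, seqOp, combOp)`, but whether this is faster on a real cluster is not something the thread snippets establish.

```python
from functools import reduce

# Assumption: a sparse vector is a dict {index: value} with zeros omitted.
# This stands in for a real SparseVector type purely for illustration.

def seq_op(acc, vec):
    """Fold one sparse vector into a partition-local (sums, count) pair."""
    sums, count = acc
    for idx, val in vec.items():
        sums[idx] = sums.get(idx, 0.0) + val
    return sums, count + 1

def comb_op(a, b):
    """Merge two partition-level (sums, count) pairs."""
    sums_a, count_a = a
    sums_b, count_b = b
    for idx, val in sums_b.items():
        sums_a[idx] = sums_a.get(idx, 0.0) + val
    return sums_a, count_a + count_b

def sparse_mean(partitions):
    """Per-index mean over sparse vectors split across simulated partitions."""
    # Each partition starts from its own fresh zero value ({}, 0).
    partials = [reduce(seq_op, part, ({}, 0)) for part in partitions]
    sums, count = reduce(comb_op, partials, ({}, 0))
    if count == 0:
        return {}
    return {idx: total / count for idx, total in sums.items()}

# Two simulated partitions holding three vectors in total.
partitions = [
    [{0: 1.0, 3: 2.0}, {3: 4.0}],
    [{1: 6.0}],
]
mean = sparse_mean(partitions)
```

Because the count travels with the sums, the division happens once at the end, and a partition that contributes nothing merges as `({}, 0)` without special handling.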

Re: calculating the mean of SparseVector RDD

2015-01-12 Thread Rok Roskar
> ed_vectors)
> means = means / nvals
>
> This turns out to be really slow -- and doesn't seem to depend on how many vectors there are so there seems to be some overhead somewhere that I'm

Re: calculating the mean of SparseVector RDD

2015-01-09 Thread Xiangrui Meng
> ctors)
> means = means / nvals
>
> This turns out to be really slow -- and doesn't seem to depend on how many vectors there are so there seems to be some overhead somewhere that

Re: calculating the mean of SparseVector RDD

2015-01-09 Thread Rok Roskar
> seem to depend on how many vectors there are so there seems to be some overhead somewhere that I'm not understanding. Is there a better way of doing this?

Re: calculating the mean of SparseVector RDD

2015-01-07 Thread Xiangrui Meng
> -- and doesn't seem to depend on how many vectors there are so there seems to be some overhead somewhere that I'm not understanding. Is there a better way of doing this?
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.

calculating the mean of SparseVector RDD

2015-01-07 Thread rok
out to be really slow -- and doesn't seem to depend on how many vectors there are so there seems to be some overhead somewhere that I'm not understanding. Is there a better way of doing this? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/cal