Thanks.
What if it's a big matrix, say billions of rows and millions of columns?
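In case it helps clarify what I'm after, here is a rough sketch of the distributed version I had in mind, assuming the matrix is already an RDD of rows so nothing has to be collected on the driver (the output path is just a placeholder):

# Assumes `matrix` is an RDD where each element is one row (a list of values).
# zipWithIndex() pairs every row with its row index i, and flatMap emits one
# (i, j, value) tuple per non-zero cell, so the result stays distributed.
entries = (matrix
           .zipWithIndex()  # -> (row, i)
           .flatMap(lambda row_i: [(row_i[1], j, y)
                                   for j, y in enumerate(row_i[0]) if y]))

# Write the sparse (i, j, value) entries out without bringing them to the
# driver ("/path/to/output" is a placeholder).
entries.saveAsTextFile("/path/to/output")

Would something like that scale, or is there a better way?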
On Wednesday, July 30, 2014, Davies Liu <dav...@databricks.com> wrote:

> It will depend on the size of your matrix. If it can fit in memory,
> then you can:
>
> sparse = sparse_matrix(matrix)  # sparse_matrix is the function you had written
> sc.parallelize(sparse, NUM_OF_PARTITIONS)
>
> On Tue, Jul 29, 2014 at 11:39 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:
> > Hi,
> >     I have an RDD with n rows and m columns, but most of the entries are 0,
> > i.e. it's a sparse matrix.
> >
> > I would like to get only the non-zero entries along with their indices.
> >
> > The equivalent plain Python would be:
> >
> > for i,x in enumerate(matrix):
> >    for j,y in enumerate(x):
> >         if y:
> >            print i,j,y
> >
> > Now, what I would like to do is save the (i, j, y) entries.
> > How do I do this in PySpark?
> > Thanks
> >
> >
>
