Thanks. What if it's a big matrix, like billions of rows and millions of columns?

On Wednesday, July 30, 2014, Davies Liu <dav...@databricks.com> wrote:
> It will depend on the size of your matrix. If it can fit in memory,
> then you can:
>
>     sparse = sparse_matrix(matrix)  # sparse_matrix is the function you had written
>     sc.parallelize(sparse, NUM_OF_PARTITIONS)
>
> On Tue, Jul 29, 2014 at 11:39 PM, Chengi Liu <chengi.liu...@gmail.com> wrote:
> > Hi,
> > I have an RDD with n rows and m columns, but most of the entries are 0;
> > it's a sparse matrix.
> >
> > I would like to get only the non-zero entries along with their indices.
> > The equivalent Python code would be:
> >
> >     for i, x in enumerate(matrix):
> >         for j, y in enumerate(x):
> >             if y:
> >                 print i, j, y
> >
> > Now, what I would like to do is save the (i, j, y) entries.
> > How do I do this in PySpark?
> > Thanks
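For a matrix too large to fit in driver memory, one possible approach (a sketch, not something confirmed in this thread) is to keep the matrix as an RDD of rows and extract the non-zero entries row by row with `flatMap`, so no full copy of the matrix is ever collected in one place. The helper name `row_nonzeros` is hypothetical; it is shown here as plain Python so the logic can be checked without a Spark cluster.

```python
# Sketch: turn a row-wise matrix into (i, j, value) triples for the
# non-zero entries only, one row at a time. In PySpark this helper
# could be applied as (hypothetical usage, assuming `rdd` is an RDD of rows):
#   rdd.zipWithIndex().flatMap(lambda pair: row_nonzeros(pair[1], pair[0]))
# and the result saved with saveAsTextFile() or similar.

def row_nonzeros(i, row):
    """Return (i, j, y) for every non-zero y in the i-th row."""
    return [(i, j, y) for j, y in enumerate(row) if y]

# Local demonstration on a small sparse matrix:
matrix = [
    [0, 0, 3],
    [0, 5, 0],
    [0, 0, 0],
]

triples = [t for i, row in enumerate(matrix) for t in row_nonzeros(i, row)]
print(triples)  # [(0, 2, 3), (1, 1, 5)]
```

Because each row is processed independently, the memory needed at any worker is proportional to one row, not the whole matrix.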