Hello, Dirk,

maybe I'm missing something, but to avoid your for-loop approach, doesn't

M <- M/Matrix::rowSums(M)

do what you want?
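For what it's worth, here is a minimal sketch on a small made-up matrix. The toy values and the names rs, M1 and M2 are only for illustration and are not from your data. It puts the division above next to an alternative that left-multiplies by a diagonal matrix of reciprocal row sums. Both variants assume that no row sum is zero, which should hold for your example since every row gets at least one positive entry.

```r
library(Matrix)

## Small hypothetical sparse matrix standing in for the real M
M <- sparseMatrix(i = c(1, 1, 2, 3, 3),
                  j = c(1, 3, 2, 1, 4),
                  x = c(2, 6, 5, 1, 3),
                  dims = c(3, 4))

rs <- Matrix::rowSums(M)

## Variant 1: plain division; the vector rs is recycled down the
## columns, so element [i, j] is divided by rs[i]
M1 <- M / rs

## Variant 2: left-multiply by a diagonal matrix holding 1/rs,
## which rescales row i by 1/rs[i]
M2 <- Diagonal(x = 1 / rs) %*% M

## After scaling, every row should sum to 1
print(Matrix::rowSums(M1))
print(Matrix::rowSums(M2))
```

If some rows could be empty, 1/rs would contain Inf (and M/rs would produce NaN), so those rows would need separate handling first.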

 Hth  --  Gerrit

---------------------------------------------------------------------
Dr. Gerrit Eichner                   Mathematical Institute, Room 212
gerrit.eich...@math.uni-giessen.de   Justus-Liebig-University Giessen
Tel: +49-(0)641-99-32104          Arndtstr. 2, 35392 Giessen, Germany
Fax: +49-(0)641-99-32109            http://www.uni-giessen.de/eichner
---------------------------------------------------------------------

Hello R-Users,

I'm looking for a way to scale the rows of a sparse matrix M with about
57,000 rows, 14,000 columns, and 238,000 non-zero matrix elements; see
example code below.

Usually I'd use the base::scale() function (see sample code), but it
freezes my computer. The same happens when I try to run a for loop over
the matrix rows.

The conversion with as.matrix() yields a 5.8 GB object, which
appears to be too large for scale().
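The size of the dense object can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes that a dense numeric matrix stores each cell as an 8-byte double and ignores the small object header. The names n_row, n_col, dense_bytes and dense_gib are illustrative only.

```r
## Dimensions taken from the example code in this post
n_row <- 56743
n_col <- 13648

## A dense numeric matrix stores every cell as an 8-byte double
dense_bytes <- n_row * n_col * 8

## Convert to GiB (1024^3 bytes)
dense_gib <- dense_bytes / 1024^3
print(round(dense_gib, 1))
```

Any operation that makes one more dense copy of this would need roughly as much again, which is tight on a machine with 12 GB of RAM.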


So my question is: How can the rows of a large sparse matrix be
efficiently scaled?

Thanks and regards,

Dirk


### Hardware/Session Info
Intel Core i7 w/ 12 GB RAM
R version 3.2.1 (2015-06-18)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

### Example Code
library(Matrix)
set.seed(42)

## These are example values for my real "problem matrix"
N_ROW <- 56743
N_COL <- 13648
SIZE  <- 238283
PROB <- c(0.050, 0.050, 0.099, 0.149, 0.198, 0.178, 0.119,
         0.079, 0.0297, 0.0198, 0.001, 0.001, 0.001)

## get some random values to populate the sparse matrix
x <- do.call(
 what = rbind,
 args = lapply(X = 1:N_ROW,
               FUN = function(i)
                 expand.grid(i,
                   sample(x = 1:N_COL,
                     size = sample(1:15, 1),
                     replace = TRUE)
                 )
        )
)
x[,3] <- sample(x = 1:13, size = nrow(x),
          replace = TRUE, prob = PROB)

## build the sparse matrix
M <- Matrix::sparseMatrix(
      dims = c(N_ROW, N_COL),
      i = x[,1],
      j = x[,2],
      x = x[,3]
)
print(format(object.size(M), units = "auto"))

## *******************************************
## Scaling the rows of M

## scale() makes my computer freeze
# M <- t(scale(t(M), center = FALSE, scale = Matrix::rowSums(M)))

## this is not elegant at all and takes forever
# rwsms <- Matrix::rowSums(M)
# for (i in 1:nrow(M)) M[i,] <- M[i,]/rwsms[[i]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

