I'm running into an unexpected error using the glmnet and Matrix packages.
I have a matrix that is 8 million rows by 100 columns with 75% of the
entries being zero. When I run a vanilla glmnet logistic model on my server
with 300 GB of RAM, the task completes in 20 minutes:
> x # 8 million x 100 dense matrix
> model1 <- glmnet(x, y, family = "binomial", alpha = 1)  # run time ~20 minutes
But if I convert the matrix to a sparse matrix using the Matrix package,
the model does not run at all:
> x2 <- Matrix(x, sparse = TRUE)  # ~75% of entries are zero
> model2 <- glmnet(x2, y, family = "binomial", alpha = 1)  # error
Error in array(0, c(n, p)) : 'dim' specifies too large an array
This is the opposite of what I expected: the dense data runs fine, but the
sparse version of the same data fails because it is "too large". Is this a
glmnet issue or an R memory limit? Is there a way to work around it in
glmnet?
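In case it helps, here is a scaled-down sketch of what I am doing, with
simulated data in place of my real matrix (n is reduced so it runs quickly;
on my end the error itself only shows up at the full 8-million-row size):

```r
library(glmnet)
library(Matrix)

set.seed(1)
n <- 10000   # scaled down from 8 million rows
p <- 100

# Simulated stand-in for my real data: dense matrix, ~75% zeros
x <- matrix(rnorm(n * p), n, p)
x[sample(length(x), floor(0.75 * length(x)))] <- 0
y <- rbinom(n, 1, 0.5)

# Dense input: works
model1 <- glmnet(x, y, family = "binomial", alpha = 1)

# Sparse input: at the full 8e6 x 100 size this is the call that fails with
# Error in array(0, c(n, p)) : 'dim' specifies too large an array
x2 <- Matrix(x, sparse = TRUE)
model2 <- glmnet(x2, y, family = "binomial", alpha = 1)
```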
--Nathan
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.