I've been trying to get some linear classifiers (LiblineaR, kernlab, e1071) to work with a sparse matrix of feature data. With LiblineaR and kernlab, it seems I have to coerce my data into a dense matrix before I can train a model. I've searched and read through the manuals and vignettes, but I can't see how to use either of these packages with sparse matrices. I've tried both csr from SparseM and sparseMatrix from the Matrix package. A simple example recreating my results is below.
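Both sparse formats mentioned above hold the same information. The following is a minimal sketch, assuming the Matrix and SparseM packages are installed, of converting a Matrix dgCMatrix into a SparseM matrix.csr without the dense as.matrix() step used in the transcript. Once the matrix is in Matrix's row-compressed form, its slots map one-to-one onto the ra/ja/ia slots of matrix.csr, shifted from 0-based to 1-based indexing. The helper name dgc_to_csr is hypothetical. This only removes the dense intermediate; it does not by itself make LiblineaR() or ksvm() accept sparse input.

```r
library(Matrix)
library(SparseM)

# Same toy matrix as in the post: ones in columns 1 and 10 of the even rows
SM1 <- sparseMatrix(i = c(1:5 * 2, 1:5 * 2), j = c(rep(1, 5), rep(10, 5)), x = 1)

# Hypothetical helper: dgCMatrix -> matrix.csr with no dense copy.
# Matrix's row-compressed slots (p, j, x) are 0-based; SparseM's are 1-based.
dgc_to_csr <- function(m) {
  r <- as(m, "RsparseMatrix")        # compressed sparse row form (dgRMatrix)
  new("matrix.csr",
      ra = as.numeric(r@x),          # nonzero values, stored row by row
      ja = as.integer(r@j + 1L),     # column index of each value (1-based)
      ia = as.integer(r@p + 1L),     # start of each row within ra/ja (1-based)
      dimension = as.integer(r@Dim))
}

SM2 <- dgc_to_csr(SM1)
```

The row-pointer vector ia has one more entry than there are rows; rows with no nonzeros simply repeat the previous pointer.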
Does anybody know if there's a trick to get this to work without coercing the data into a dense matrix?

I'm currently playing with the KDD Cup 2010 datasets. I've written a simple script to create hash kernel feature vectors for each row of the training data. So far I haven't added many features to the hash vectors. For simplicity, I create a string token for each feature, hash it, and take that hash mod 10007 and mod 10009 (so each feature gets two buckets, with a low likelihood of two features colliding on both). 10009 columns may seem like overkill, but I figured that with a sparse matrix the number of columns wouldn't matter much. I'm also only using 99999 rows of input for now. Whenever I accidentally do something that coerces the sparse matrix into a dense one, I eat up all my RAM, go to swap, and spend the next 5 minutes trying to kill my session. So I'm looking for something that scales reasonably well without a large memory footprint.

Thanks!
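A minimal sketch of building the hashed feature vectors directly into a sparse matrix, so no dense copy is ever created. Assumptions: hash_token() is a hypothetical polynomial string hash standing in for whatever hash the original script uses, hash_features() is a hypothetical helper, and the token strings are made up. The two buckets per token (hash mod 10007 and hash mod 10009, giving 10009 columns) follow the description above. sparseMatrix() adds the values of duplicated (row, column) pairs, so bucket collisions within a row are summed rather than lost.

```r
library(Matrix)

# Hypothetical string hash: polynomial rolling hash (base 31) reduced modulo
# a large prime. Stands in for the hash used in the original script.
hash_token <- function(token, p = 2147483647) {
  h <- 0
  for (code in utf8ToInt(token)) {
    h <- (h * 31 + code) %% p   # intermediate stays far below 2^53, so doubles are exact
  }
  h
}

# Hypothetical helper: rows is a list of character vectors of tokens.
# Each token sets column (h mod m1) + 1 and column (h mod m2) + 1 in its row.
hash_features <- function(rows, m1 = 10007, m2 = 10009) {
  h <- vapply(unlist(rows), hash_token, numeric(1), USE.NAMES = FALSE)
  r <- rep(seq_along(rows), lengths(rows))   # row index of each token
  sparseMatrix(i = c(r, r),
               j = c(h %% m1 + 1, h %% m2 + 1),
               x = 1,
               dims = c(length(rows), max(m1, m2)))
}

# Made-up tokens of the form "feature=value"; the third row has no features
rows <- list(c("unit=Unit1", "skill=Addition"), "unit=Unit2", character(0))
X <- hash_features(rows)
```

For scale: a dense double matrix of 99999 x 10009 needs 99999 x 10009 x 8 bytes, about 8.0 GB, which is why an accidental coercion goes to swap. A dgCMatrix stores roughly 12 bytes per nonzero (an 8-byte value plus a 4-byte row index) plus one 4-byte pointer per column; with an assumed 10 tokens per row (20 nonzeros), 99999 rows come to about 24 MB.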
Jeff

See below for an example that recreates my basic attempts at using sparse matrices:

> L1=rep(0:1,5)
> M1=sparseMatrix(i=c(1:5*2,1:5*2),j=c(rep(1,5),rep(10,5)),x=1)
> L1=rep(0:1,5)
> SM1=sparseMatrix(i=c(1:5*2,1:5*2),j=c(rep(1,5),rep(10,5)),x=1)
> DM=as.matrix(SM1)
> SM2=as.matrix.csr(DM)
> as.matrix(SM2)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    1    0    0    0    0    0    0    0    0     1
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    1    0    0    0    0    0    0    0    0     1
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    1    0    0    0    0    0    0    0    0     1
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    1    0    0    0    0    0    0    0    0     1
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    1    0    0    0    0    0    0    0    0     1
> L1
[1] 0 1 0 1 0 1 0 1 0 1
> model = LiblineaR(DM,L1)
> predict(model,DM)
$predictions
[1] 0 1 0 1 0 1 0 1 0 1

> model = LiblineaR(SM1,L1)
Error in t.default(data) : argument is not a matrix
> model = LiblineaR(SM1,L1)
Error in t.default(data) : argument is not a matrix
Setting default kernel parameters
> predict(model,DM)
      [,1]
 [1,]  0.1
 [2,]  0.9
 [3,]  0.1
 [4,]  0.9
 [5,]  0.1
 [6,]  0.9
 [7,]  0.1
 [8,]  0.9
 [9,]  0.1
[10,]  0.9
> model = ksvm(SM1,L1,scale=FALSE,kernel="vanilladot")
Error in function (classes, fdef, mtable) :
  unable to find an inherited method for function "ksvm", for signature "dgCMatrix"
> model = ksvm(SM2,L1,scale=FALSE,kernel="vanilladot")
Error in function (classes, fdef, mtable) :
  unable to find an inherited method for function "ksvm", for signature "matrix.csr"
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.