Base R's unique() function might be useful here; I believe it uses hashing.
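
A minimal sketch of that idea, run on a small ordinary matrix rather than a big.matrix. The names m, n_row, n_col, chunk_size, keep and m_reduced are made up for illustration, and the chunk size is arbitrary. Working on a block of columns at a time keeps only that block in memory as a regular matrix; I am assuming the same m[, idx] column indexing would carry over to a big.matrix, but I have not tested that here.

```r
# Small stand-in for the data in the question (plain matrix, illustrative sizes).
set.seed(1)
n_row <- 100
n_col <- 1000
m <- matrix(sample(0:1, n_row * n_col, replace = TRUE), n_row, n_col)
m <- cbind(m, 1)  # append a constant column, as in the original example

chunk_size <- 250            # hypothetical block width; tune to available RAM
keep <- logical(ncol(m))     # will be TRUE for columns worth keeping

for (start in seq(1, ncol(m), by = chunk_size)) {
  idx <- start:min(start + chunk_size - 1, ncol(m))
  block <- m[, idx, drop = FALSE]   # only this block is materialised
  # A column is kept if it has more than one distinct value.
  keep[idx] <- apply(block, 2, function(col) length(unique(col)) > 1L)
}

m_reduced <- m[, keep, drop = FALSE]
```

Since each block is handled independently, the loop body is the part one would hand to a parallel backend, with every worker returning its own slice of keep.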

Bert

On Sat, Apr 21, 2018, 12:17 PM Jack Arnestad <jackarnes...@gmail.com> wrote:

> I have a very large binary matrix, stored as a big.matrix to conserve
> memory (it is over 2 gb otherwise - 5 million columns and 100 rows).
>
> r <- 100
> c <- 10000
> m4 <- matrix(sample(0:1,r*c, replace=TRUE),r,c)
> m4 <- cbind(m4, 1)
> m4 <- as.big.matrix(m4)
>
> I need to remove every column which has only one unique value (in this
> case, only 0s or only 1s). Because of the number of columns, I want to be
> able to do this in parallel.
>
> How can I accomplish this while keeping the data compressed as a
> big.matrix? I can convert it into a df and loop over the columns looking
> for the number of unique values, but this takes too much RAM.
>
> Thanks!
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.