Hi Ravi,

My hunch would be "no" because it seems awfully inefficient. Packages
are mirrored all over the world, and it seems rather silly to be
mirroring, updating, etc., large datasets.
The good news is that if you just want a 10,000 x 100,000 matrix of
0/1s, it is trivial to generate:

X <- matrix(sample(0L:1L, 10^9, TRUE), nrow = 10^4)

Even stored as integers, this is probably going to be around 4GB (10^9
elements at 4 bytes each). If you want arbitrary values to cut later
(see the P.S. below for a thresholding sketch):

X <- matrix(rnorm(10^9), nrow = 10^4)

Stored as doubles, that version will be closer to 8GB.

Cheers,

Josh

On Sun, Jun 24, 2012 at 7:08 AM, vioravis <viora...@gmail.com> wrote:
> I am looking for some large datasets (10,000 rows & 100,000 columns,
> or vice versa) to create some test sets. I am not concerned about the
> individual elements, since I will be converting them to binary (0/1)
> using arbitrary thresholds.
>
> Does any R package provide such big datasets?
>
> Also, what is the biggest text document collection available in R?
> The tm package seems to provide only 20 records from the Reuters
> dataset. Is there any package that has 10,000+ documents?
>
> Would appreciate any help on these.
>
> Thank you.
>
> Ravi

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
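P.S. Since you mentioned cutting to 0/1 at arbitrary thresholds, here
is a minimal sketch of one way to do that (the cutoff of 0 and the
smaller dimensions are placeholders, just so the example runs quickly):

## Smaller dimensions for illustration; swap in nrow = 10^4 and 10^9
## elements for the full-size test set.
X <- matrix(rnorm(10^6), nrow = 10^3)
cutoff <- 0                  # arbitrary threshold, pick whatever you like
Xbin <- (X > cutoff) + 0L    # logical comparison, coerced to integer 0/1
str(Xbin)                    # confirm it is an integer matrix

You can also call object.size(Xbin) to check the memory footprint
before scaling up to the full dimensions.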