Hello Manyu, I am guessing you are referring to the Netflix dataset. Try looking at ways to represent large data sets, that is, the list from here: http://cran.r-project.org/web/views/HighPerformanceComputing.html
Here it is:

*Large memory and out-of-memory data*

- The biglm <http://cran.r-project.org/web/packages/biglm/index.html> package by Lumley uses incremental computations to offer lm() and glm() functionality for data sets stored outside of R's main memory.
- The ff <http://cran.r-project.org/web/packages/ff/index.html> package by Adler et al. offers file-based access to data sets that are too large to be loaded into memory, along with a number of higher-level functions.
- The bigmemory <http://cran.r-project.org/web/packages/bigmemory/index.html> package by Kane and Emerson permits storing large objects such as matrices in memory and uses external pointer objects to refer to them. This permits transparent access from R without bumping against R's internal memory limits. Several R processes on the same computer can also share big memory objects.
- A large number of database packages, and database-alike packages (such as sqldf <http://cran.r-project.org/web/packages/sqldf/index.html> by Grothendieck and data.table <http://cran.r-project.org/web/packages/data.table/index.html> by Dowle), are also of potential interest but are not reviewed here.
- The HadoopStreaming <http://cran.r-project.org/web/packages/HadoopStreaming/index.html> package provides a framework for writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion, which does not require Hadoop.
- The speedglm <http://cran.r-project.org/web/packages/speedglm/index.html> package permits fitting (generalised) linear models to large data. For in-memory data sets, speedlm() or speedglm() can be used along with update.speedlm(), which can update fitted models with new data. For out-of-memory data sets, shglm() is available; it works in the presence of factors and can check for singular matrices.
- The biglars <http://cran.r-project.org/web/packages/biglars/index.html> package by Seligman et al. can use the ff <http://cran.r-project.org/web/packages/ff/index.html> package to support larger-than-memory data sets for least-angle regression, lasso and stepwise regression.

----------------Contact Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------

On Thu, Feb 25, 2010 at 12:00 AM, manyu_aditya <abhimanyu.adi...@gmail.com> wrote:
>
> hi,
>
> I have a dataset (the Netflix dataset) which is basically ~18k columns and
> a variable number of rows, but let's assume 25 thousand for now. The
> dataset is very sparse. I was wondering how to do kmeans/nearest neighbors
> or kernel density estimation on it.
>
> I tried using the spMatrix function in the "Matrix" package. I think I'm
> able to create the matrix, but as soon as I pass it to the kmeans function
> in package "stats" it says cannot allocate 3.3Gb, which is basically
> 18k * 25k * 8.
>
> There is a sparse kmeans solver by Tibshirani, but that expects a regular
> dense format matrix, so again the issue is the same.
>
> A simple "no, this is not possible" answer shall suffice as long as you
> are right!!!
>
> Thanks much.
> --
> View this message in context:
> http://n4.nabble.com/Sparse-KMeans-KDE-Nearest-Neighbors-tp1568129p1568129.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
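P.S. To illustrate the memory issue in the question above: a sparse matrix of the Netflix shape is cheap to hold in memory, but stats::kmeans works on a dense matrix, and coercing 25k x 18k doubles to dense is what produces the "cannot allocate 3.3Gb" error. A minimal sketch of one possible workaround, clustering a dense low-rank random projection of the sparse matrix instead (the dimensions, density, and 50-column projection here are made-up illustration values, not tuned choices):

```r
library(Matrix)

# Small stand-in for the Netflix matrix so the example runs anywhere;
# scale the dimensions mentally to 25000 x 18000.
set.seed(1)
n_rows <- 2500; n_cols <- 1800; nnz <- 10000
m <- sparseMatrix(i = sample(n_rows, nnz, replace = TRUE),
                  j = sample(n_cols, nnz, replace = TRUE),
                  x = runif(nnz),
                  dims = c(n_rows, n_cols))

print(object.size(m))             # sparse: only non-zero entries stored
# as.matrix(m) would need n_rows * n_cols * 8 bytes -- the allocation
# that fails at full Netflix scale.

# Workaround sketch: project onto a few dense columns, then cluster.
proj    <- matrix(rnorm(n_cols * 50), n_cols, 50)
reduced <- as.matrix(m %*% proj)  # 2500 x 50: small enough to be dense
fit     <- kmeans(reduced, centers = 5)
str(fit$cluster)
```

This sidesteps the dense coercion rather than doing k-means on the sparse matrix directly; whether a random projection preserves enough structure for your clustering is problem-dependent.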