Hello Manyu, I am guessing you are referring to the Netflix dataset. Try looking at ways to represent large data sets, that is, the list from here: http://cran.r-project.org/web/views/HighPerformanceComputing.html
Here it is:

*Large memory and out-of-memory data*

- The biglm <http://cran.r-project.org/web/packages/biglm/index.html> package by Lumley uses incremental computations to offer lm() and glm() functionality for data sets stored outside of R's main memory.
- The ff <http://cran.r-project.org/web/packages/ff/index.html> package by Adler et al. offers file-based access to data sets that are too large to be loaded into memory, along with a number of higher-level functions.
- The bigmemory <http://cran.r-project.org/web/packages/bigmemory/index.html> package by Kane and Emerson permits storing large objects such as matrices in memory and uses external pointer objects to refer to them. This permits transparent access from R without bumping against R's internal memory limits. Several R processes on the same computer can also share big memory objects.
- A large number of database packages, and database-alike packages (such as sqldf <http://cran.r-project.org/web/packages/sqldf/index.html> by Grothendieck and data.table <http://cran.r-project.org/web/packages/data.table/index.html> by Dowle), are also of potential interest but are not reviewed here.
- The HadoopStreaming <http://cran.r-project.org/web/packages/HadoopStreaming/index.html> package provides a framework for writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion, which does not require Hadoop.
- The speedglm <http://cran.r-project.org/web/packages/speedglm/index.html> package permits fitting (generalised) linear models to large data. For in-memory data sets, speedlm() or speedglm() can be used along with update.speedlm(), which can update fitted models with new data. For out-of-memory data sets, shglm() is available; it works in the presence of factors and can check for singular matrices.
- The biglars <http://cran.r-project.org/web/packages/biglars/index.html> package by Seligman et al. can use the ff <http://cran.r-project.org/web/packages/ff/index.html> package to support larger-than-memory data sets for least-angle regression, lasso and stepwise regression.

----------------Contact Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------

On Thu, Feb 25, 2010 at 12:00 AM, manyu_aditya <abhimanyu.adi...@gmail.com> wrote:
>
> hi,
>
> I have a dataset (the Netflix dataset) which is basically ~18k columns and
> a variable number of rows, but let's assume 25 thousand for now. The
> dataset is very sparse. I was wondering how to do kmeans/nearest neighbors
> or kernel density estimation on it.
>
> I tried using the spMatrix function in the "Matrix" package. I think I'm
> able to create the matrix, but as soon as I pass it to the kmeans function
> in package "stats" it says cannot allocate 3.3Gb, which is basically
> 18k * 25k * 8.
>
> There is a sparse kmeans solver by Tibshirani, but that expects a regular
> dense format matrix, so again the issue is the same.
>
> A simple "no, this is not possible" answer shall suffice as long as you
> are right!!!
>
> Thanks much.
> --
> View this message in context:
> http://n4.nabble.com/Sparse-KMeans-KDE-Nearest-Neighbors-tp1568129p1568129.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
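P.S. To illustrate the memory issue in the question above: a sparse matrix of the Netflix shape is cheap to hold in memory, but stats::kmeans works on a dense matrix, and coercing 25k x 18k doubles to dense is what produces the "cannot allocate 3.3Gb" error. A minimal sketch of one possible workaround, clustering a dense low-rank random projection of the sparse matrix instead (the dimensions, density, and 50-column projection here are made-up illustration values, not tuned choices):

```r
library(Matrix)

# Small stand-in for the Netflix matrix so the example runs anywhere;
# scale the dimensions mentally to 25000 x 18000.
set.seed(1)
n_rows <- 2500; n_cols <- 1800; nnz <- 10000
m <- sparseMatrix(i = sample(n_rows, nnz, replace = TRUE),
                  j = sample(n_cols, nnz, replace = TRUE),
                  x = runif(nnz),
                  dims = c(n_rows, n_cols))

print(object.size(m))             # sparse: only non-zero entries stored
# as.matrix(m) would need n_rows * n_cols * 8 bytes -- the allocation
# that fails at full Netflix scale.

# Workaround sketch: project onto a few dense columns, then cluster.
proj    <- matrix(rnorm(n_cols * 50), n_cols, 50)
reduced <- as.matrix(m %*% proj)  # 2500 x 50: small enough to be dense
fit     <- kmeans(reduced, centers = 5)
str(fit$cluster)
```

This sidesteps the dense coercion rather than doing k-means on the sparse matrix directly; whether a random projection preserves enough structure for your clustering is problem-dependent.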