Hi,

I have a dataset (the Netflix dataset) with roughly 18k columns and a
variable number of rows; let's assume 25 thousand for now. The dataset is
very sparse. I was wondering how to run k-means, nearest neighbors, or
kernel density estimation on it.

I tried using the spMatrix function in the "Matrix" package. I think I'm able
to create the matrix, but as soon as I pass it to the kmeans function in the
"stats" package, it says it cannot allocate 3.3 Gb, which is basically
18k * 25k * 8 bytes.

There is a sparse k-means solver by Tibshirani, but it expects a regular
dense-format matrix, so the issue is the same.
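
To make the 3.3 Gb figure concrete, here is a minimal R sketch of the
arithmetic, assuming (as the error message suggests) that the matrix ends up
stored densely with one 8-byte double per cell. The toy matrix at the end is
purely illustrative: its dimensions and number of nonzero entries are made up
and are not the Netflix data.

```r
library(Matrix)

# Dense storage cost for the full problem: one 8-byte double per cell.
n_rows <- 25000
n_cols <- 18000
dense_bytes <- n_rows * n_cols * 8
dense_gib <- dense_bytes / 1024^3   # about 3.35, matching the error message
print(dense_gib)

# Toy comparison (hypothetical sizes): a 1000 x 500 matrix with at most
# 5000 nonzero entries, stored in sparse form.
set.seed(1)
toy_rows <- 1000
toy_cols <- 500
m <- sparseMatrix(
  i = sample(toy_rows, 5000, replace = TRUE),
  j = sample(toy_cols, 5000, replace = TRUE),
  x = runif(5000),
  dims = c(toy_rows, toy_cols)
)

# Memory used by the sparse object versus the dense equivalent.
sparse_bytes <- as.numeric(object.size(m))
toy_dense_bytes <- toy_rows * toy_cols * 8
print(c(sparse = sparse_bytes, dense = toy_dense_bytes))
```

The sparse object itself is small; the allocation only becomes a problem
once something converts it to an ordinary dense matrix.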

A simple "no, this is not possible" answer will suffice, as long as you are
right!

Thanks much.