On 12 Oct 2012, at 09:46, Purna chander wrote:

> 4) scenario4:
>> x<-read.table("query.vec")
>> v<-read.table("query.vec2")
>> v<-as.matrix(v)
>> d<-dist(rbind(v,x),method="manhattan")
>> m<-as.matrix(d)
>> m2<-m[1:nrow(v),(nrow(v)+1):nrow(x)]
>> print(m2[1,1:10])
>
> time taken for running the code:
> real 0m0.445s
> user 0m0.401s
> sys 0m0.041s
>
> 1) Though scenario 4 is optimum, this scenario failed when matrix 'v'
> having more no. of rows. An error occurred while converting distance
> object 'd' to a matrix 'm'.
> For E.g: > m<-as.matrix(d)
> the above command resulted in error: "Error: cannot allocate
> vector of size 922.7 MB".

That's because you're calculating a full distance matrix with (10000+100) * (10000+100) entries and then extracting the much smaller number of distance values (10000 * 100) that you actually need. I have a use case with similar requirements, so ...

> 3) Any other ideas to optimize the problem i'm facing with.

... my experimental "wordspace" package includes a function dist.matrix() for calculating such cross-distance matrices. The function is written in C and doesn't handle NAs and NaNs properly, but it's considerably faster than the current implementation of dist(). I haven't uploaded the package to CRAN yet, but you should be able to install it with

    install.packages("wordspace", repos="http://R-Forge.R-project.org")

Best,
Stefan

PS: Glad to see that daily builds on R-Forge work again. That's an extremely useful feature for getting beta testers for experimental package versions. :-)
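
For readers without the package, the point about the oversized full matrix can be illustrated in base R. The following is only a minimal sketch, not the wordspace implementation: the function name cross_manhattan() is hypothetical, and it assumes that 'v' and 'x' are numeric with the same number of columns and contain no NA values. It computes only the nrow(v) * nrow(x) block of Manhattan distances, one row of 'x' at a time, so the (nrow(v)+nrow(x))^2 matrix is never allocated.

```r
# Hypothetical helper: Manhattan distances between every row of v and
# every row of x, returned as an nrow(v) x nrow(x) matrix.
cross_manhattan <- function(v, x) {
  v <- as.matrix(v)
  x <- as.matrix(x)
  stopifnot(ncol(v) == ncol(x))
  # For each row j of x, subtract it from every row of v, take absolute
  # values and sum across columns; this gives column j of the result.
  out <- vapply(
    seq_len(nrow(x)),
    function(j) rowSums(abs(sweep(v, 2, x[j, ], "-"))),
    numeric(nrow(v))
  )
  # vapply() returns a plain vector when nrow(v) == 1, so force the shape.
  matrix(out, nrow = nrow(v), ncol = nrow(x))
}

# Small made-up example: 5 "v" rows and 3 "x" rows with 4 columns each.
set.seed(1)
v <- matrix(rnorm(20), nrow = 5)
x <- matrix(rnorm(12), nrow = 3)
m2 <- cross_manhattan(v, x)
print(dim(m2))
```

The loop runs over the rows of 'x' because that is the smaller matrix in the case discussed above; each step is vectorised over all rows of 'v'. When checking the result against dist(), note that the cross-distance block of as.matrix(d) sits in columns (nrow(v)+1):(nrow(v)+nrow(x)), not (nrow(v)+1):nrow(x).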