Dear Uwe,

Just wanted to say thank you so much for this. Whilst waiting for a reply from r-help I had been writing the piece of ugly code below to do the job, and yours looks MUCH smarter. I especially like the use of the '-apply()' bit inside max.col(), as there is no 'min.col()' function.
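
A minimal sketch of why the negation works, for anyone following along: max.col() returns, for each row, the column index of the largest value, so calling it on the negated matrix returns the column of the smallest value. The matrix `d` and its values are invented purely for illustration, and `ties.method = "first"` is set here only so that the result is deterministic and comparable with which.min() (the default for max.col() is "random").

```r
# Hypothetical 3 x 4 matrix of squared distances:
# rows = new observations, columns = centres (values invented for illustration)
d <- matrix(c(4, 1, 9, 7,
              2, 8, 3, 5,
              6, 6, 0, 2),
            nrow = 3, byrow = TRUE)

# Row-wise argmin via the negation trick; ties.method = "first" resolves
# ties the same way as which.min() (max.col() defaults to "random")
nearest_maxcol <- max.col(-d, ties.method = "first")

# The same result spelled out with an explicit per-row which.min()
nearest_loop <- apply(d, 1, which.min)

print(nearest_maxcol)  # 2 1 3
print(nearest_loop)    # 2 1 3
stopifnot(identical(as.integer(nearest_maxcol), as.integer(nearest_loop)))
```

In the approach quoted further down, `d` would correspond to the matrix returned by apply(Centre, 1, ...), with one row per new observation and one column per centre.
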
p.s. My data has 144 values for each data point (e.g. collected at 10-minute intervals over a day).

Also, I wondered if there is a quick manual/book/reference covering all those useful functions, so that we can learn R in a more efficient way? E.g. I wrote my function below because I did not know that colSums()/rowSums() etc. exist, so I could only think of a combination of a loop + 'which.min()'!

Massive thanks again!

HJ

# 10 centres with 144 values each; column 145 holds the centre number
center <- matrix(rnorm(1440, sd = 0.5), nrow = 10)
centre1 <- cbind(center, 1:10)

# 50 new observations; column 145 will hold the assigned centre
NewData <- matrix(rnorm(1440 * 5), nrow = 50)
NewData1 <- cbind(NewData, rep(NA, nrow(NewData)))

clust_HJ <- function(NewData = NewData1, Centre = centre1) {
  for (i in 1:nrow(NewData)) {
    # stack the 10 centres and observation i, so that row 11 is the observation
    tmp1 <- rbind(Centre[, 1:144], NewData[i, 1:144])
    dist.matrix <- as.matrix(dist(tmp1, method = "euclidean"))
    # nearest centre = smallest distance in row 11, columns 1 to 10
    Ind.dist.min <- which.min(dist.matrix[11, 1:10])
    NewData[i, 145] <- Ind.dist.min
  } # end i loop
  output.file <- NewData
  write.csv(output.file, "NewData2.csv", row.names = FALSE)
} # end function

clust_HJ(NewData1, centre1)


On Wed, May 22, 2013 at 10:55 AM, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote:

> So you just want to compare the distances from each point of your new data
> to each of the Centres and assign the corresponding number of the centre as
> in:
>
> tCentre <- t(Centre)
> clust <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))
>
> but since the apply loop is rather long here for lots of new data, one may
> want to optimize the runtime for huge data and get:
>
> tNewData <- t(NewData)
> clust <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))
>
> Best,
> Uwe Ligges
>
>
> On 21.05.2013 13:19, HJ YAN wrote:
>
>> Dear R users
>>
>> I have the matrix of the centres of some clusters, e.g. 20 clusters each
>> with 100 dimensions, so this matrix contains 20 rows * 100 columns of
>> numeric values.
>>
>> I have collected new data (each with 100 numeric values) and would like to
>> keep the above 20 centres fixed/'unmoved' whilst just seeing how my new
>> data fit in this grouping system, e.g.
>> if the data is close to cluster 1 then
>> label it 'cluster 1'.
>>
>> If the above matrix of centres is called 'Centre' (a 20*100 matrix) and my
>> new data 'NewData' has 500 observations, using kmeans() will update the
>> centres:
>>
>> kmeans(NewData, Centre)
>>
>> I wondered if there are other R packages out there that can keep the
>> centres fixed and label each observation of my new data? Or do I have to
>> write my own function?
>>
>> To illustrate my task using a simpler example:
>>
>> I have
>>
>> Centre <- matrix(c(0,1,0,1), nrow=2)
>>
>> # the two created centres in a two dimensional case are
>> Centre
>>      [,1] [,2]
>> [1,]    0    0
>> [2,]    1    1
>>
>> NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
>>                  matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
>>
>> NewData1 <- cbind(c(1:100), NewData)
>> colnames(NewData1) <- c("ID","x","y")
>>
>> # my data
>> head(NewData1)
>>      ID          x          y
>> [1,]  1 -0.3974660  0.1541685
>> [2,]  2  0.5321347  0.2497867
>> [3,]  3  0.2550276  0.1691720
>> [4,]  4 -0.1162162  0.6754874
>> [5,]  5  0.1570996  0.1175119
>> [6,]  6  0.4816195 -0.6836226
>>
>> ## I'd like to have the outcome as below (whilst keeping the two centres fixed):
>>
>>       ID          x         y Cluster
>>  [1,]  1 -0.3974660 0.1541685       1
>>  [2,]  2  0.5321347 0.2497867       1
>>  [3,]  3  0.2550276 0.1691720       1
>>  [4,]  4 -0.1162162 0.6754874       1
>> ...
>> [55,] 55  1.1570996 1.1175119       2
>> [56,] 56  1.4816195 1.6836226       2
>>
>> p.s. I use Euclidean distance to obtain/calculate the distance matrix.
>>
>> Many thanks in advance
>>
>> HJ
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.