Re: [R] keep the centre fixed in K-means clustering

HJ YAN Wed, 22 May 2013 08:43:02 -0700

Dear Uwe

Just wanted to say thank you so much for this. Whilst waiting for a reply
from r-help I had been writing the piece of ugly code below to do the job,
and yours looks MUCH smarter. I especially like the '-apply()' trick inside
max.col(), as there is no 'min.col()' function.


p.s. my data has 144 values for each data point (e.g. collected at
10-minute intervals over a day)


Also, I wondered if there is a quick manual or reference for all those
useful functions, so that we can learn R in a more efficient way. E.g. I
wrote my function below because I did not know that colSums()/rowSums() etc.
exist, so I could only think of combining a loop with 'which.min()'!
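
One way to discover such helpers from within R itself is to search function names and help pages. The lines below are only a small sketch using base R tools (apropos(), help.search() and max.col()); the search pattern and the toy matrix m are made up for illustration.

```r
# List functions on the search path whose names end in "Sums"
hits <- apropos("Sums$")
print(hits)

# Search installed help pages by keyword (opens a results listing when run
# interactively); "??distance" is shorthand for the same thing
# help.search("distance")

# There is no min.col(), but negating the matrix turns max.col() into one:
m <- matrix(c(3, 1, 2,
              0, 5, 4), nrow = 2, byrow = TRUE)
row_argmin <- max.col(-m, ties.method = "first")  # column of the row minimum
print(row_argmin)
```

Note that max.col() breaks ties at random by default; ties.method = "first" makes the result reproducible.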


Massive thanks again!
HJ




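# 10 fixed centres with 144 values each; the last column holds the centre ID
center <- matrix(rnorm(1440, sd = 0.5), nrow = 10)
centre1 <- cbind(center, 1:10)

# 50 new observations with 144 values each; the last column will hold the label
NewData <- matrix(rnorm(1440 * 5), nrow = 50)
NewData1 <- cbind(NewData, rep(NA, nrow(NewData)))

clust_HJ <- function(NewData = NewData1, Centre = centre1) {

  for (i in 1:nrow(NewData)) {
    # stack the 10 centres on top of observation i (which becomes row 11)
    tmp1 <- rbind(Centre[, 1:144], NewData[i, 1:144])
    dist.matrix <- as.matrix(dist(tmp1, method = "euclidean"))
    # nearest centre = smallest distance in row 11 among the first 10 columns
    Ind.dist.min <- which.min(dist.matrix[11, 1:10])
    NewData[i, 145] <- Ind.dist.min
  } # end i loop

  output.file <- NewData
  write.csv(output.file, "NewData2.csv", row.names = FALSE)
} # end function

clust_HJ(NewData1, centre1)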
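
As a quick sanity check, the loop-based approach above and the max.col(-apply(...)) one-liner from Uwe's reply quoted below can be run on the same simulated data to see whether they give the same labels. This is only a sketch under stated assumptions: the seed, the names clust_loop, sqdist and clust_vec, and the choice of ties.method = "first" are additions for illustration and do not come from the original messages.

```r
set.seed(1)  # assumed seed, only so the comparison is reproducible
Centre  <- matrix(rnorm(1440, sd = 0.5), nrow = 10)  # 10 centres x 144 values
NewData <- matrix(rnorm(1440 * 5), nrow = 50)        # 50 observations x 144 values

# Loop version: one dist() call per observation
clust_loop <- integer(nrow(NewData))
k <- nrow(Centre)
for (i in seq_len(nrow(NewData))) {
  d <- as.matrix(dist(rbind(Centre, NewData[i, ]), method = "euclidean"))
  clust_loop[i] <- which.min(d[k + 1, seq_len(k)])
}

# Vectorised version: squared distances, observations in rows, centres in columns
tNewData <- t(NewData)
sqdist <- apply(Centre, 1, function(x) colSums((x - tNewData)^2))
clust_vec <- max.col(-sqdist, ties.method = "first")

# Both approaches should label every observation identically
print(all(clust_loop == clust_vec))
```

Squared distances are enough here because taking the square root does not change which centre is nearest.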


On Wed, May 22, 2013 at 10:55 AM, Uwe Ligges <
lig...@statistik.tu-dortmund.de> wrote:

> So you just want to compare the distances from each point of your new data
> to each of the Centres and assign the corresponding number of the centre as
> in:
>
> tCentre <- t(Centre)
> clust <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))
>
>
> but since the apply loop is rather long here for lots of new data, one may
> want to optimize the runtime for huge data and get:
>
> tNewData <- t(NewData)
> clust <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))
>
>
> Best,
> Uwe Ligges
>
>
>
>
>
> On 21.05.2013 13:19, HJ YAN wrote:
>
>> Dear R users
>>
>>
>> I have the matrix of the centres of some clusters, e.g. 20 clusters each
>> with 100 dimensions, so this matrix contains 20 rows * 100 columns of
>> numeric values.
>>
>> I have collected new data (each with 100 numeric values) and would like to
>> keep the above 20 centres fixed/'unmoved' and just see how my new data
>> fit into this grouping system, e.g. if a data point is close to cluster 1
>> then label it 'cluster 1'.
>>
>> If the above matrix of centres is called 'Centre' (a 20*100 matrix) and my
>> new data 'NewData' has 500 observations, then using kmeans() will update
>> the centres:
>>
>> kmeans(NewData, Centre)
>>
>>
>> I wondered if there are other R packages out there that can keep the
>> centres fixed and label each observation of my new data? Or do I have to
>> write my own function?
>>
>> To illustrate my task using a simpler example:
>>
>> I have
>>
>> Centre<- matrix(c(0,1,0,1), nrow=2)
>>
>> # the two created centres in a two-dimensional case are
>> Centre
>>       [,1] [,2]
>> [1,]    0    0
>> [2,]    1    1
>>
>> NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
>>                  matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
>>
>> NewData1 <- cbind(c(1:100), NewData)
>> colnames(NewData1) <- c("ID", "x", "y")
>>
>> # my data
>>   head(NewData1)
>>       ID          x          y
>> [1,]  1 -0.3974660  0.1541685
>> [2,]  2  0.5321347  0.2497867
>> [3,]  3  0.2550276  0.1691720
>> [4,]  4 -0.1162162  0.6754874
>> [5,]  5  0.1570996  0.1175119
>> [6,]  6  0.4816195 -0.6836226
>>
>> ## I'd like to have the outcome as below (whilst keeping the two centres fixed):
>>
>>        ID          x         y Cluster
>>  [1,]   1 -0.3974660 0.1541685       1
>>  [2,]   2  0.5321347 0.2497867       1
>>  [3,]   3  0.2550276 0.1691720       1
>>  [4,]   4 -0.1162162 0.6754874       1
>>
>> ...
>> [55,]  55  1.1570996 1.1175119       2
>> [56,]  56  1.4816195 1.6836226       2
>>
>>
>> p.s. I use Euclidean distance to calculate the distance matrix.
>>
>>
>> Many thanks in advance
>>
>> HJ
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
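
For the two-centre toy example in the original question, the fixed-centre assignment could look roughly like the following. It is a minimal sketch, not a definitive implementation: it reuses the max.col(-apply(...)) idea from Uwe's reply, and the seed and the names sqdist, Cluster and result are assumptions introduced here for illustration.

```r
set.seed(42)  # assumed seed, only for reproducibility

# Two fixed centres at (0, 0) and (1, 1)
Centre <- matrix(c(0, 1, 0, 1), nrow = 2)

# 100 new points: 50 scattered around each centre
NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Squared Euclidean distance from every point (rows) to every centre (columns)
tNewData <- t(NewData)
sqdist <- apply(Centre, 1, function(x) colSums((x - tNewData)^2))

# Label each point with the index of its nearest centre; Centre is never updated
Cluster <- max.col(-sqdist, ties.method = "first")

result <- cbind(ID = seq_len(nrow(NewData)),
                x = NewData[, 1], y = NewData[, 2],
                Cluster = Cluster)
print(head(result))
```

With these two particular centres, a point ends up in cluster 2 exactly when x + y > 1, which gives an easy way to check the labels by eye.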
