Re: [R] Vectorization of three embedded loops

Thomas Terhoeven-Urselmans Wed, 14 Jan 2009 06:59:34 -0800

Dear Carlos,

Thanks for your support. Patrick Burns gave me a hint that is, in the
end, very similar to your proposal. The script is now roughly 25 times
faster.


Here is the code (I also preallocated the vector so that it no longer
grows inside the loop, 'summ.dist<-rep(0,val.x.c.n)'):

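KEN.STO<-function(val.n,start.b,val,euc){

for(k in 1:val.n){
# preallocate one slot per remaining candidate row
summ.dist<-rep(0,nrow(start.b));
for(i in 1:nrow(start.b)){
        # distance from candidate i to its nearest already selected sample
        summ.dist[i]<-min(euc[rownames(start.b)[i],val]);
        }
# candidate(s) farthest from the selected set; keep the first on ties
bla<-rownames(start.b)[which(summ.dist==max(summ.dist))]
val<-c(val,bla[1]);
# drop the newly selected sample from the candidate pool
start.b<-start.b[rownames(start.b)!=bla[1],,drop=FALSE];
if(length(val)>=val.n)break;
}
return(val);
}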
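
The remaining inner loop over i could probably be removed as well. The
following is only a sketch under assumptions, not the code used above: the
name ken.sto.vec and the toy data are hypothetical, and euc is assumed to
carry row and column names that match rownames(start.b) and val, as the
character indexing in KEN.STO implies. It keeps a running vector of each
candidate's distance to its nearest selected sample and updates it with
pmin(), since only the newly selected sample can lower that distance.
Unlike KEN.STO, it selects nothing if val already holds val.n names.

```r
# Hypothetical helper (not from the thread): same max-min selection rule
# as KEN.STO, without the loop over candidate rows.
ken.sto.vec <- function(val.n, start.b, val, euc) {
  cand <- rownames(start.b)                 # remaining candidate names
  # distance from each candidate to its nearest already selected sample
  min.dist <- apply(euc[cand, val, drop = FALSE], 1, min)
  while (length(val) < val.n && length(cand) > 0) {
    pick <- which.max(min.dist)             # first maximum on ties
    new  <- cand[pick]
    val  <- c(val, new)
    cand <- cand[-pick]
    min.dist <- min.dist[-pick]
    # only the new sample can reduce a candidate's nearest distance
    min.dist <- pmin(min.dist, euc[cand, new])
  }
  val
}

# Toy data (made up for illustration): 10 samples, 4 variables
set.seed(1)
x <- matrix(rnorm(40), nrow = 10,
            dimnames = list(paste0("s", 1:10), NULL))
euc <- as.matrix(dist(x))                   # named Euclidean distance matrix
val.start <- c("s1", "s2")                  # samples chosen in an earlier step
start.b <- x[!(rownames(x) %in% val.start), , drop = FALSE]
selected <- ken.sto.vec(5, start.b, val.start, euc)
```

Each pass then costs one column lookup in euc instead of a full scan over
all selected samples for every candidate.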

Regards,

Thomas

On 14 Jan 2009, at 12:58, Carlos J. Gil Bellosta wrote:

> Hello,
>
> I believe that your bottleneck lies at this piece of code:
>
> sum<-c();
> for(j in 1:length(val)){
>       sum[j]<-euc[rownames(start.b)[i],val[j]]
> }
>
> In order to speed up your code, there are two alternatives:
>
> 1) Try to reorder the euc matrix so that the sum vector corresponds to
> (part of) a row or column of euc.
>
> 2) For each i value, create a matrix with the coordinates corresponding
> to ( rownames(start.b)[i], val[j] ) and index the matrix by this matrix
> in order to create sum. This will be easiest if you can reorder euc in a
> way that accessing its elements will be easy (and then you would be back
> into (1)).
>
> Creating a variable sum as c() and increasing its size in a loop is one
> of the easiest ways to uselessly burn your CPU.
>
> Best regards,
>
> Carlos J. Gil Bellosta
> http://www.datanalytics.com
>
>
> On Wed, 2009-01-14 at 10:32 +0300, Thomas Terhoeven-Urselmans wrote:
>> Dear R-programmer,
>>
>> I wrote an adapted implementation of the Kennard-Stone algorithm for
>> sample selection of multivariate data (R 2.7.1 under MacBook Pro,
>> Processor 2.2 GHz Intel Core 2 Duo, Memory 2 GB 667 MHZ DDR2 SDRAM).
>> I used three nested loops for the heart of the script. This makes it
>> very slow, especially for huge datasets. For a data matrix of 1853*1853
>> and the selection of 556 samples, the computation took more than
>> 24 hours.
>> I did some research on vectorization, but I could not figure out how
>> to do it better/faster. What ways are there to replace the
>> time-consuming loops?
>>
>> Here is some information:
>>
>> # val.n<-24;
>> # start.b<-matrix(nrow=1812, ncol=20);
>> # val is a vector of the row names of 22 extreme samples chosen in an earlier step;
>> # euc<-matrix(nrow=1853, ncol=1853); [contains the Euclidean distance calculations]
>>
>> The following calculation of the system.time was for the selection of
>> two samples:
>> system.time(KEN.STO(val.n,start.b,val.start,euc))
>>    user  system elapsed
>>  25.294  13.262  38.927
>>
>> The function:
>>
>> KEN.STO<-function(val.n,start.b,val,euc){
>>
>> for(k in 1:val.n){
>> sum.dist<-c();
>> for(i in 1:length(start.b[,1])){
>>      sum<-c();
>>      for(j in 1:length(val)){
>>              sum[j]<-euc[rownames(start.b)[i],val[j]]
>>              }
>>              sum.dist[i]<-min(sum);
>>      }
>> bla<-rownames(start.b)[which(sum.dist==max(sum.dist))]
>> val<-c(val,bla[1]);
>> start.b<-start.b[-(which(match(rownames(start.b),val[length(val)])!="NA")),];
>> if(length(val)>=val.n)break;
>> }
>> return(val);
>> }
>>
>> Regards,
>>
>> Thomas
>>
>> Dr. Thomas Terhoeven-Urselmans
>> Post-Doc Fellow
>> Soil infrared spectroscopy
>> World Agroforestry Center (ICRAF)
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>



Dr. Thomas Terhoeven-Urselmans
Post-Doc Fellow
Soil infrared spectroscopy
World Agroforestry Center (ICRAF)
United Nations Avenue, Gigiri
PO Box 30677-00100 Nairobi, Kenya
Ph: 254 20 722 4113 or via USA 1 650 833 6654 ext. 4113
Fax 254 20 722 4001 or via USA 1 650 833 6646
Email: t.urselm...@cgiar.org
Internet: http://worldagroforestrycentre.org

