Re: [R] QA

arun Sat, 25 May 2013 18:47:07 -0700

I thought you want to compare between the rows of two columns even if their 
corresponding values fall in the same row.



fun3<- function(mat){
        indmat<-combn(seq_len(ncol(mat)),2)
    lst1<- lapply(seq_len(ncol(indmat)),function(i) {mat[,indmat[,i]]})   
    
names(lst1)<-as.character(interaction(as.data.frame(t(indmat)),sep="_",drop=TRUE))
       
    lst2<- lapply(lst1,function(x) {x1<- apply(x,2,fun1)})
   
    lst3<- lapply(lst2,function(x) expand.grid(lapply(x,function(y) y[,1])))
    lst4<-lapply(lst3,function(x) unlist(x[which.min(apply(x,1,function(y) 
abs(diff(y)))),]) )
    lst5<- lapply(lst4,function(x){
                if(abs(diff(x))>(nrow(mat)/2)){
                nrow(mat)-abs(diff(x))
                    }
                        else(abs(diff(x)))
                    })
       
    lst6<- lapply(seq_along(lst5),function(i) {
                x2<-lst1[[i]]
                if(lst5[[i]]==0) {
                    #indx1<- seq(length(x2[,2]))
                    #sum(abs(x2[,1]-x2[indx1,2]))
             0     ######################## set to zero                     
            }
                else{
                    lapply(seq(1+lst5[[i]]),function(j){x3<-x2[,2]
                            indx1<-seq(length(x3)-(j-1))
                            indx2<-c(setdiff(seq_along(x3),indx1),indx1)
                            sum(abs(x2[,1]-x2[indx2,2]))
                            })
                }
                })   
               
    names(lst6)<- names(lst1)
    lst7<-lapply(lst6,unlist)
    lst8<- lapply(lst7,function(x) {
            Seq1<-seq_along(x)
            if(length(Seq1)==1) x
            else if(length(Seq1)==2){
                        sum(abs(x[1]-x[2]))   
                        }
            else{
                ind<-rep(Seq1,each=2)[-1]
                ind1<-ind[-length(ind)]
                
Reduce(`+`,lapply(split(ind1,(seq_along(ind1)-1)%/%2+1),function(i) {
                                    abs(diff(x[i]))
                                    }))                           
                }

                    }
            )
    lst9<-do.call(rbind,lst8)
    lst9       
    }

 fun3(mm)
#          [,1]
#1_2  2.5966051
#1_3  1.0267435
#1_4  0.0000000
#1_5  1.8489204
#1_6  0.0000000
#2_3  0.0000000
#2_4  1.9040790
#2_5  2.2874235
#2_6  5.1526016
#3_4  0.9726777
#3_5  2.1359229
#3_6  5.0221450
#4_5  0.9124638
#4_6  0.0000000
#5_6 14.0550864


xx
#      1  8  9 23 87 89
#[1,]  5  4  4  5  6 12
#[2,] 12 NA NA  9 NA NA
#[3,] NA NA NA 12 NA NA

According to xx, 1&4, 2&3, 4&6 (also 0 because both have 12)
A.K.
________________________________
From: eliza botto <eliza_bo...@hotmail.com>
To: "smartpink...@yahoo.com" <smartpink...@yahoo.com> 
Sent: Saturday, May 25, 2013 9:17 PM
Subject: RE: QA




thanks arun,
i dont think thANKyou is enough for wat u did. anyway, there is slight 
modification that i want to ask to understand the codes more efficiently. what 
if i want to consider the distance between the columns having atleast one peak 
in the same month equal to zero, instead of "initial value"?? 
more precisely The distance between column 2 and 3 should be zero instead of 
4.2951411. similarly the distance between column 4 and 6 should be zero instead 
of 8.260419. 
Thats just for my own knowledge to understand the loop. i hope you wont mind.
The loop works absolutely well.

Elisa


> Date: Sat, 25 May 2013 18:03:33 -0700
> From: smartpink...@yahoo.com
> Subject: Re: QA
> To: eliza_bo...@hotmail.com
> CC: r-help@r-project.org
> 
> Hi,
> I hope this works for you.
> fun1<- function(x){
>     big<- x>0.8*max(x) 
>     n<- length(big)
>     startRunOfBigs<- which(c(big[1],!big[-n] & big[-1]))
>     endRunOfBigs<- which(c(big[-n] & !big[-1], big[n]))
>     index<- vapply(seq_along(startRunOfBigs),function(i) 
> which.max(x[startRunOfBigs[i]:endRunOfBigs[i]])+startRunOfBigs[i]-1L,0L)
>      index<-ifelse(sum(is.na(match(index,c(1,12))))==0 & 
> x[index]!=max(x[index]), NA,index)
>     data.frame(Index=index[!is.na(index)],Value=x[index[!is.na(index)]])
>      }
> 
> ##mm: data
> fun3<- function(mat){
>         indmat<-combn(seq_len(ncol(mat)),2)
>     lst1<- lapply(seq_len(ncol(indmat)),function(i) {mat[,indmat[,i]]})    
>     
> names(lst1)<-as.character(interaction(as.data.frame(t(indmat)),sep="_",drop=TRUE))
>         
>     lst2<- lapply(lst1,function(x) {x1<- apply(x,2,fun1)})
>     
>     lst3<- lapply(lst2,function(x) expand.grid(lapply(x,function(y) y[,1])))
>     lst4<-lapply(lst3,function(x) unlist(x[which.min(apply(x,1,function(y) 
> abs(diff(y)))),]) )
>     lst5<- lapply(lst4,function(x){ 
>                 if(abs(diff(x))>(nrow(mat)/2)){
>                 nrow(mat)-abs(diff(x))
>                     }
>                         else(abs(diff(x)))
>                     })
>         
>     lst6<- lapply(seq_along(lst5),function(i) {
>                 x2<-lst1[[i]]
>                 if(lst5[[i]]==0) {
>                     indx1<- seq(length(x2[,2]))
>                     sum(abs(x2[,1]-x2[indx1,2]))
>                     }
>                 else{
>                     lapply(seq(1+lst5[[i]]),function(j){x3<-x2[,2]
>                             indx1<-seq(length(x3)-(j-1))
>                             indx2<-c(setdiff(seq_along(x3),indx1),indx1)
>                             sum(abs(x2[,1]-x2[indx2,2]))
>                             })
>                 }
>                 })    
>                 
>     names(lst6)<- names(lst1)
>     lst7<-lapply(lst6,unlist)
>     lst8<- lapply(lst7,function(x) {
>             Seq1<-seq_along(x)
>             if(length(Seq1)==1) x
>             else if(length(Seq1)==2){
>                         sum(abs(x[1]-x[2]))    
>                         }
>             else{ 
>                 ind<-rep(Seq1,each=2)[-1]
>                 ind1<-ind[-length(ind)]
>                 
> Reduce(`+`,lapply(split(ind1,(seq_along(ind1)-1)%/%2+1),function(i) {
>                                     abs(diff(x[i]))
>                                     }))                            
>                 }
> 
>                     }
>             )
>     do.call(rbind,lst8)    
>     }
> 
> fun3(mm)   #rownames represent the comparison between the particular columns
> #          [,1]
> #1_2  2.5966051
> #1_3  1.0267435
> #1_4  3.7387830
> #1_5  1.8489204
> #1_6  6.5233654
> #2_3  4.2951411
> #2_4  1.9040790
> #2_5  2.2874235
> #2_6  5.1526016
> #3_4  0.9726777
> #3_5  2.1359229
> #3_6  5.0221450
> #4_5  0.9124638
> #4_6  8.2604187
> #5_6 14.0550864
> 
> 
> A.K.
> 
> 
> 
> 
> 
> ________________________________
> From: eliza botto <eliza_bo...@hotmail.com>
> To: "smartpink...@yahoo.com" <smartpink...@yahoo.com> 
> Sent: Saturday, May 25, 2013 2:14 PM
> Subject: QA
> 
> 
> 
> 
> Dear Arun,
> [text file is attached]
> After your help on preparing loop for identifying peaks, here is my latest 
> question which is linked with my first question. but this time i will try to 
> make it more clear.
> 
> > dput(xx)
> structure(c(5L, 12L, NA, 4L, NA, NA, 4L, NA, NA, 5L, 9L, 12L, 
> 6L, NA, NA, 12L, NA, NA), .Dim = c(3L, 6L), .Dimnames = list(
>     NULL, c("1", "8", "9", "23", "87", "89")))
> > dput(mm)
> structure(c(0.706461987893674, 0.998391468394261, 0.72402995269242, 
> 1.70874688194537, 1.93906363083693, 0.89540353128442, 0.328327645695443, 
> 0.427434603701202, 0.591932250254601, 0.444627635494183, 1.44407704434405, 
> 1.79150336746345, 0.740380661614246, 1.39756784211974, 1.43602731683199, 
> 2.40482060634346, 1.61684982192949, 0.549848553223765, 0.245763715425745, 
> 0.315411788974968, 0.390626431538384, 0.369934560068472, 0.769100067815155, 
> 1.76366863411459, 0.480885978853889, 1.21441674507622, 2.50566408677391, 
> 3.27361599826255, 1.18508780425679, 0.465943778037697, 0.29380145690883, 
> 0.36356245877522, 0.373314458026047, 0.334849362386475, 0.882050057788756, 
> 0.626807814853613, 0.774295647517675, 0.853105130179133, 0.738085443815565, 
> 1.26063449947807, 1.57350832698427, 0.790095501697794, 0.510641105191147, 
> 0.874523657118082, 1.31257333325184, 0.882086374572265, 1.13881207205977, 
> 1.29163890813439, 0.0849732189580101, 0.070591276171845, 0.0926010253161898, 
> 0.362209761457517, 1.45769283057202, 3.16165004659667, 2.74903557756267, 
> 1.94633472878995, 1.19319875840883, 0.533232612926756, 0.225531074123974, 
> 0.122949089115578, 2.06195904001605, 1.41493262330451, 1.35748791897328, 
> 1.19490680241894, 0.702488756183322, 0.338258418490199, 0.123398398622741, 
> 0.138548982660226, 0.16170889185798, 0.414543218677095, 1.84629295875002, 
> 2.24547399004563), .Dim = c(12L, 6L))
> 
> 
> You can see that that there are two matrices. "mm" is the actual matrix and 
> "xx" is the matrix indentifying the peaks of "mm".For being a peak a value 
> has to either the maximum value or atleast 80% of the maximum value. you can 
> see that the maximum value of coulmn 1 is in row number 5 and thats what it 
> showed in matrix "xx" whereas, the 80% of the maximum value is in row number 
> 12 therefore it considered it the second peak and row number was shown in 
> "xx". i want to calculate the distance matrix of "mm" in the following way...
> The column are continous or cyclic.
> The subtraction should start from the peak and should end when the peaks of 
> two columns are in the same row. The peaks are to be moved towrds eachother 
> in the shortest possible way.
> For suppose the peak of colum 2 is in 4th row and the peak of column 6 is in 
> 12th row. Now moving these two peak towwards eachother requires moving col 2 
> in reverse direction or column 6 in forward direction.
> 
> For example
> 
> Initial:
> 
> Col 2
> 
> 1 2 3 4(max) 5 6 7 8 9 10 11 12
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> a<-sum(abs(col2-col6))
> 
> step1:
> 
> Col 2
> 
> 2 3 4(max) 5 6 7 8 9 10 11 12 1
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> b<-sum(abs(col2-col6))
> 
> step2:
> 
> Col 2
> 
> 3 4(max) 5 6 7 8 9 10 11 12 1 2
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> c<-sum(abs(col2-col6))
> 
> step3:
> 
> Col 2
> 
> 4(max) 5 6 7 8 9 10 11 12 1 2 3
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> d<-sum(abs(col2-col6))
> 
> step4:
> 
> Col 2
> 
> 5 6 7 8 9 10 11 12 1 2 3 4(max)
> 
> Col 6
> 
> 1 2 3 4 5 6 7 8 9 10 11 12(max)
> 
> e<-sum(abs(col2-col6))
> 
> total difference= abs(a-b)+abs(b-c)+abs(c-d)+abs(d-e)
> 
> 
> The dissimilarity is zero if the peaks are already in the same row. like for 
> column 2 and 3 the distance is zero as peaks are under eachother. For column 
> 1 and 4 the distance is onceagain zero. Although they have different nuber of 
> peaks but as atleast one of their peaks is under eachother therefore distance 
> is zero.
> 
> For Column 5 and 6 peaks can be moved in either direction as number of steps 
> to be followed are same.
> 
> for column 1 and 2 following is the procedure
> 
> Col1 has two maximum values in row 5th and 12th and column two has only one 
> maximum value at 4 row. As peak in 5th row of column one is closer to the 
> peak of column 2 therefore we will move towards it and procedure should be
> 
> 
> Initial:
> 
> Col 1
> 
> 1 2 3 4 5(max) 6 7 8 9 10 11 12(max)
> 
> Col 8
> 
> 1 2 3 4(max) 5 6 7 8 9 10 11 12
> 
> a<-sum(abs(col1-col8))
> 
> Step1: 
> 
> Col 1
> 
> 1 2 3 4 5(max) 6 7 8 9 10 11 12(max)
> 
> Col 8
> 
> 12 1 2 3 4(max) 5 6 7 8 9 10 11
> 
> b<-sum(abs(col1-col8))
> 
> total difference=abs(a-b)
> 
> For column 4 and 5
> 
> Initial:
> 
> Col 4
> 
> 1 2 3 4 5(max) 6 7 8 9(max) 10 11 12(max)
> 
> Col 5
> 
> 1 2 3 4 5 6(max) 7 8 9 10 11 12
> 
> a<-sum(abs(col4-col5))
> 
> Step 1
> 
> Col 4
> 
> 1 2 3 4 5(max) 6 7 8 9(max) 10 11 12(max)
> 
> Col 5
> 
> 2 3 4 5 6(max) 7 8 9 10 11 12 1
> 
> b<-sum(abs(col4-col5))
> 
> Total Difference= abs(a-b)
> 
> If there is any point which i couldnt discuss please tell me...
> 
> 
> Elisa         

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] QA

Reply via email to