Splitting a variable into groups, i.e. between A-B, B-C, C-D, from: A, NA, NA, B, NA, NA, C, NA, NA, NA, D

Eugeniusz Kaluza Fri, 25 Jun 2010 05:49:19 -0700

Dear useRs,

First of all, thank you, Joris Meys, for explaining how to compute results
for groups delimited by string markers in one variable of a data frame, as in
the example below (between START and STOP). At the end I would like to complete
it by asking how each observation in the original data set can be labelled
with its group.


# First, a working example of the solution proposed by Joris Meys
# [jorism...@gmail.com] (sent Thu 2010-06-24 18:02)
# START
# Same trick :
  c0 <- rbind(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
  c0
  c1 <- rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10,
              0, NA, 20, 10.3444)
  c1
  c2 <- rbind(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")
  c2
  # the data frame has to exist before C.df$c2 is used below
  C.df <- data.frame(c0, c1, c2)

  # row positions of the markers
  pos <- which(!is.na(C.df$c2))
  # one index vector per interval: from a marker up to the row before the next one
  # (lapply always returns a list, even when all intervals have the same length)
  idx <- lapply(2:length(pos), function(i) pos[i-1]:(pos[i]-1))
  names(idx) <- sapply(2:length(pos),
      function(i) paste(C.df$c2[pos[i-1]], "-", C.df$c2[pos[i]]))
  out <- lapply(idx, function(i) summary(C.df[i, 1:2]))
  out
# STOP


#Thank you, it is done and works very well

# - - - - - - - -- - - - - - -- - -
# Now I would like to finish my question: how to add a grouping symbol to the
# whole data set, so that each observation is labelled with the name of the
# interval in which it lies.
# This tells the observer that an observation lies between, e.g., A and B, and
# enables sorting and simple access using match.
in_sub_starting_from <- rbind(NA,"A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C")
in_sub_finished_by   <- rbind(NA,"B","B","B","C","C","C","C","C","C","C","D","D","D","D","D","D")
in_sub_limited_by    <- rbind(NA,"A-B","A-B","A-B","B-C","B-C","B-C","B-C","B-C","B-C","B-C","C-D","C-D","C-D","C-D","C-D","C-D")
C.df<-data.frame(c0,c1,c2,in_sub_starting_from,in_sub_finished_by,in_sub_limited_by)
C.df
#----------------------------------------------------

# Therefore, one more question:
# How can these vectors be created automatically from C.df$c2 (with
# C.df$c0 and C.df$c1 also available)?
C.df$in_sub_starting_from
C.df$in_sub_finished_by
C.df$in_sub_limited_by
# The goal is to tell the observer that an observation lies between, e.g., A and B,
# and to enable sorting and simple access using match.


# For example, to make this kind of access possible:
# take the 7th observation from the data frame,
C.df$c0[7]
C.df$c1[C.df$c0 == 7]
# and then find, in the same row,
# in_sub_starting_from: the marker that precedes this observation
C.df$in_sub_starting_from[C.df$c0 == 7]
# in_sub_finished_by: the marker that follows this observation
C.df$in_sub_finished_by[C.df$c0 == 7]
# in_sub_limited_by: the interval that contains this observation
C.df$in_sub_limited_by[C.df$c0 == 7]
#----------------------------------------------------

?
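
One possible way to build these three columns automatically is sketched
below. It is only a sketch under stated assumptions, not an answer taken from
the thread: the data are rebuilt with c() instead of rbind() so that each
column is a plain vector, at least two markers are assumed to exist in c2,
and the helper name grp is made up for this illustration. The idea is that
cumsum() over the non-NA positions counts how many markers have been seen so
far, which numbers the intervals row by row.

```r
# Data from the post, as plain vectors (assumption: c() instead of rbind()).
c0 <- 1:17
c1 <- c(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c2 <- c(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")
C.df <- data.frame(c0, c1, c2, stringsAsFactors = FALSE)

# Rows that hold a marker.
pos <- which(!is.na(C.df$c2))

# grp (hypothetical helper): number of markers seen up to and including each row.
# The last marker closes the last interval instead of opening a new one, so it
# (and any rows after it) is counted with the last interval via pmin().
grp <- pmin(cumsum(!is.na(C.df$c2)), length(pos) - 1)
grp[grp == 0] <- NA   # rows before the first marker belong to no interval

C.df$in_sub_starting_from <- C.df$c2[pos[grp]]
C.df$in_sub_finished_by   <- C.df$c2[pos[grp + 1]]
C.df$in_sub_limited_by    <- ifelse(is.na(grp), NA,
                                    paste(C.df$in_sub_starting_from,
                                          C.df$in_sub_finished_by, sep = "-"))
C.df

# Access by observation number, as asked for above.
C.df$in_sub_limited_by[C.df$c0 == 7]   # "B-C"
```

With the labels stored in the data frame, sorting or subsetting by interval
becomes an ordinary column operation, for example
C.df[which(C.df$in_sub_limited_by == "B-C"), ].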





# Thanks for the advice, and perhaps for an answer to this question as well.
# I am waiting impatiently for my next chance to get internet access...

Sincerely,
Kaluza


And the beginning of this story:
____________________________________________________________________________________






-----Original Message-----
From: Eugeniusz Kaluza
Sent: Thu 2010-06-24 17:12
To: r-help@r-project.org
Subject: FW: [R] ?to calculate sth for groups defined between points in one 
variable (string), / value separating/ spliting variable into groups by i.e. 
between start, NA, NA, stop1, start2, NA, stop2

Dear useRs,

Thanks to Joris Meys for the advice.
I will now try to make it work for a less specific case,
to make the problem more general.
The result should then be displayed for every group between non-empty strings in
c2,
i.e. not only the result for:
 #mean:
          c1     c3    c4           c5
          20  Start1 Stop1 Start1-Stop1
    25.48585  Start2 Stop2 Start2-Stop2 

but also for every group formed by the span between two neighbouring strings in
c2, which otherwise contains only runs of NA, NA, NA, separated from time to time by
one string, i.e.:
 #mean:
          c1     c3    c4           c5
          20 Start1 Stop1 Start1-Stop1
          .. Stop1 Start2 Stop1-Start2
    25.48585  Start2 Stop2 Start2-Stop2 

In other words, the command could perhaps be rewritten in a simpler, more
general form that gives a result for every group between two neighbouring
strings in c2, where c2 contains only runs of NA separated from time to time
by one string, e.g. A, NA, NA, NA, NA, B, NA, NA, NA, C, NA, NA, NA, NA, D, NA, NA
i.e.:
 #mean:
          c1     c3    c4           c5
          20      A     B          A-B
          ..      B     C          B-C
    25.48585      C     D          C-D 
...................
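
A table of this shape could be produced along the following lines. This is
a sketch under assumptions rather than a finished function: it uses the
A, B, C, D data from the later message as plain vectors, it assumes the
markers are distinct, and it follows the convention that an interval runs
from its opening marker up to the row before the next marker. The names
is_mark, marks, keep and res are made up for the illustration.

```r
c1 <- c(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c2 <- c(NA, "A", NA, NA, "B", NA, NA, NA, NA, NA, NA, "C", NA, NA, NA, NA, "D")

is_mark <- !is.na(c2)
marks   <- c2[is_mark]            # "A" "B" "C" "D"
grp     <- cumsum(is_mark)        # interval number of each row, 0 before "A"

# Keep rows from the first marker up to the row before the last marker.
keep   <- grp >= 1 & grp < length(marks)
labels <- paste(marks[-length(marks)], marks[-1], sep = "-")   # "A-B" "B-C" "C-D"
interval <- factor(labels[grp[keep]], levels = labels)

# One row per interval; mean() could be swapped for sum(), max(), ...
res <- data.frame(
  c1 = as.vector(tapply(c1[keep], interval, mean, na.rm = TRUE)),
  c3 = marks[-length(marks)],
  c4 = marks[-1],
  c5 = labels,
  stringsAsFactors = FALSE
)
res
```

Because every pair of neighbouring markers gets its own label, the same code
would also report the Stop1-Start2 span when run on the Start/Stop data.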


I am looking for a more general method (function) for grouping between these
letters in c2.
I will now study the solution proposed by Joris Meys.
Thanks for the immediate answer,
Kaluza




-----Original Message-----
From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Thu 2010-06-24 15:14
To: Eugeniusz Kaluza
Cc: r-help@r-project.org
Subject: Re: [R] ?to calculate sth for groups defined between points in one 
variable (string), / value separating/ spliting variable into groups by i.e. 
between start, NA, NA, stop1, start2, NA, stop2

On Thu, Jun 24, 2010 at 1:18 PM, Eugeniusz Kaluza
<eugeniusz.kal...@polsl.pl> wrote:
>
> Dear useRs,
>
> Thanks for any advice
>
> # I do not know where to find examples of how to mark groups
> # based on the occurrence of a signal in an additional variable (cf. variable c2),
> # or how to run different calculations for groups defined by (split by the
> # occurrence of) characteristic values in c2
>
>
> # First example of simple data
> # index:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
> c0<-rbind( 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17)
> c0
> c1<-rbind(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10,  0, NA, 20, 10.3444)
> c1
> c2<-rbind(NA,"Start1",NA,NA,"Stop1",NA,NA,NA,NA,NA,NA,"Start2",NA,NA,NA,NA,"Stop2")
> c2
> C.df<-data.frame(cbind(c0,c1,c2))
> colnames(C.df)<-c("c0","c1","c2")
> C.df
>
> # preparation of a layout to explain the result needed later (the next 3 lines are
> # not actually needed; they only explain how to obtain the final result)
>  c3<-rbind(NA,"Start1","Start1","Start1","Start1","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2","Start2")
>  c4<-rbind(NA, "Stop1", "Stop1", "Stop1", "Stop1", "Stop2", "Stop2", "Stop2", "Stop2",
>            "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2", "Stop2")
>  C.df<-data.frame(cbind(c0,c1,c2,c3,c4))
>  colnames(C.df)<-c("c0","c1","c2","c3","c4")
>  C.df$c5<-paste(C.df$c3,C.df$c4,sep="-")
>  C.df
>
Now this is something I don't get. The list "Start2-Stop2" starts way
before Start2, actually at Stop1. Sure that's what you want?

I took the liberty of showing how to get the data between start and
stop for every entry, and how to apply functions to it. If you don't
get the code, look at
?lapply
?apply
?grep

I also adjusted your example, as you caused all variables to be
factors by using the cbind in the data.frame function. Never do this
unless you're really sure you have to. But I can't think of a case
where that would be beneficial...
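
To illustrate the point about cbind, here is a small sketch with made-up
three-row vectors (not the data from this thread). cbind() on a mix of
numeric and character vectors first builds a character matrix, so every
column loses its numeric type before data.frame() ever sees it. In R
versions before 4.0.0 those character columns were then turned into factors
by default; in later versions they stay character.

```r
# Hypothetical toy vectors, named like the ones in the thread.
c0 <- c(1, 2, 3)
c1 <- c(10, 20, NA)
c2 <- c(NA, "A", NA)

bad  <- data.frame(cbind(c0, c1, c2))   # goes through a character matrix
good <- data.frame(c0, c1, c2)          # each column keeps its own type

sapply(bad, class)    # no numeric columns left
sapply(good, class)   # c0 and c1 are numeric
```

With the second form, functions such as mean() or summary() work on c0 and
c1 directly, without converting them back to numbers first.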

...
C.df<-data.frame(c0,c1,c2)
C.df

# find positions
Start <- grep("Start",C.df$c2)
Stop <- grep("Stop",C.df$c2)

# create indices
idx <- apply(cbind(Start,Stop),1,function(i) i[1]:i[2])
names(idx) <- paste("Start",1:length(Start),"-Stop",1:length(Start),sep="")

# Apply the function summary and get a list back named by the interval.
out <- lapply(idx,function(i) summary(C.df[i,1:2]))
out
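
As a possible follow-up, sketched here and not part of the original reply,
the same indices can be turned into one table with the mean, sum, max and
position of the max per interval. The assumptions are that the data are
rebuilt as plain vectors, that Map() is used instead of apply() so that the
result is always a list, and that the column names of res are made up for
the illustration. Note also that mean(20,30,40,50) treats only the first
argument as data and returns 20, so the values need to be wrapped in c() to
get their actual mean.

```r
c0 <- 1:17
c1 <- c(10, 20, 30, 40, 50, 10, 60, 20, 30, 40, 50, 30, 10, 0, NA, 20, 10.3444)
c2 <- c(NA, "Start1", NA, NA, "Stop1", NA, NA, NA, NA, NA, NA,
        "Start2", NA, NA, NA, NA, "Stop2")
C.df <- data.frame(c0, c1, c2, stringsAsFactors = FALSE)

# Positions of the Start and Stop markers.
Start <- grep("Start", C.df$c2)
Stop  <- grep("Stop", C.df$c2)

# One index vector per Start-Stop pair, always returned as a list.
idx <- Map(seq, Start, Stop)
names(idx) <- paste("Start", seq_along(Start), "-Stop", seq_along(Stop), sep = "")

# One row per interval; which_max is the position within the interval.
res <- data.frame(
  interval  = names(idx),
  mean      = sapply(idx, function(i) mean(C.df$c1[i], na.rm = TRUE)),
  sum       = sapply(idx, function(i) sum(C.df$c1[i], na.rm = TRUE)),
  max       = sapply(idx, function(i) max(C.df$c1[i], na.rm = TRUE)),
  which_max = sapply(idx, function(i) which.max(C.df$c1[i])),
  row.names = NULL
)
res
```

These numbers are for intervals that run from each Start to its own Stop, so
the Start2-Stop2 row differs from a table in which that group begins right
after Stop1.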

If you really need to start Start2 right after Stop1, you can use a
similar approach.

Cheers
Joris

> # NEEDED RESULTS
>  # needed result
> # for Start1-Stop1: mean(20,30,40,50)
> # for Start2-Stop2: mean(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
> #mean:
>         c1     c3    c4           c5
>         20  Start1 Stop1 Start1-Stop1
>   25.48585  Start2 Stop2 Start2-Stop2
>
> #sum
> # for Start1-Stop1: sum(20,30,40,50)
> # for Start2-Stop2: sum(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
> #sum:
>         c1     c3    c4           c5
>        140  Start1 Stop1 Start1-Stop1
>   280.3444  Start2 Stop2 Start2-Stop2
>
> # for Start1-Stop1: max(20,30,40,50)
> # for Start2-Stop2: max(c(10,60,20,30,40,50,30,10,0,NA,20,10.3444), na.rm=T)
> #max:
>         c1     c3    c4           c5
>        50  Start1 Stop1 Start1-Stop1
>        60  Start2 Stop2 Start2-Stop2
>
> # place of max in Start1-Stop1: 4th element in group Start1-Stop1
> # place of max in Start2-Stop2: 2nd element in group Start2-Stop2
>
>        c0     c3    c4           c5
>         4  Start1 Stop1 Start1-Stop1
>         2  Start2 Stop2 Start2-Stop2
>
>
> Thanks for any suggestion,
> Kaluza
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


