On Jul 19, 2011, at 11:58 AM, William Dunlap wrote:


-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org ] On Behalf Of Daniel Malter
Sent: Tuesday, July 19, 2011 1:51 AM
To: r-help@r-project.org
Subject: Re: [R] Centering data frame by factor


P1-tapply(P1,Experiment,mean)[Experiment]

Note that the above solution works in this example
because Experiment takes the values 1 and 2.  If
Experiment were coded as, say, 101 and 102 the above
would not work.  This is a case where converting
Experiment to a factor would avoid problems.
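
To make the failure concrete: tapply() returns a vector named by group label, and a numeric index into it is read as a position, not as a name. The following is only a sketch under my own assumptions (it reuses the P1 values from the original question, and the names m and centered are made up for illustration); it shows the failure and a name-based lookup via as.character():

```r
P1 <- c(10, 12, 14, 5, 3, 4)
E  <- c(102, 102, 102, 101, 101, 101)   # numeric group labels

m <- tapply(P1, E, mean)   # two group means, named "101" and "102"
m[E]                       # positions 102 and 101 do not exist: all NA
m[as.character(E)]         # lookup by name: one group mean per row

# Center each value on its own group mean.
centered <- P1 - m[as.character(E)]
as.vector(centered)
```

Looking up by name does not depend on how the groups happen to be numbered.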

I checked to see if my ave solution was subject to the same caveats and it is not. The help page is less categorical about what the grouping variables' structure should be, saying only that they are "typically factors".

 E.g.,
RAW <- data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
RAW$E <- RAW$Experiment + 100 # relabeled Experiment
with(RAW, P1-tapply(P1,Experiment,mean)[Experiment]) # good
  2  2  2  1  1  1
 -2  0  2  1 -1  0
with(RAW, P1-tapply(P1,E,mean)[E]) # bad
 <NA> <NA> <NA> <NA> <NA> <NA>
   NA   NA   NA   NA   NA   NA

with(RAW, ave(P1, E, FUN=function(x) scale(x,  scale=FALSE) ) )
# [1] -2  0  2  1 -1  0   good


RAW$E <- factor(RAW$E) # convert to factor
with(RAW, P1-tapply(P1,E,mean)[E]) # good
 102 102 102 101 101 101
  -2   0   2   1  -1   0

And take note that Bill made his variable a factor outside the tapply call. If he had only wrapped it in factor() inside the tapply call (as I often do, possibly unwisely in light of this gotcha), the lookup would fail because the index E would still be numeric:

> with(RAW, P1-tapply(P1, factor(E), mean)[E])
<NA> <NA> <NA> <NA> <NA> <NA>
  NA   NA   NA   NA   NA   NA

... that is unless you also use factor(E) as the index:

> with(RAW, P1-tapply(P1, factor(E), mean)[factor(E)])
102 102 102 101 101 101
 -2   0   2   1  -1   0
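
The reason the factor index works, as far as I understand ?Extract, is that a factor used as a subscript is treated as its integer codes, and those codes line up with the positions in the tapply() result. A small sketch (P1 values taken from the original question; f and m are illustrative names of mine):

```r
P1 <- c(10, 12, 14, 5, 3, 4)
E  <- c(102, 102, 102, 101, 101, 101)

f <- factor(E)
levels(f)       # sorted labels: "101" "102"
as.integer(f)   # integer codes: 2 2 2 1 1 1

m <- tapply(P1, f, mean)   # one mean per level, in level order
m[f]                       # indexes by code, i.e. positions 2 2 2 1 1 1
m[as.integer(f)]           # the same lookup written out explicitly
```

This relies on the index factor having the same levels, in the same order, as the factor given to tapply().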

Thanks, Bill. I've learned a lot of R from you.

--
David.


Another way to approach the problem is to think of
your normalized data as the residuals from a linear model:
residuals(lm(data=RAW, cbind(P1,P2) ~ E))
              P1            P2
 1 -2.000000e+00 -4.000000e+00
 2  4.385598e-17  8.771196e-17
 3  2.000000e+00  4.000000e+00
 4  1.000000e+00 -1.000000e+00
 5 -1.000000e+00  8.771196e-17
 6  4.385598e-17  1.000000e+00
zapsmall(.Last.value) # make reading easier
   P1 P2
 1 -2 -4
 2  0  0
 3  2  4
 4  1 -1
 5 -1  0
 6  0  1
That approach can make generalizations to more factors
or to smoothing approaches easier.
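
As a cross-check of the residuals idea, here is a hedged sketch comparing the lm() residuals with direct centering by ave(). It uses the data from the original question; the names res, direct and res2 are mine, and the two-factor model at the end is only one possible reading of "more factors":

```r
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

# Residuals from a one-factor model are the group-centered values.
# factor() matters once there are more than two experiments: a numeric
# Experiment would then be fit as a straight line, not as group means.
res <- residuals(lm(cbind(P1, P2) ~ factor(Experiment), data = RAW))

# The same centering done directly; ave() uses mean as its default FUN.
direct <- with(RAW, cbind(P1 = P1 - ave(P1, Experiment),
                          P2 = P2 - ave(P2, Experiment)))

all.equal(unname(res), unname(direct))

# Additive adjustment for a second factor (Experiment and Group).
res2 <- residuals(lm(cbind(P1, P2) ~ factor(Experiment) + Group, data = RAW))
```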

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


HTH,
Daniel


ronny wrote:

Hi,

I would like to center P1 and P2 of the following data frame by the factor
"Experiment", i.e. subtract from each value the average of its
experiment, and keep the original data structure, i.e. the experiment and
the group of each value.

RAW=

data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("A","A","B","A","A","B"),"P1"=c(10,12,14,5,3,4),"P2"=c(8,12,16,2,3,4))

Desired result:

NORMALIZED=
data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))

I tried using "by", but then I lose the original order, and the "Group"
variable. Can you help?

RAW
 Experiment Group P1 P2
        2     A 10  8
        2     A 12 12
        2     B 14 16
        1     A  5  2
        1     A  3  3
        1     B  4  4

NOT.OK<- within (RAW,
{P1<-do.call(rbind,by(RAW$P1,RAW$Experiment,scale,scale=F))})

NOT.OK
 Experiment Group P1 P2
         2     A  1  8
         2     A -1 12
         2     B  0 16
         1     A -2  2
         1     A  0  3
         1     B  2  4
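
The misalignment above arises because by() returns its groups in sorted order (Experiment 1 first), so the stacked result no longer matches the row order of RAW. One order-preserving alternative, given here only as a sketch built on the ave() function mentioned elsewhere in this thread (the names cols and NORMALIZED are illustrative), is:

```r
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

cols <- c("P1", "P2")   # columns to center

# ave() returns the group mean for every row in the original row order,
# so subtracting it centers each value without reordering anything.
NORMALIZED <- RAW
NORMALIZED[cols] <- lapply(RAW[cols], function(x) x - ave(x, RAW$Experiment))
NORMALIZED
```

Experiment and Group are carried over untouched because only the columns named in cols are replaced.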


--
View this message in context: 
http://r.789695.n4.nabble.com/Centering-data-frame-by-factor-tp3677609p3677620.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
