On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:

Hello,

I'm trying to create a custom function that "mean-centers" data and can be
applied across many columns.

Here is an example dataset, which is similar to my dataset:


dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")

I want to mean-center the "Units" and "AveragePrice" Columns.

So, I created this function:

specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

I needed to modify this because aggregate() passes each column to FUN as a plain vector, and colMeans() requires an object with at least two dimensions:

specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }

aggregate(dat[3:4], dat[1], FUN=specialFunction2)

     Location   Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2
1 Los Angeles 0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730
2    New York 0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168
3       Paris 0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417
  AveragePrice.3
1      0.0010827
2     -0.0143575
3     -0.0681899
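
If the goal is to keep the centered values lined up with the original rows (aggregate() returns them as one matrix column per variable), ave() might be a simpler fit. This is only a minimal sketch, assuming the `dat` example from the question; `centerLog` and `centered` are names invented here for illustration:

```r
dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")

# Hypothetical helper: log-transform a vector, then subtract its mean.
centerLog <- function(v) log(v) - mean(log(v), na.rm = TRUE)

centered <- dat
for (col in c("Units", "AveragePrice")) {
  # ave() applies centerLog within each Location and returns the
  # results in the original row order, so they can be assigned back.
  centered[[col]] <- ave(dat[[col]], dat$Location, FUN = centerLog)
}
centered
```

The first three rows of `centered` should then hold the Los Angeles values, one row per time period, rather than one wide row per location.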


If I use only "one" column in the first argument of the "by" function,
everything is fine.  For example, the following code will work fine:

by(dat[c("Units")],
dat["Location"],
specialFunction)

But the following code will "not" work, because I have "two" columns in the
first argument...

by(dat[c("Units", "AveragePrice")],
dat["Location"],
specialFunction)

OK. So then I tried this with your function and was surprised to see that it also works:

> by(dat[c("Units", "AveragePrice")],
+ dat["Location"],
+ specialFunction)
Location: Los Angeles
     Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827
------------------------------------------------------------------
Location: New York
     Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357
------------------------------------------------------------------
Location: Paris
     Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819


Does anyone have any ideas as to what I am doing wrong?

I guess I don't. I cannot reproduce the problem, and my other methods worked as well. This also works with your version and with mine, but I get the deprecation message for `mean.data.frame` from mine:

> lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
$`Los Angeles`
     Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827

$`New York`
     Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357

$Paris
     Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819


Please note that I'm trying to get the following results (for the "Los
Angeles" group):

Los Angeles "Units" variable (Mean-Centered)
0.213682659
-0.005370907
-0.208311751

Los Angeles "AveragePrice" variable (Mean-Centered)
0.071790268
-0.072872965
0.001082696
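
One possible reading of the gap between these target values and the by()/lapply() output, offered as a sketch rather than a diagnosis: when `x` is a two-column data frame, `colMeans(log(x))` returns a length-2 vector, and subtracting it from the data frame recycles that vector down each column instead of matching one mean to each column. That would account for the middle row of every group (rows 2, 5 and 8) being far off while the other rows agree with the targets. Using sweep() to subtract each column's mean from its own column avoids the recycling. The names `centerCols` and `res` are invented here for illustration:

```r
dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")

# Hypothetical helper: take logs, then subtract each column's mean
# from that same column (MARGIN = 2 means "by column").
centerCols <- function(x) {
  lx <- as.matrix(log(x))
  sweep(lx, 2, colMeans(lx, na.rm = TRUE))
}

# One centered matrix per location.
res <- lapply(split(dat[c("Units", "AveragePrice")], dat$Location),
              FUN = centerCols)
res[["Los Angeles"]]
```

With this version the second Los Angeles row should come out near -0.00537 for Units and -0.07287 for AveragePrice, in line with the targets listed above.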

--

David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
