Re: [R] Fw: calculate mean of multiple rows in a data frame

Jabez Wilson Fri, 02 Dec 2011 14:19:37 -0800

Thank you, I copied the data from the R environment, but it came out wrong. You 
understood exactly what I wanted, and your solution is admirable: I clearly 
need to address the naming convention. Thanks for your help.


--- On Fri, 2/12/11, Jean V Adams <jvad...@usgs.gov> wrote:


From: Jean V Adams <jvad...@usgs.gov>
Subject: Re: [R] Fw: calculate mean of multiple rows in a data frame
To: "Jabez Wilson" <jabez...@yahoo.co.uk>
Cc: "R-Help" <r-h...@stat.math.ethz.ch>
Date: Friday, 2 December, 2011, 14:29



It's easier for folks to help you if you put your example data in a format that 
can be readily read in R.  See, for example, the dput() function, which you can 
use to provide us with something like this: 

DF <- structure(list(NAME = c("Control_1", "Control_2", "Control_1", 
"Control_3", "MM0289~RFU:11810.15", "MM0289~RFU:9238.41", 
"MM16597~RFU:36765.38", 
"MM16597~RFU:41258.94"), ID = c("probe~B01R01C01", "probe~B01R01C02", 
"probe~B01R09C01", "probe~B01R09C02", "probe~B29R13C06", "probe~B29R13C05", 
"probe~B44R15C20", "probe~B44R15C19"), a = c(3L, 712L, 937L, 
464L, 99L, 605L, 700L, 132L), b = c(22L, 13L, 824L, 836L, 544L, 
603L, 923L, 777L), c = c(926L, 32L, 898L, 508L, 607L, 862L, 219L, 
497L), d = c(774L, 179L, 668L, 53L, 984L, 575L, 582L, 995L)), .Names = 
c("NAME", 
"ID", "a", "b", "c", "d"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8")) 

If I understand what you're after, you want to summarize data within groups, 
but your NAME variable is not as general as you would like.  You can get around 
this by creating a new variable which is a shorter and more general version of 
the NAME variable.  I did this by saving just the part of the NAME before the 
colon, ":". 

shortname <- sapply(strsplit(DF$NAME, ":"), "[", 1) 
aggregate(DF[, -(1:2)], by=list(shortname=shortname), mean) 

    shortname   a     b     c     d 
1   Control_1 470 423.0 912.0 721.0 
2   Control_2 712  13.0  32.0 179.0 
3   Control_3 464 836.0 508.0  53.0 
4  MM0289~RFU 352 573.5 734.5 779.5 
5 MM16597~RFU 416 850.0 358.0 788.5 

Jean 


> Jabez Wilson wrote on 12/01/2011 03:15:39 PM:

> NAME
> ID
> a
> b
> c
> d
> 
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
> 
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
> 
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
> 
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
> 
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
> 
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
> 
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
> 
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686
> 
> 
> 
> NAME
> ID
> a
> b
> c
> d
> 
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
> 
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
> 
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
> 
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
> 
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
> 
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
> 
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
> 
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686
> Sorry, that should look like this:
> 
> 
> 
> 
> NAME
> ID
> a
> b
> c
> d
> 
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
> 
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
> 
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
> 
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
> 
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
> 
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
> 
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
> 
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686 NAME ID a b c d 
> 1 Control_1 probe~B01R01C01 3 22 926 774 
> 2 Control_2 probe~B01R01C02 712 13 32 179 
> 3 Control_1 probe~B01R09C01 937 824 898 668 
> 4 Control_3 probe~B01R09C02 464 836 508 53 
> 5 MM0289~RFU:11810.15 probe~B29R13C06 99 544 607 984 
> 6 MM0289~RFU:9238.41 probe~B29R13C05 605 603 862 575 
> 7 MM16597~RFU:36765.38 probe~B44R15C20 700 923 219 582 
> 8 MM16597~RFU:41258.94 probe~B44R15C19 132 777 497 995
> 
> --- On Thu, 1/12/11, Jabez Wilson <jabez...@yahoo.co.uk> wrote:
> 
> 
> From: Jabez Wilson <jabez...@yahoo.co.uk>
> Subject: calculate mean of multiple rows in a data frame
> To: "R-Help" <r-h...@stat.math.ethz.ch>
> Date: Thursday, 1 December, 2011, 20:45
> 
> Dear all, I have a data frame (DF) in the following format:
> 
> NAME
> ID
> a
> b
> c
> d
> 
> 1
> Control_1
> probe~B01R01C01
> 381
> 213
> 345
> 653
> 
> 2
> Control_2
> probe~B01R01C02
> 574
> 629
> 563
> 783
> 
> 3
> Control_1
> probe~B01R09C01
> 673
> 511
> 521
> 967
> 
> 4
> Control_3
> probe~B01R09C02
> 53
> 809
> 999
> 50
> 
> 5
> MM0289~RFU:11810.15
> probe~B29R13C06
> 681
> 34
> 115
> 587
> 
> 6
> MM0289~RFU:9238.41
> probe~B29R13C05
> 784
> 443
> 20
> 784
> 
> 7
> MM16597~RFU:36765.38
> probe~B44R15C20
> 719
> 251
> 790
> 445
> 
> 8
> MM16597~RFU:41258.94
> probe~B44R15C19
> 677
> 363
> 268
> 686.....
> I would like to consolidate the data frame by parsing through the 
> rows, and where the NAME is identical, consolidate into one row and 
> return the mean.
> I can do this for the first lines (Control_1 etc) by using aggregate()
> aggregate(DF[,-c(1:2)], by=list(DF$NAME), mean)
> but since aggregate looks for unique lines it won't consolidate e.g.
> lines 5/6 and 7/8.
> Is there a way of telling aggregate to grep just the first part of 
> the name (i.e. up to "~") and consolidate those?
> I could pre-grep the file before importing into R, but I'd like to 
> do it within R if possible.
> Thanks for any suggestions

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fw: calculate mean of multiple rows in a data frame

Reply via email to