Try this (after making sure that Col_1 in data2 matches your column
names in data1; your post has "Stage1" in data2 but "stage1" in data1):
> data1 <- read.table(textConnection("Taxon stage1 stage2 stage3 stage4
+ T1 0 0 1 1
+ T2 0 1 1 0
+ T3 0 0 0 1
+ T4 1 0 0 0"), header=TRUE)
> data2 <- read.table(textConnection("Col_1 Col_2
+ stage1 Group1
+ stage2 Group1
+ stage3 Group2
+ stage4 Group2"), header=TRUE, as.is=TRUE)
> closeAllConnections()
> # get the columns to summarize by
> colSumz <- split(data2$Col_1, data2$Col_2)
> # create the output matrix
> result <- matrix(0, nrow=nrow(data1), ncol=length(colSumz))
> colnames(result) <- names(colSumz)
> rownames(result) <- data1$Taxon
> for (i in names(colSumz)){
+     # drop=FALSE keeps a data frame even when a group has only one column
+     result[, i] <- rowSums(data1[, colSumz[[i]], drop=FALSE])
+ }
> result
   Group1 Group2
T1      0      2
T2      1      1
T3      0      1
T4      1      0
>
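Note that the loop above returns counts (T1 gets 2 for Group2), whereas the question asks for 0/1 values. Below is a minimal sketch of one way to get there, not the only one. It assumes the library file may use different capitalisation than the data1 headers and may list stages that are absent from data1; the extra row "Stage5 Group3" is hypothetical and only there to illustrate that case. It uses rowsum() on the transposed matrix instead of a loop.

```r
# Same example data as above (read.table's text= argument replaces textConnection)
data1 <- read.table(text = "Taxon stage1 stage2 stage3 stage4
T1 0 0 1 1
T2 0 1 1 0
T3 0 0 0 1
T4 1 0 0 0", header = TRUE)

# Library file with capitalised stage names, as in the question, plus one
# hypothetical stage (Stage5) that does not occur in data1
data2 <- read.table(text = "Col_1 Col_2
Stage1 Group1
Stage2 Group1
Stage3 Group2
Stage4 Group2
Stage5 Group3", header = TRUE, as.is = TRUE)

stages <- names(data1)[-1]                 # every column except Taxon
# case-insensitive lookup of each data1 column in the library
idx <- match(tolower(stages), tolower(data2$Col_1))
stopifnot(!anyNA(idx))                     # every stage must be in the library
groups <- data2$Col_2[idx]                 # group label for each data1 column

# rowsum() sums rows by group, so transpose, sum, and transpose back
counts <- t(rowsum(t(as.matrix(data1[, stages, drop = FALSE])), groups))
rownames(counts) <- data1$Taxon

# collapse the counts to presence/absence (0 or 1)
presence <- (counts > 0) + 0
presence
```

With the example data this should reproduce the 0/1 table asked for in the question; keep `counts` instead if the per-group totals are what you need.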
On Mon, Sep 6, 2010 at 1:49 PM, Martin Hughes <[email protected]> wrote:
>
> Hi
>
> This question is far less simple than the title suggests, please read
> carefully, thanks.
>
> I have 2 sets of data, both read into R
>
>>data1<-read.table ("1.txt", header=T, sep="\t")
>>data2<-read.table ("2.txt", header=T, sep="\t")
>
>>data1
>
> Taxon stage1 stage2 stage3 stage4
> T1 0 0 1 1
> T2 0 1 1 0
> T3 0 0 0 1
> T4 1 0 0 0
>
>
>>data2
> # This is a library file: Col_1 lists every stage name that may appear as a
> # column header in data1, and Col_2 gives the group each stage belongs to,
> # e.g. stages 1:2 == Group1.
>
> Col_1 Col_2
> Stage1 Group1
> Stage2 Group1
> Stage3 Group2
> Stage4 Group2
>
> I want R to combine the columns in data1 based on the information in
> data2 (Col_2), e.g. in this instance reduce the columns in data1 from 4 to 2,
> summing the values across the columns that belong to each group, to get the
> result below
>
> Taxon group1 group2
> T1 0 1
> T2 1 1
> T3 0 1
> T4 1 0
>
> I have many datasets with different sets of stages, e.g. one dataset
> will have stage1-10, another will have stage15-35 (data2 lists all
> possible stage values in Col_1, with the group each corresponds to in Col_2)
>
> So far I can isolate the rows of data2 which contain the stages in data1
> with this:
>
>> # take the header names from data1 minus the 1st column (Taxon is not
>> # found in the data2 library file)
>> data1.names<-names(data1[,-1])
>> # match the data1 column header names to those in the data2 library file
>> row.numbers<-match(data1.names, data2[,1])
>> # reduce data2 to only the stages found in data1 (note the comma: rows)
>> data2.small<-data2[row.numbers,]
>
> From here on I don't know what to do. Really I wanted to just change
> the header names of data1 to their corresponding names in Col_2
> and then use some statement to merge the columns in data1 that have the
> same name, summing the values in each row and dividing by the sum wherever
> it is greater than 1 (so I only have 0 or 1 again), but I don't know how to
> do that.
>
> Can someone help me get the desired result (as in the example above) in a way
> that does not require me to manually merge columns? I.e. an automated
> approach that takes any version of the data1 file (with different stage
> values) and, using the data2 file (the library file, the same in each
> instance), produces output like the example above?
>
>
> Thanks
>
> Martin
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?