Re: [R] combining columns for data.frames

jim holtman Mon, 06 Sep 2010 15:18:21 -0700

Try this (after making sure that Col_1 in data2 matches your column
names in data1; your example has "Stage1" in data2 but "stage1" in data1):


> data1 <- read.table(textConnection("Taxon   stage1   stage2   stage3   stage4
+ T1          0          0          1          1
+ T2          0          1          1          0
+ T3          0          0          0          1
+ T4          1          0          0          0"), header=TRUE)
> data2 <- read.table(textConnection("Col_1        Col_2
+ stage1      Group1
+ stage2      Group1
+ stage3      Group2
+ stage4      Group2"), header=TRUE, as.is=TRUE)
> closeAllConnections()
> # get the columns to summarize by
> colSumz <- split(data2$Col_1, data2$Col_2)
> # create the output matrix
> result <- matrix(0, nrow=nrow(data1), ncol=length(colSumz))
> colnames(result) <- names(colSumz)
> rownames(result) <- data1$Taxon
> for (i in names(colSumz)){
+     result[, i] <- rowSums(data1[, colSumz[[i]]])
+ }
> result
   Group1 Group2
T1      0      2
T2      1      1
T3      0      1
T4      1      0
>

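Note that the result above holds counts (T1 has 2 in Group2), while the desired output in the original question is 0/1 only. A minimal sketch of one way to collapse the counts to presence/absence follows. The `result` matrix is re-created here only so the snippet runs on its own, and the name `presence` is made up for illustration.

```r
# Stand-in for the `result` matrix printed above (filled column by column)
result <- matrix(c(0, 1, 0, 1,
                   2, 1, 1, 0),
                 nrow = 4,
                 dimnames = list(c("T1", "T2", "T3", "T4"),
                                 c("Group1", "Group2")))

# Any positive count becomes 1; zeros stay 0.
# Assigning into presence[] keeps the dimensions and dimnames.
presence <- result
presence[] <- as.integer(result > 0)
presence
```

Comparing against zero avoids the division step mentioned in the question and works for any count.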

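If the library file lists more stages than a given data1 contains, or the names differ only in case, the loop above would stop with an "undefined columns" error. Below is a rough sketch of a loop-free variant that looks up only the stages present in data1. It assumes a case-insensitive match is acceptable; the extra Stage5/Group3 row and the names `stages`, `idx`, `groups` and `counts` are invented for illustration.

```r
data1 <- read.table(text = "Taxon stage1 stage2 stage3 stage4
T1 0 0 1 1
T2 0 1 1 0
T3 0 0 0 1
T4 1 0 0 0", header = TRUE)

# Library file with the capitalised names from the original question,
# plus one stage (Stage5, hypothetical) that does not occur in data1
data2 <- read.table(text = "Col_1 Col_2
Stage1 Group1
Stage2 Group1
Stage3 Group2
Stage4 Group2
Stage5 Group3", header = TRUE, as.is = TRUE)

stages <- names(data1)[-1]
# Case-insensitive lookup of each data1 column in the library file
idx <- match(tolower(stages), tolower(data2$Col_1))
stopifnot(!anyNA(idx))  # every stage in data1 must appear in the library
groups <- data2$Col_2[idx]

# rowsum() adds up rows by group, so transpose, sum, transpose back;
# drop = FALSE keeps a single stage column as a one-column data frame
counts <- t(rowsum(t(as.matrix(data1[, stages, drop = FALSE])), groups))
rownames(counts) <- as.character(data1$Taxon)
counts
```

Groups that have no stage in data1 (Group3 here) simply do not appear in the output.
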
On Mon, Sep 6, 2010 at 1:49 PM, Martin Hughes <[email protected]> wrote:
>
> Hi
>
> This question is far less simple than the title suggests, please read 
> carefully, thanks.
>
> I have 2 sets of data, both read into R
>
>>data1<-read.table ("1.txt", header=T, sep="\t")
>>data2<-read.table ("2.txt", header=T, sep="\t")
>
>>data1
>
> Taxon   stage1   stage2   stage3   stage4
> T1          0          0          1          1
> T2          0          1          1          0
> T3          0          0          0          1
> T4          1          0          0          0
>
>
>>data2
> # This is a library file. It contains all possible values of stage (Col_1)
> # that may appear in the data1 file (as column headers), and the group each
> # corresponds to in Col_2, i.e. stages 1:2 == Group1.
>
> Col_1        Col_2
> Stage1      Group1
> Stage2      Group1
> Stage3      Group2
> Stage4      Group2
>
> I want to get R to combine the columns in data1 based on the information in
> data2 (Col_2), e.g. in this instance reduce the columns in data1 from 4 to 2,
> summing the values across the columns that belong to each group, to get the
> result below:
>
> Taxon   group1   group2
> T1          0          1
> T2          1          1
> T3          0          1
> T4          1          0
>
> I have many datasets with different numbers of stages, e.g. one dataset
> will have stage1-10, another will have stage15-35 (data2 lists all
> possible stage values, so it says which group each corresponds to).
>
> So far I can isolate the rows of data2 which contain the stages in data1
> with this:
>
>> # take the header names from data1 minus the 1st column (this is not
>> # found in the data2 library file)
>> data1.names<-names(data1[,-1])
>> # match the vector of data1 column header names to those found in the
>> # library file of data2
>> row.numbers<-match(data1.names, data2[,1])
>> # reduce data2 to only the stages found in the data1 file
>> # (the trailing comma selects rows rather than columns)
>> data2.small<-data2[row.numbers, ]
>
> From here on I don't know what to do. Really, I wanted to just be able to
> change the header names of data1 to their corresponding names in Col_2 and
> then use some statement that would merge the columns in data1 that share a
> name, summing the values in each row and dividing by the sum wherever it is
> greater than 1 (so I only have 0 or 1 again), but I don't know how to do
> that.
>
> Can someone help me get the desired result (as in the example above) in a
> way that does not require me to manually merge columns? I.e. get the example
> output in an automated way that could take any version of the data1 file
> (i.e. with different stage values) and, using the data2 file (the library
> file, the same in each instance), produce output like the example above?
>
>
> Thanks
>
> Martin
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
