... and I neglected to mention that f = myfile[,2]. Sigh.... More coffee needed.
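
For anyone following along, the correction can be put together as a self-contained sketch. It rebuilds the sample data from the dput() output quoted further down using data.frame() (an assumption made for brevity; the factor levels should come out the same), defines f, and uses lapply() in place of sapply() so that the result is always a list. The names residues, res, aa and full are illustrative only, and the second half is just one possible way to add zero proportions for amino acids that do not appear at a position.

```r
# Sample data, rebuilt from the dput() output quoted in this thread.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"),
  X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"),
  X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

f <- myfile[, 2]                 # Time_zero, used as the weight
residues <- myfile[, -c(1, 2)]   # hypothetical name: the X1..X8 columns

# Weighted proportion of each residue that appears at each position.
res <- lapply(residues, function(x) prop.table(tapply(f, x, sum)))
res$X1

# Variant with explicit zeros for residues absent from a position.
aa <- sort(unique(unlist(lapply(residues, as.character))))
full <- sapply(residues, function(x) {
  w <- tapply(f, factor(x, levels = aa), sum)
  w[is.na(w)] <- 0               # empty groups come back as NA
  w / sum(w)
})
round(full, 4)
```

res$X1 should agree with the $X1 values in the corrected output quoted below; full is a matrix with one row per amino acid seen anywhere in the data and one column per position.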
-- Bert

On Tue, Jul 24, 2012 at 9:43 AM, Bert Gunter <bgun...@gene.com> wrote:
> Sorry. Typo in my previous. Should be:
>
> > sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x,sum)))
> $X1
>          L          R          T
> 0.91491320 0.03675651 0.04833030
>
> $X2
>         E         M
> 0.9827278 0.0172722
>
> $X3
>         N         Y
> 0.0483303 0.9516697
>
> $X4
>         I         L         Q
> 0.8976410 0.0850868 0.0172722
>
> $X5
>         I         V
> 0.9516697 0.0483303
>
> $X6
>          P          S
> 0.96324349 0.03675651
>
> $X7
>         D         E         G
> 0.8976410 0.0540287 0.0483303
>
> $X8
>         A         C
> 0.9827278 0.0172722
>
>
>
> On Tue, Jul 24, 2012 at 9:37 AM, Bert Gunter <bgun...@gene.com> wrote:
>> OK, I admit it: I re-read what you wrote and now I'm confused. Is:
>>
>> > sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
>>
>>             X1  X2        X3    X4  X5  X6    X7  X8
>> [1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
>> [2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
>> [3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
>> [4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2
>>
>> what you want?
>>
>> -- Bert
>> On Tue, Jul 24, 2012 at 9:17 AM, Bert Gunter <bgun...@gene.com> wrote:
>>> The OP's request is a bit ambiguous to me: at a given residue, do you
>>> wish to calculate the proportions for only those amino acids that
>>> appear at that residue, or do you wish to include the proportions for
>>> all amino acids, some of which might then be 0.
>>>
>>> Assuming the former, then I don't think one needs to go to the lengths
>>> described by John below.
>>>
>>> Using your example (thanks!), the following seems to suffice:
>>>
>>> > sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
>>>
>>> $X1
>>> x
>>>    L    R    T
>>> 0.50 0.25 0.25
>>>
>>> $X2
>>> x
>>>    E    M
>>> 0.75 0.25
>>>
>>> $X3
>>> x
>>>    N    Y
>>> 0.25 0.75
>>>
>>> $X4
>>> x
>>>    I    L    Q
>>> 0.25 0.50 0.25
>>>
>>> $X5
>>> x
>>>    I    V
>>> 0.75 0.25
>>>
>>> $X6
>>> x
>>>    P    S
>>> 0.75 0.25
>>>
>>> $X7
>>> x
>>>    D    E    G
>>> 0.25 0.50 0.25
>>>
>>> $X8
>>> x
>>>    A    C
>>> 0.75 0.25
>>>
>>>
>>> This could, of course, then be modified to add zero proportions for
>>> all non-appearing amino acids.
>>>
>>> -- Cheers,
>>> Bert
>>>
>>> On Tue, Jul 24, 2012 at 8:18 AM, John Kane <jrkrid...@inbox.com> wrote:
>>>>
>>>> I think this does what you want using two packages, plyr and reshape2,
>>>> that you may have to install. If so,
>>>> install.packages(c("plyr", "reshape2")) should do the trick.
>>>> library(plyr)
>>>> library(reshape2)
>>>> # using supplied file 'myfile' from below
>>>> time0total = sum(myfile[,2])
>>>> mydata <- myfile[, 2:10]
>>>> md1 <- melt(mydata, id = "Time_zero")
>>>> ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
>>>>
>>>>
>>>> John Kane
>>>> Kingston ON Canada
>>>>
>>>> -----Original Message-----
>>>> From: z...@cornell.edu
>>>> Sent: Tue, 24 Jul 2012 10:25:21 -0400
>>>> To: jrkrid...@inbox.com
>>>> Subject: Re: [R] How to do the same thing for all levels of a column?
>>>>
>>>> Hi John,
>>>> Thank you for the tips. My apologies about the unreadable sample data...
>>>> So here is the output of the sample data, and hopefully it works this
>>>> time :)
>>>> myfile <- structure(list(
>>>>   Proteins = structure(1:4, .Label = c("p1", "p2", "p3", "p4"), class = "factor"),
>>>>   Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
>>>>   X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L", "R", "T"), class = "factor"),
>>>>   X2 = structure(c(1L, 1L, 2L, 1L), .Label = c("E", "M"), class = "factor"),
>>>>   X3 = structure(c(2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"),
>>>>   X4 = structure(c(1L, 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"),
>>>>   X5 = structure(c(1L, 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"),
>>>>   X6 = structure(c(1L, 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"),
>>>>   X7 = structure(c(1L, 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"),
>>>>   X8 = structure(c(1L, 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")),
>>>>   .Names = c("Proteins", "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"),
>>>>   row.names = c(NA, 4L), class = "data.frame")
>>>> And here is my original question:
>>>> Basically, I have a bunch of protein sequences composed of different
>>>> amino acid residues, and each residue is represented by an uppercase
>>>> letter. I want to calculate the ratio of different amino acid residues
>>>> at each position of the proteins.
>>>>
>>>> If I name this table as myfile.txt, I have the following scripts to
>>>> calculate the ratio of each amino acid residue at position 1:
>>>>
>>>> # showing levels of the 3rd column, which means the types of residues
>>>>
>>>> >myfile[,3]
>>>>
>>>>
>>>> # calculating the ratio of L
>>>>
>>>> >list=c(which(myfile[,3]=="L"))
>>>>
>>>> >time0total=sum(myfile[,2])
>>>>
>>>> >AA_L=0
>>>>
>>>> >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>>>
>>>> >ratio_L=AA_L/time0total
>>>>
>>>>
>>>> So how can I write a script to do the same thing for the other two
>>>> levels (T and R) in column 3, and also do this for every column that
>>>> contains amino acid residues?
>>>>
>>>> Thanks a lot!
>>>>
>>>> Regards,
>>>>
>>>> Zhao
>>>> 2012/7/24 John Kane <[1]jrkrid...@inbox.com>
>>>>
>>>> First thing is to supply the data in a useable format. As is, it is
>>>> essentially unreadable. All R-beginners do this. :)
>>>> Have a look at the dput function (?dput) for a good way to supply
>>>> sample data in an email.
>>>> If you have a large dataset probably a few dozen lines of data would
>>>> be fine.
>>>> Something like dput(head(mydata)) should be fine. Just copy and
>>>> paste the output into your email.
>>>> Welcome to R. I think you will like it.
>>>> John Kane
>>>> Kingston ON Canada
>>>>
>>>> > -----Original Message-----
>>>> > From: [2]z...@cornell.edu
>>>> > Sent: Mon, 23 Jul 2012 18:01:11 -0400
>>>> > To: [3]r-help@r-project.org
>>>> > Subject: [R] How to do the same thing for all levels of a column?
>>>> >
>>>> > Dear all,
>>>> >
>>>> > I am an R beginner, and I am looking for a way to do the same thing
>>>> > for all levels of a column in a table.
>>>> >
>>>> > Basically, I have a bunch of protein sequences composed of different
>>>> > amino acid residues, and each residue is represented by an uppercase
>>>> > letter.
>>>> > I want to calculate the ratio of different amino acid residues at each
>>>> > position of the proteins. Here is an example table:
>>>> >
>>>> > Proteins  Time_zero  1  2  3  4  5  6  7  8
>>>> > p1        0.0050723  L  E  Y  I  I  P  D  A
>>>> > p2        0.0002731  T  E  N  L  V  P  G  A
>>>> > p3        9.757E-05  L  M  Y  Q  I  P  E  C
>>>> > p4        0.0002077  R  E  Y  L  I  S  E  A
>>>> >
>>>> > If I name this table as myfile.txt, I have the following scripts to
>>>> > calculate the ratio of each amino acid residue at position 1:
>>>> >
>>>> > # showing levels of the 3rd column, which means the types of residues
>>>> >
>>>> > >myfile[,3]
>>>> >
>>>> > # calculating the ratio of L
>>>> >
>>>> > >list=c(which(myfile[,3]=="L"))
>>>> >
>>>> > >time0total=sum(myfile[,2])
>>>> >
>>>> > >AA_L=0
>>>> >
>>>> > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>>> >
>>>> > >ratio_L=AA_L/time0total
>>>> >
>>>> > So how can I write a script to do the same thing for the other two
>>>> > levels (T and R) in column 3, and also do this for every column that
>>>> > contains amino acid residues?
>>>> >
>>>> > Many thanks for any help you could give me on this topic! :)
>>>> >
>>>> > Regards,
>>>> >
>>>> > Zhao
>>>> > --
>>>> > Zhao JIN
>>>> > Ph.D. Candidate
>>>> > Ruth Ley Lab
>>>> > 467 Biotech
>>>> > Field of Microbiology, Cornell University
>>>> > Lab: 607.255.4954
>>>> > Cell: 412.889.3675
>>>> >
>>>> > [[alternative HTML version deleted]]
>>>> >
>>>> > ______________________________________________
>>>> > [4]R-help@r-project.org mailing list
>>>> > [5]https://stat.ethz.ch/mailman/listinfo/r-help
>>>> > PLEASE do read the posting guide
>>>> > [6]http://www.R-project.org/posting-guide.html
>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> References
>>>>
>>>> 1. mailto:jrkrid...@inbox.com
>>>> 2. mailto:z...@cornell.edu
>>>> 3. mailto:r-help@r-project.org
>>>> 4. mailto:R-help@r-project.org
>>>> 5. https://stat.ethz.ch/mailman/listinfo/r-help
>>>> 6. http://www.R-project.org/posting-guide.html

--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.