No, it's actually telling it to split by the two variables (variable, value), if I understand your question correctly. The confusion is my fault. I tend to be lazy when running examples and did not rename the melt() output to something meaningful. I sometimes forget that it's not just me reading the code.

If you run:

md1 <- melt(mydata, id = "Time_zero", variable.name = "xvars", value.name = "aminos")
ddply(md1, .(xvars, aminos), summarise, sum = sum(Time_zero)/time0total)

I think it will show what is happening.
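
As a rough illustration of what "split by the two variables" means, the same grouping can be sketched in base R without plyr or reshape2. This is only a sketch under assumptions: aggregate() stands in for ddply(), the long table is built by hand instead of with melt(), and only positions X1 and X2 of the sample data are used to keep it short. The names md1, xvars, aminos and time0total follow the thread; res is just an illustrative name.

```r
# Sample data from the thread, cut down to positions X1 and X2 (assumption: brevity only).
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  stringsAsFactors = FALSE
)
time0total <- sum(myfile$Time_zero)

# Long format built by hand: one row per (protein, position),
# which is the shape melt() produces.
md1 <- data.frame(
  Time_zero = rep(myfile$Time_zero, times = 2),
  xvars     = rep(c("X1", "X2"), each = nrow(myfile)),
  aminos    = c(myfile$X1, myfile$X2),
  stringsAsFactors = FALSE
)

# Split by BOTH xvars (position) and aminos (residue), then sum Time_zero per group.
res <- aggregate(Time_zero ~ xvars + aminos, data = md1, FUN = sum)
res$ratio <- res$Time_zero / time0total
res <- res[order(res$xvars, res$aminos), ]
print(res)
```

Each row of res is one (position, residue) combination, so the ratios within a position add up to 1. For X1 this should give the same L/R/T split that Bert posted (about 0.915, 0.037 and 0.048).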

John Kane
Kingston ON Canada

-----Original Message-----
From: z...@cornell.edu
Sent: Tue, 24 Jul 2012 15:26:52 -0400
To: gunter.ber...@gene.com
Subject: Re: [R] How to do the same thing for all levels of a column?

Hi John and Bert,

Thank you so much for your replies. Both of your scripts worked well, so now I've learnt two ways to do it. :)

Bert: I was not very clear on what I wanted to do. I just would like to calculate the residues shown in the table, not all residues. The apply functions are amazing!

John: as I am still digesting the code, I am not sure if I fully understood the argument .(variable, value) in the ddply line. The description of ddply says that .variables gives the variables to split the data frame by, as quoted variables, a formula or character vector. So does .(variable, value) tell R to split the data frame by value, which holds the types of amino acid residues?

Thank you all again.

Cheers,
Zhao

2012/7/24 Bert Gunter <gunter.ber...@gene.com>

... and I neglected to mention that f = myfile[,2]
Sigh.... More coffee needed.

-- Bert

On Tue, Jul 24, 2012 at 9:43 AM, Bert Gunter <bgun...@gene.com> wrote:
> Sorry. Typo in my previous. Should be:
>
>> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x,sum)))
> $X1
>          L          R          T
> 0.91491320 0.03675651 0.04833030
>
> $X2
>         E         M
> 0.9827278 0.0172722
>
> $X3
>         N         Y
> 0.0483303 0.9516697
>
> $X4
>         I         L         Q
> 0.8976410 0.0850868 0.0172722
>
> $X5
>         I         V
> 0.9516697 0.0483303
>
> $X6
>          P          S
> 0.96324349 0.03675651
>
> $X7
>         D         E         G
> 0.8976410 0.0540287 0.0483303
>
> $X8
>         A         C
> 0.9827278 0.0172722
>
>
> On Tue, Jul 24, 2012 at 9:37 AM, Bert Gunter <bgun...@gene.com> wrote:
>> OK, I admit it: I re-read what you wrote and now I'm confused.
>> Is:
>>
>>> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
>>
>>             X1  X2        X3    X4  X5  X6    X7  X8
>> [1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
>> [2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
>> [3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
>> [4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2
>>
>> what you want?
>>
>> -- Bert
>> On Tue, Jul 24, 2012 at 9:17 AM, Bert Gunter <bgun...@gene.com> wrote:
>>> The OP's request is a bit ambiguous to me: at a given residue, do you
>>> wish to calculate the proportions for only those amino acids that
>>> appear at that residue, or do you wish to include the proportions for
>>> all amino acids, some of which might then be 0?
>>>
>>> Assuming the former, then I don't think one needs to go to the lengths
>>> described by John below.
>>>
>>> Using your example (thanks!), the following seems to suffice:
>>>
>>>> sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
>>>
>>> $X1
>>> x
>>>    L    R    T
>>> 0.50 0.25 0.25
>>>
>>> $X2
>>> x
>>>    E    M
>>> 0.75 0.25
>>>
>>> $X3
>>> x
>>>    N    Y
>>> 0.25 0.75
>>>
>>> $X4
>>> x
>>>    I    L    Q
>>> 0.25 0.50 0.25
>>>
>>> $X5
>>> x
>>>    I    V
>>> 0.75 0.25
>>>
>>> $X6
>>> x
>>>    P    S
>>> 0.75 0.25
>>>
>>> $X7
>>> x
>>>    D    E    G
>>> 0.25 0.50 0.25
>>>
>>> $X8
>>> x
>>>    A    C
>>> 0.75 0.25
>>>
>>>
>>> This could, of course, then be modified to add zero proportions for
>>> all non-appearing amino acids.
>>>
>>> -- Cheers,
>>> Bert
>>>
>>> On Tue, Jul 24, 2012 at 8:18 AM, John Kane <jrkrid...@inbox.com> wrote:
>>>>
>>>> I think this does what you want using two packages, plyr and reshape2, that
>>>> you may have to install. If so, install.packages(c("plyr", "reshape2")) should
>>>> do the trick.
>>>> library(plyr)
>>>> library(reshape2)
>>>> # using supplied file "myfile" from below
>>>> time0total = sum(myfile[,2])
>>>> mydata <- myfile[, 2:10]
>>>> md1 <- melt(mydata, id = "Time_zero")
>>>> ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
>>>>
>>>>
>>>> John Kane
>>>> Kingston ON Canada
>>>>
>>>> -----Original Message-----
>>>> From: z...@cornell.edu
>>>> Sent: Tue, 24 Jul 2012 10:25:21 -0400
>>>> To: jrkrid...@inbox.com
>>>> Subject: Re: [R] How to do the same thing for all levels of a column?
>>>>
>>>> Hi John,
>>>> Thank you for the tips. My apologies about the unreadable sample data...
>>>> So here is the output of the sample data, and hopefully it works this time
>>>> :)
>>>> myfile <- structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
>>>> "p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
>>>> 9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
>>>> "R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
>>>> ), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
>>>> 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
>>>> 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"), X5 =
>>>> structure(c(1L,
>>>> 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"), X6 = structure(c(1L,
>>>> 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"), X7 = structure(c(1L,
>>>> 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"), X8 =
>>>> structure(c(1L,
>>>> 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")), .Names =
>>>> c("Proteins",
>>>> "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names =
>>>> c(NA,
>>>> 4L), class = "data.frame")
>>>> And here is my original question:
>>>> Basically, I have a bunch of protein sequences composed of different amino
>>>> acid residues, and each residue is represented by an uppercase letter.
>>>> I want to calculate the ratio of different amino acid residues at each
>>>> position of the proteins.
>>>>
>>>> If I name this table as myfile.txt, I have the following scripts to
>>>> calculate the ratio of each amino acid residue at position 1:
>>>>
>>>> # showing levels of the 3rd column, which means the types of residues
>>>>
>>>> >myfile[,3]
>>>>
>>>> # calculating the ratio of L
>>>>
>>>> >list=c(which(myfile[,3]=="L"))
>>>>
>>>> >time0total=sum(myfile[,2])
>>>>
>>>> >AA_L=0
>>>>
>>>> >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>>>
>>>> >ratio_L=AA_L/time0total
>>>>
>>>> So how can I write a script to do the same thing for the other two levels (T
>>>> and R) in column 3, and also do this for every column that contains amino
>>>> acid residues?
>>>>
>>>> Thanks a lot!
>>>>
>>>> Regards,
>>>>
>>>> Zhao
>>>> 2012/7/24 John Kane <jrkrid...@inbox.com>
>>>>
>>>> First thing is to supply the data in a usable format. As is, it is
>>>> essentially unreadable. All R-beginners do this. :)
>>>> Have a look at the dput function (?dput) for a good way to supply sample
>>>> data in an email.
>>>> If you have a large dataset, probably a few dozen lines of data would be
>>>> fine.
>>>> Something like dput(head(mydata)) should be fine. Just copy and paste the
>>>> output into your email.
>>>> Welcome to R. I think you will like it.
>>>> John Kane
>>>> Kingston ON Canada
>>>>
>>>> > -----Original Message-----
>>>> > From: z...@cornell.edu
>>>> > Sent: Mon, 23 Jul 2012 18:01:11 -0400
>>>> > To: r-help@r-project.org
>>>> > Subject: [R] How to do the same thing for all levels of a column?
>>>> >
>>>> > Dear all,
>>>> >
>>>> > I am an R beginner, and I am looking for a way to do the same thing for
>>>> > all levels of a column in a table.
>>>> >
>>>> > Basically, I have a bunch of protein sequences composed of different
>>>> > amino acid residues, and each residue is represented by an uppercase
>>>> > letter. I want to calculate the ratio of different amino acid residues
>>>> > at each position of the proteins. Here is an example table:
>>>> >
>>>> > Proteins  Time_zero  1  2  3  4  5  6  7  8
>>>> > p1        0.0050723  L  E  Y  I  I  P  D  A
>>>> > p2        0.0002731  T  E  N  L  V  P  G  A
>>>> > p3        9.757E-05  L  M  Y  Q  I  P  E  C
>>>> > p4        0.0002077  R  E  Y  L  I  S  E  A
>>>> >
>>>> > If I name this table as myfile.txt, I have the following scripts to
>>>> > calculate the ratio of each amino acid residue at position 1:
>>>> >
>>>> > # showing levels of the 3rd column, which means the types of residues
>>>> >
>>>> > >myfile[,3]
>>>> >
>>>> > # calculating the ratio of L
>>>> >
>>>> > >list=c(which(myfile[,3]=="L"))
>>>> >
>>>> > >time0total=sum(myfile[,2])
>>>> >
>>>> > >AA_L=0
>>>> >
>>>> > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>>>> >
>>>> > >ratio_L=AA_L/time0total
>>>> >
>>>> > So how can I write a script to do the same thing for the other two levels
>>>> > (T and R) in column 3, and also do this for every column that contains
>>>> > amino acid residues?
>>>> >
>>>> > Many thanks for any help you could give me on this topic! :)
>>>> >
>>>> > Regards,
>>>> >
>>>> > Zhao
>>>> > --
>>>> > Zhao JIN
>>>> > Ph.D. Candidate
>>>> > Ruth Ley Lab
>>>> > 467 Biotech
>>>> > Field of Microbiology, Cornell University
>>>> > Lab: 607.255.4954
>>>> > Cell: 412.889.3675
>>>> >
>>>> > [[alternative HTML version deleted]]
>>>> >
>>>> > ______________________________________________
>>>> > R-help@r-project.org mailing list
>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>> > PLEASE do read the posting guide
>>>> > http://www.R-project.org/posting-guide.html
>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>
>>> --
>>>
>>> Bert Gunter
>>> Genentech Nonclinical Biostatistics
>>>
>>> Internal Contact Info:
>>> Phone: 467-7374
>>> Website:
>>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
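
Bert's remark in the thread, that the per-position result could be modified to add zero proportions for all non-appearing amino acids, might be sketched roughly as follows. This is an assumption-laden sketch, not code from the thread: it uses the Time_zero-weighted version (tapply with sum), rebuilds the sample data with factor() instead of the posted dput() output, and the names residue_cols, all_levels and props are made up for illustration.

```r
# Sample data as posted in the thread, rebuilt with factor() for readability.
myfile <- data.frame(
  Proteins  = factor(c("p1", "p2", "p3", "p4")),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = factor(c("L", "T", "L", "R")),
  X2 = factor(c("E", "E", "M", "E")),
  X3 = factor(c("Y", "N", "Y", "Y")),
  X4 = factor(c("I", "L", "Q", "L")),
  X5 = factor(c("I", "V", "I", "I")),
  X6 = factor(c("P", "P", "P", "S")),
  X7 = factor(c("D", "G", "E", "E")),
  X8 = factor(c("A", "A", "C", "A"))
)

f <- myfile[, 2]                      # weights, as in Bert's f = myfile[,2]
residue_cols <- paste0("X", 1:8)      # hypothetical helper name

# Every residue that occurs anywhere in the table.
all_levels <- sort(unique(unlist(lapply(myfile[residue_cols], as.character))))

# One column per position, one row per residue; absent residues get 0.
props <- sapply(myfile[residue_cols], function(x) {
  s <- tapply(f, factor(x, levels = all_levels), sum)
  s <- setNames(as.vector(s), all_levels)
  s[is.na(s)] <- 0                    # residue never seen at this position
  s / sum(s)
})
print(round(props, 4))
```

The result is a residue-by-position matrix whose columns each sum to 1, so positions can be compared row by row. The non-zero entries should agree with the per-position values Bert posted, for example about 0.915 for L at X1.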