OK, I admit it: I re-read what you wrote and now I'm confused. Is:

> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
            X1  X2        X3    X4  X5  X6    X7  X8
[1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
[2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
[3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
[4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2

what you want?

-- Bert

On Tue, Jul 24, 2012 at 9:17 AM, Bert Gunter <bgun...@gene.com> wrote:
> The OP's request is a bit ambiguous to me: at a given residue, do you
> wish to calculate the proportions for only those amino acids that
> appear at that residue, or do you wish to include the proportions for
> all amino acids, some of which might then be 0?
>
> Assuming the former, then I don't think one needs to go to the lengths
> described by John below.
>
> Using your example (thanks!), the following seems to suffice:
>
> > sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
>
> $X1
> x
>    L    R    T
> 0.50 0.25 0.25
>
> $X2
> x
>    E    M
> 0.75 0.25
>
> $X3
> x
>    N    Y
> 0.25 0.75
>
> $X4
> x
>    I    L    Q
> 0.25 0.50 0.25
>
> $X5
> x
>    I    V
> 0.75 0.25
>
> $X6
> x
>    P    S
> 0.75 0.25
>
> $X7
> x
>    D    E    G
> 0.25 0.50 0.25
>
> $X8
> x
>    A    C
> 0.75 0.25
>
> This could, of course, then be modified to add zero proportions for
> all non-appearing amino acids.
>
> -- Cheers,
> Bert
>
> On Tue, Jul 24, 2012 at 8:18 AM, John Kane <jrkrid...@inbox.com> wrote:
>>
>> I think this does what you want using two packages, plyr and reshape2,
>> that you may have to install. If so, install.packages(c("plyr", "reshape2"))
>> should do the trick.
>>
>> library(plyr)
>> library(reshape2)
>> # using supplied file 'myfile' from below
>> time0total = sum(myfile[,2])
>> mydata <- myfile[, 2:10]
>> md1 <- melt(mydata, id = "Time_zero")
>> ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
>>
>> John Kane
>> Kingston ON Canada
>>
>> -----Original Message-----
>> From: z...@cornell.edu
>> Sent: Tue, 24 Jul 2012 10:25:21 -0400
>> To: jrkrid...@inbox.com
>> Subject: Re: [R] How to do the same thing for all levels of a column?
>>
>> Hi John,
>> Thank you for the tips. My apologies about the unreadable sample data...
>> So here is the output of the sample data, and hopefully it works this
>> time :)
>>
>> myfile <- structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
>> "p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
>> 9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
>> "R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
>> ), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
>> 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
>> 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"),
>> X5 = structure(c(1L, 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"),
>> X6 = structure(c(1L, 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"),
>> X7 = structure(c(1L, 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"),
>> X8 = structure(c(1L, 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")),
>> .Names = c("Proteins", "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6",
>> "X7", "X8"), row.names = c(NA, 4L), class = "data.frame")
>>
>> And here is my original question:
>> Basically, I have a bunch of protein sequences composed of different amino
>> acid residues, and each residue is represented by an uppercase letter. I
>> want to calculate the ratio of different amino acid residues at each
>> position of the proteins.
>>
>> If I name this table as myfile.txt, I have the following scripts to
>> calculate the ratio of each amino acid residue at position 1:
>>
>> # showing levels of the 3rd column, which means the types of residues
>> >myfile[,3]
>>
>> # calculating the ratio of L
>> >list=c(which(myfile[,3]=="L"))
>> >time0total=sum(myfile[,2])
>> >AA_L=0
>> >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>> >ratio_L=AA_L/time0total
>>
>> So how can I write a script to do the same thing for the other two levels
>> (T and R) in column 3, and also do this for every column that contains
>> amino acid residues?
>>
>> Thanks a lot!
>>
>> Regards,
>> Zhao
>>
>> 2012/7/24 John Kane <jrkrid...@inbox.com>
>>
>> First thing is to supply the data in a usable format. As is, it is
>> essentially unreadable. All R beginners do this. :)
>> Have a look at the dput function (?dput) for a good way to supply sample
>> data in an email.
>> If you have a large dataset, probably a few dozen lines of data would be
>> fine.
>> Something like dput(head(mydata)) should be fine. Just copy and paste the
>> output into your email.
>> Welcome to R. I think you will like it.
>>
>> John Kane
>> Kingston ON Canada
>>
>> > -----Original Message-----
>> > From: z...@cornell.edu
>> > Sent: Mon, 23 Jul 2012 18:01:11 -0400
>> > To: r-help@r-project.org
>> > Subject: [R] How to do the same thing for all levels of a column?
>> >
>> > Dear all,
>> >
>> > I am an R beginner, and I am looking for a way to do the same thing for
>> > all levels of a column in a table.
>> >
>> > Basically, I have a bunch of protein sequences composed of different
>> > amino acid residues, and each residue is represented by an uppercase
>> > letter. I want to calculate the ratio of different amino acid residues
>> > at each position of the proteins. Here is an example table:
>> >
>> > Proteins  Time_zero  1  2  3  4  5  6  7  8
>> > p1        0.0050723  L  E  Y  I  I  P  D  A
>> > p2        0.0002731  T  E  N  L  V  P  G  A
>> > p3        9.757E-05  L  M  Y  Q  I  P  E  C
>> > p4        0.0002077  R  E  Y  L  I  S  E  A
>> >
>> > If I name this table as myfile.txt, I have the following scripts to
>> > calculate the ratio of each amino acid residue at position 1:
>> >
>> > # showing levels of the 3rd column, which means the types of residues
>> > >myfile[,3]
>> >
>> > # calculating the ratio of L
>> > >list=c(which(myfile[,3]=="L"))
>> > >time0total=sum(myfile[,2])
>> > >AA_L=0
>> > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
>> > >ratio_L=AA_L/time0total
>> >
>> > So how can I write a script to do the same thing for the other two
>> > levels (T and R) in column 3, and also do this for every column that
>> > contains amino acid residues?
>> >
>> > Many thanks for any help you could give me on this topic! :)
>> >
>> > Regards,
>> > Zhao
>> > --
>> > Zhao JIN
>> > Ph.D. Candidate
>> > Ruth Ley Lab
>> > 467 Biotech
>> > Field of Microbiology, Cornell University
>> > Lab: 607.255.4954
>> > Cell: 412.889.3675
--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
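
For reference, the loop in the original question weights each protein by its Time_zero value rather than counting rows, so it may help to see that calculation written once for all residues and all positions. What follows is only a minimal sketch under stated assumptions: the helper name weighted_ratio and its levels argument are hypothetical (they appear nowhere in the thread), and the data frame is a cut-down copy of the posted example with three of the eight positions. It sums Time_zero per residue with tapply() and divides by the grand total, which is what ratio_L does for "L" alone.

```r
# Cut-down copy of the example data from the thread (positions X1 to X3 only).
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  stringsAsFactors = TRUE
)

# Hypothetical helper (an assumption, not code from the thread).
# For one position: sum the weights of the proteins carrying each residue,
# then divide by the total weight. If 'levels' is supplied, residues that
# never appear at this position are reported with a proportion of 0.
weighted_ratio <- function(residues, weights, levels = NULL) {
  if (!is.null(levels)) {
    residues <- factor(residues, levels = levels)
  }
  sums <- tapply(weights, residues, sum)
  sums[is.na(sums)] <- 0  # empty groups come back as NA from tapply()
  sums / sum(weights)
}

# Apply the helper to every residue column (all but Proteins and Time_zero).
ratios <- lapply(myfile[, -c(1, 2)], weighted_ratio,
                 weights = myfile$Time_zero)

# Weighted ratios at position 1; the "L" entry corresponds to ratio_L.
print(ratios$X1)

# Same position, but forcing a fixed residue alphabet so absent ones show as 0.
print(weighted_ratio(myfile$X1, myfile$Time_zero,
                     levels = c("L", "R", "T", "V")))
```

lapply() is used instead of sapply() so the result is always a list, whatever the number of distinct residues at each position. If the unweighted proportions from prop.table(table(x)) are what is wanted, passing a vector of ones as the weights should give the same numbers.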