Hi John,

Thank you for the tips, and my apologies for the unreadable sample data.
So here is the dput() output of the sample data; hopefully it is readable this time :)

structure(list(
  Proteins = structure(1:4, .Label = c("p1", "p2", "p3", "p4"), class = "factor"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L", "R", "T"), class = "factor"),
  X2 = structure(c(1L, 1L, 2L, 1L), .Label = c("E", "M"), class = "factor"),
  X3 = structure(c(2L, 1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"),
  X4 = structure(c(1L, 2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"),
  X5 = structure(c(1L, 2L, 1L, 1L), .Label = c("I", "V"), class = "factor"),
  X6 = structure(c(1L, 1L, 1L, 2L), .Label = c("P", "S"), class = "factor"),
  X7 = structure(c(1L, 3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"),
  X8 = structure(c(1L, 1L, 2L, 1L), .Label = c("A", "C"), class = "factor")),
  .Names = c("Proteins", "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"),
  row.names = c(NA, 4L), class = "data.frame")

And here is my original question:

Basically, I have a bunch of protein sequences composed of different amino acid residues, and each residue is represented by an uppercase letter. I want to calculate the ratio of different amino acid residues at each position of the proteins. If I name this table myfile.txt, I have the following script to calculate the ratio of each amino acid residue at position 1 (the summed Time_zero values of the proteins carrying that residue, divided by the total Time_zero):

# showing levels of the 3rd column, which means the types of residues
myfile[,3]

# calculating the ratio of L
list=c(which(myfile[,3]=="L"))
time0total=sum(myfile[,2])
AA_L=0
for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
ratio_L=AA_L/time0total

So how can I write a script to do the same thing for the other two levels (T and R) in column 3, and also do this for every column that contains amino acid residues?

Thanks a lot!

Regards,
Zhao

2012/7/24 John Kane <jrkrid...@inbox.com>
> First thing is to supply the data in a usable format. As it is, it is
> essentially unreadable.
> All R beginners do this. :)
>
> Have a look at the dput function (?dput) for a good way to supply sample
> data in an email.
>
> If you have a large dataset, probably a few dozen lines of data would be
> fine.
>
> Something like dput(head(mydata)) should be fine. Just copy and paste the
> output into your email.
>
> Welcome to R. I think you will like it.
>
> John Kane
> Kingston ON Canada
>
>
> > -----Original Message-----
> > From: z...@cornell.edu
> > Sent: Mon, 23 Jul 2012 18:01:11 -0400
> > To: r-help@r-project.org
> > Subject: [R] How to do the same thing for all levels of a column?
> >
> > Dear all,
> >
> > I am an R beginner, and I am looking for a way to do the same thing for
> > all levels of a column in a table.
> >
> > Basically, I have a bunch of protein sequences composed of different
> > amino acid residues, and each residue is represented by an uppercase
> > letter. I want to calculate the ratio of different amino acid residues
> > at each position of the proteins.
> > Here is an example table:
> >
> > Proteins  Time_zero  1  2  3  4  5  6  7  8
> > p1        0.0050723  L  E  Y  I  I  P  D  A
> > p2        0.0002731  T  E  N  L  V  P  G  A
> > p3        9.757E-05  L  M  Y  Q  I  P  E  C
> > p4        0.0002077  R  E  Y  L  I  S  E  A
> >
> > If I name this table myfile.txt, I have the following script to
> > calculate the ratio of each amino acid residue at position 1:
> >
> > # showing levels of the 3rd column, which means the types of residues
> > myfile[,3]
> >
> > # calculating the ratio of L
> > list=c(which(myfile[,3]=="L"))
> > time0total=sum(myfile[,2])
> > AA_L=0
> > for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> > ratio_L=AA_L/time0total
> >
> > So how can I write a script to do the same thing for the other two levels
> > (T and R) in column 3, and also do this for every column that contains
> > amino acid residues?
> >
> > Many thanks for any help you could give me on this topic! :)
> >
> > Regards,
> >
> > Zhao
> > --
> > Zhao JIN
> > Ph.D. Candidate
> > Ruth Ley Lab
> > 467 Biotech
> > Field of Microbiology, Cornell University
> > Lab: 607.255.4954
> > Cell: 412.889.3675
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
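For reference, here is one possible approach in base R. It is a minimal sketch, not taken from the thread, and it assumes the data frame built from the dput() output above, with the residue positions in columns named X1 to X8. It swaps the for loop for tapply(), which sums Time_zero within each residue level, and lapply(), which repeats that for every residue column. The names ratio_pos1, res_cols and ratios are hypothetical, chosen only for illustration.

```r
# Sample data: values copied from the dput() output in this thread
# (4 proteins; columns X1..X8 hold the residue at positions 1..8).
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"),
  X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"),
  X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"),
  X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

# Denominator: total Time_zero over all proteins (same as time0total above).
time0total <- sum(myfile$Time_zero)

# Position 1 only: sum Time_zero within each residue level of X1,
# then divide by the total. Gives one ratio per level (L, R, T).
ratio_pos1 <- tapply(myfile$Time_zero, myfile$X1, sum) / time0total
print(ratio_pos1)

# Every residue column (hypothetical helper names): pick the columns
# named X1, X2, ... and repeat the same calculation for each one.
# The result is a named list with one vector of ratios per position.
res_cols <- grep("^X[0-9]+$", names(myfile), value = TRUE)
ratios <- lapply(myfile[res_cols], function(residue) {
  tapply(myfile$Time_zero, residue, sum) / time0total
})
print(ratios)
```

The ratios at each position should add up to 1, and ratio_pos1["L"] matches ratio_L from the loop above. One caveat: a factor level with no matching proteins (for example after subsetting rows) shows up as NA rather than 0 in the tapply() result.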