Re: [R] How to do the same thing for all levels of a column?

Zhao Jin Tue, 24 Jul 2012 10:47:04 -0700

Hi John,

Thank you for the tips. My apologies about the unreadable sample data...


So here is the dput output of the sample data, and hopefully it works this
time :)

structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
"p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
"R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"), X5 = structure(c(1L,
2L, 1L, 1L), .Label = c("I", "V"), class = "factor"), X6 = structure(c(1L,
1L, 1L, 2L), .Label = c("P", "S"), class = "factor"), X7 = structure(c(1L,
3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"), X8 = structure(c(1L,
1L, 2L, 1L), .Label = c("A", "C"), class = "factor")), .Names = c("Proteins",
"Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names = c(NA,
4L), class = "data.frame")

And here is my original question:
Basically, I have a bunch of protein sequences composed of different amino
acid residues, and each residue is represented by an uppercase letter. I
want to calculate the ratio of different amino acid residues at each
position of the proteins.

If I save this table as myfile.txt and read it into R as myfile, I use the
following script to calculate the ratio of each amino acid residue at
position 1:

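# show the 3rd column (position 1); its levels are the residue types
> myfile[, 3]

# calculate the ratio of L
# idx holds the rows where position 1 is "L" (renamed from "list",
# which masks the base function of the same name)
> idx <- which(myfile[, 3] == "L")
> time0total <- sum(myfile[, 2])
> AA_L <- 0
# seq_along() skips the loop entirely when no row matches
> for (i in seq_along(idx)) { AA_L <- AA_L + myfile[idx[i], 2] }
> ratio_L <- AA_L / time0total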
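
The loop above can probably be collapsed into one vectorized expression. This is only a minimal sketch: it assumes the sample data is in a data frame called myfile, and it rebuilds just the first three columns (values copied from the dput output) so that it runs on its own.

```r
# Assumption: only the first three columns of the sample data are needed here;
# the values are copied from the dput output earlier in this message.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1        = factor(c("L", "T", "L", "R"))
)

# Sum Time_zero over the rows where position 1 is "L",
# then divide by the total of Time_zero over all rows.
ratio_L <- sum(myfile$Time_zero[myfile$X1 == "L"]) / sum(myfile$Time_zero)
ratio_L
```

Logical indexing replaces both which() and the explicit loop, and it returns 0 rather than an error when no row matches.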



So how can I write a script to do the same thing for the other two levels
(T and R) in column 3, and also do this for every column that contains
amino acid residues?
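
One possible way to generalize this, sketched with base R only: tapply() sums Time_zero within each level of a column, and lapply() repeats that for every residue column. The names residue_cols and ratios are made up for illustration, and selecting the residue columns by the pattern X1..X8 is an assumption based on the dput output above.

```r
# Rebuild the sample data (same values as the dput output above).
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)

# Assumption: every residue column is named "X" followed by a number.
residue_cols <- grep("^X[0-9]+$", names(myfile), value = TRUE)
total <- sum(myfile$Time_zero)

# For each position, sum Time_zero within each residue level
# and divide by the overall total.
ratios <- lapply(myfile[residue_cols], function(pos) {
  tapply(myfile$Time_zero, pos, sum) / total
})

ratios$X1  # ratios for L, R and T at position 1
```

The result is a list with one named vector per position, so ratios$X3[["Y"]] would give the ratio of Y at position 3. The ratios within each position should sum to 1, which makes a quick sanity check.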


Thanks a lot!


Regards,

Zhao

2012/7/24 John Kane <jrkrid...@inbox.com>

> First thing is to supply the data in a usable format.  As is, it is
> essentially unreadable.  All R-beginners do this. :)
>
> Have a look at the dput function  (?dput) for a good way to supply sample
> data in an email.
>
> If you have a large dataset, a few dozen lines of data would probably be
> fine.
>
> Something like dput(head(mydata)) should be fine.  Just copy and paste the
> output into your email.
>
> Welcome to R.  I think you will like it.
>
> John Kane
> Kingston ON Canada
>
>
> > -----Original Message-----
> > From: z...@cornell.edu
> > Sent: Mon, 23 Jul 2012 18:01:11 -0400
> > To: r-help@r-project.org
> > Subject: [R] How to do the same thing for all levels of a column?
> >
> > Dear all,
> >
> >
> >
> > I am an R beginner, and I am looking for a way to do the same thing for all
> > levels of a column in a table.
> >
> >
> >
> > Basically, I have a bunch of protein sequences composed of different amino
> > acid residues, and each residue is represented by an uppercase letter. I
> > want to calculate the ratio of different amino acid residues at each
> > position of the proteins. Here is an example table:
> >
> > Proteins  Time_zero  1  2  3  4  5  6  7  8
> > p1        0.0050723  L  E  Y  I  I  P  D  A
> > p2        0.0002731  T  E  N  L  V  P  G  A
> > p3        9.757E-05  L  M  Y  Q  I  P  E  C
> > p4        0.0002077  R  E  Y  L  I  S  E  A
> >
> >
> >
> > If I name this table as myfile.txt, I have the following scripts to
> > calculate the ratio of each amino acid residue at position 1:
> >
> > # showing levels of the 3rd column, which means the types of residues
> >
> > >myfile[,3]
> >
> >
> >
> > # calculating the ratio of L
> >
> > >list=c(which(myfile[,3]=="L"))
> >
> > >time0total=sum(myfile[,2])
> >
> > >AA_L=0
> >
> > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> >
> > >ratio_L=AA_L/time0total
> >
> >
> >
> > So how can I write a script to do the same thing for the other two levels
> > (T and R) in column 3, and also do this for every column that contains
> > amino acid residues?
> >
> >
> >
> > Many thanks for any help you could give me on this topic! :)
> >
> >
> >
> > Regards,
> >
> > Zhao
> > --
> > Zhao JIN
> > Ph.D. Candidate
> > Ruth Ley Lab
> > 467 Biotech
> > Field of Microbiology, Cornell University
> > Lab: 607.255.4954
> > Cell: 412.889.3675
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>


-- 
Zhao JIN
Ph.D. Candidate
Ruth Ley Lab
467 Biotech
Field of Microbiology, Cornell University
Lab: 607.255.4954
Cell: 412.889.3675
