Re: [R] How to do the same thing for all levels of a column?

Zhao Jin Tue, 24 Jul 2012 12:29:50 -0700

Hi John and Bert,

Thank you so much for your replies. Both of your scripts worked well, so
now I've learnt two ways to do it. :)


Bert: I was not very clear on what I wanted to do. I would just like to
calculate the ratios of the residues shown in the table at each position,
not of all residues. The *apply* functions are amazing!
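
To make that distinction concrete, here is a minimal base-R sketch, not a
definitive version. It rebuilds the sample data from the dput() output quoted
below, uses Bert's corrected tapply() call for the residues that are present,
and then shows one possible way to add zero proportions for absent residues.
The names present_only, with_zeros and all_res are only introduced here for
illustration.

```r
# Sample data rebuilt from the dput() output quoted further down the thread.
myfile <- data.frame(
  Proteins  = c("p1", "p2", "p3", "p4"),
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"), X2 = c("E", "E", "M", "E"),
  X3 = c("Y", "N", "Y", "Y"), X4 = c("I", "L", "Q", "L"),
  X5 = c("I", "V", "I", "I"), X6 = c("P", "P", "P", "S"),
  X7 = c("D", "G", "E", "E"), X8 = c("A", "A", "C", "A"),
  stringsAsFactors = TRUE
)
f <- myfile$Time_zero          # the weights, i.e. f = myfile[,2]
positions <- myfile[, -c(1, 2)]

# (a) Only the residues that actually occur at each position.
# lapply() keeps a list because the positions have different residue sets.
present_only <- lapply(positions, function(x) prop.table(tapply(f, x, sum)))

# (b) Every residue seen anywhere in the table; absent ones get 0.
all_res <- sort(unique(unlist(lapply(positions, as.character))))
with_zeros <- sapply(positions, function(x) {
  s <- tapply(f, factor(x, levels = all_res), sum)
  s[is.na(s)] <- 0             # tapply() gives NA for levels with no rows
  s / sum(s)
})

print(present_only$X1)
print(round(with_zeros, 4))
```

With (a) each position only lists its own residues; with (b) the result is
one matrix with a row per residue and a column per position.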

John: as I am still digesting the code, I am not sure I fully understand
the argument .(variable, value) in the *ddply* line. The description of
*ddply* says that .variables gives the variables to split the data frame
by, as quoted variables, a formula or a character vector. So does
.(variable, value) tell R to split the data frame by value, i.e. by the
types of amino acid residues?
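
For anyone reading along with the same question: as far as I can tell,
.(variable, value) splits by every combination of the two columns, that is by
position (variable) and residue (value) together, not by residue alone. Below
is a rough base-R sketch of the same grouping, using aggregate() in place of
ddply and a hand-built long table in place of melt(). It is an equivalent
under those assumptions, not John's code. Only positions X1, X2 and X7 are
used, and the names long and res are made up for the example.

```r
# Three positions from the sample data; residue E occurs at both X2 and X7.
myfile <- data.frame(
  Time_zero = c(0.0050723, 0.0002731, 9.76e-05, 0.0002077),
  X1 = c("L", "T", "L", "R"),
  X2 = c("E", "E", "M", "E"),
  X7 = c("D", "G", "E", "E")
)
time0total <- sum(myfile$Time_zero)

# Long format: one row per (protein, position), similar to what melt() returns.
pos <- c("X1", "X2", "X7")
long <- data.frame(
  Time_zero = rep(myfile$Time_zero, times = length(pos)),
  variable  = rep(pos, each = nrow(myfile)),
  value     = unlist(myfile[pos], use.names = FALSE)
)

# Group by position AND residue, then sum the weights within each group.
res <- aggregate(Time_zero ~ variable + value, data = long, FUN = sum)
res$ratio <- res$Time_zero / time0total
res <- res[order(res$variable, res$value), ]
print(res)
```

E at X2 and E at X7 come out as two separate rows with different ratios, which
is why splitting by value alone would not be enough.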

Thank you all again.

Cheers,
Zhao



2012/7/24 Bert Gunter <gunter.ber...@gene.com>

> ... and I neglected to mention that f = myfile[,2]
>
> Sigh....  More coffee needed.
>
> -- Bert
>
> On Tue, Jul 24, 2012 at 9:43 AM, Bert Gunter <bgun...@gene.com> wrote:
> > Sorry. Typo in my previous. Should be:
> >
> >> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x,sum)))
> > $X1
> >          L          R          T
> > 0.91491320 0.03675651 0.04833030
> >
> > $X2
> >         E         M
> > 0.9827278 0.0172722
> >
> > $X3
> >         N         Y
> > 0.0483303 0.9516697
> >
> > $X4
> >         I         L         Q
> > 0.8976410 0.0850868 0.0172722
> >
> > $X5
> >         I         V
> > 0.9516697 0.0483303
> >
> > $X6
> >          P          S
> > 0.96324349 0.03675651
> >
> > $X7
> >         D         E         G
> > 0.8976410 0.0540287 0.0483303
> >
> > $X8
> >         A         C
> > 0.9827278 0.0172722
> >
> >
> >
> > On Tue, Jul 24, 2012 at 9:37 AM, Bert Gunter <bgun...@gene.com> wrote:
> >> OK, I admit it: I re-read what you wrote and now I'm confused. Is:
> >>
> >>> sapply(myfile[,-c(1,2)],function(x)prop.table(tapply(f,x)))
> >>
> >>             X1       X2        X3       X4     X5  X6    X7  X8
> >> [1,] 0.1428571 0.2 0.2857143 0.125 0.2 0.2 0.125 0.2
> >> [2,] 0.4285714 0.2 0.1428571 0.250 0.4 0.2 0.375 0.2
> >> [3,] 0.1428571 0.4 0.2857143 0.375 0.2 0.2 0.250 0.4
> >> [4,] 0.2857143 0.2 0.2857143 0.250 0.2 0.4 0.250 0.2
> >>
> >> what you want?
> >>
> >> -- Bert
> >> On Tue, Jul 24, 2012 at 9:17 AM, Bert Gunter <bgun...@gene.com> wrote:
> >>> The OP's request is a bit ambiguous to me: at a given residue, do you
> >>> wish to calculate the proportions for only those amino acids that
> >>> appear at that residue, or do you wish to include the proportions for
> >>> all amino acids, some of which might then be 0.
> >>>
> >>> Assuming the former, then I don't think one needs to go to the lengths
> >>> described by John below.
> >>>
> >>> Using your example (thanks!), the following seems to suffice:
> >>>
> >>>> sapply(myfile[,-c(1,2)],function(x)prop.table(table(x)))
> >>>
> >>> $X1
> >>> x
> >>>    L    R    T
> >>> 0.50 0.25 0.25
> >>>
> >>> $X2
> >>> x
> >>>    E    M
> >>> 0.75 0.25
> >>>
> >>> $X3
> >>> x
> >>>    N    Y
> >>> 0.25 0.75
> >>>
> >>> $X4
> >>> x
> >>>    I    L    Q
> >>> 0.25 0.50 0.25
> >>>
> >>> $X5
> >>> x
> >>>    I    V
> >>> 0.75 0.25
> >>>
> >>> $X6
> >>> x
> >>>    P    S
> >>> 0.75 0.25
> >>>
> >>> $X7
> >>> x
> >>>    D    E    G
> >>> 0.25 0.50 0.25
> >>>
> >>> $X8
> >>> x
> >>>    A    C
> >>> 0.75 0.25
> >>>
> >>>
> >>> This could, of course, then be modified to add zero proportions for
> >>> all non-appearing amino acids.
> >>>
> >>> -- Cheers,
> >>> Bert
> >>>
> >>> On Tue, Jul 24, 2012 at 8:18 AM, John Kane <jrkrid...@inbox.com> wrote:
> >>>>
> >>>>    I think this does what you want using two packages, plyr and reshape2,
> >>>>    that you may have to install.  If so,
> >>>>    install.packages(c("plyr", "reshape2")) should do the trick.
> >>>>    library(plyr)
> >>>>    library(reshape2)
> >>>>    # using supplied data frame 'myfile' from below
> >>>>    time0total = sum(myfile[,2])
> >>>>    mydata  <-  myfile[, 2:10]
> >>>>    md1  <-  melt(mydata, id = "Time_zero")
> >>>>    ddply(md1, .(variable, value), summarise, sum = sum(Time_zero)/time0total)
> >>>>
> >>>>
> >>>>    John Kane
> >>>>    Kingston ON Canada
> >>>>
> >>>>    -----Original Message-----
> >>>>    From: z...@cornell.edu
> >>>>    Sent: Tue, 24 Jul 2012 10:25:21 -0400
> >>>>    To: jrkrid...@inbox.com
> >>>>    Subject: Re: [R] How to do the same thing for all levels of a column?
> >>>>
> >>>>    Hi John,
> >>>>    Thank you for the tips. My apologies about the unreadable sample data...
> >>>>    So here is the output of the sample data, and hopefully it works this
> >>>>    time :)
> >>>>    myfile  <-  structure(list(Proteins = structure(1:4, .Label = c("p1", "p2",
> >>>>    "p3", "p4"), class = "factor"), Time_zero = c(0.0050723, 0.0002731,
> >>>>    9.76e-05, 0.0002077), X1 = structure(c(1L, 3L, 1L, 2L), .Label = c("L",
> >>>>    "R", "T"), class = "factor"), X2 = structure(c(1L, 1L, 2L, 1L
> >>>>    ), .Label = c("E", "M"), class = "factor"), X3 = structure(c(2L,
> >>>>    1L, 2L, 2L), .Label = c("N", "Y"), class = "factor"), X4 = structure(c(1L,
> >>>>    2L, 3L, 2L), .Label = c("I", "L", "Q"), class = "factor"), X5 = structure(c(1L,
> >>>>    2L, 1L, 1L), .Label = c("I", "V"), class = "factor"), X6 = structure(c(1L,
> >>>>    1L, 1L, 2L), .Label = c("P", "S"), class = "factor"), X7 = structure(c(1L,
> >>>>    3L, 2L, 2L), .Label = c("D", "E", "G"), class = "factor"), X8 = structure(c(1L,
> >>>>    1L, 2L, 1L), .Label = c("A", "C"), class = "factor")), .Names = c("Proteins",
> >>>>    "Time_zero", "X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names = c(NA,
> >>>>    4L), class = "data.frame")
> >>>>    And here is my original question:
> >>>>    Basically, I have a bunch of protein sequences composed of different
> >>>>    amino acid residues, and each residue is represented by an uppercase
> >>>>    letter. I want to calculate the ratio of different amino acid residues
> >>>>    at each position of the proteins.
> >>>>
> >>>>    If I name this table as myfile.txt, I have the following scripts to
> >>>>    calculate the ratio of each amino acid residue at position 1:
> >>>>
> >>>>    # showing levels of the 3rd column, which means the types of residues
> >>>>
> >>>>    >myfile[,3]
> >>>>
> >>>>
> >>>>    # calculating the ratio of L
> >>>>
> >>>>    >list=c(which(myfile[,3]=="L"))
> >>>>
> >>>>    >time0total=sum(myfile[,2])
> >>>>
> >>>>    >AA_L=0
> >>>>
> >>>>    >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> >>>>
> >>>>    >ratio_L=AA_L/time0total
> >>>>
> >>>>
> >>>>    So how can I write a script to do the same thing for the other two
> >>>>    levels (T and R) in column 3, and also do this for every column that
> >>>>    contains amino acid residues?
> >>>>
> >>>>    Thanks a lot!
> >>>>
> >>>>    Regards,
> >>>>
> >>>>    Zhao
> >>>>    2012/7/24 John Kane <[1]jrkrid...@inbox.com>
> >>>>
> >>>>      First thing is to supply the data in a usable format.  As is, it is
> >>>>      essentially unreadable.  All R-beginners do this. :)
> >>>>      Have a look at the dput function (?dput) for a good way to supply
> >>>>      sample data in an email.
> >>>>      If you have a large dataset, probably a few dozen lines of data would
> >>>>      be fine.
> >>>>      Something like dput(head(mydata)) should be fine.  Just copy and paste
> >>>>      the output into your email.
> >>>>      Welcome to R.  I think you will like it.
> >>>>      John Kane
> >>>>      Kingston ON Canada
> >>>>
> >>>>    > -----Original Message-----
> >>>>    > From: [2]z...@cornell.edu
> >>>>    > Sent: Mon, 23 Jul 2012 18:01:11 -0400
> >>>>    > To: [3]r-help@r-project.org
> >>>>    > Subject: [R] How to do the same thing for all levels of a column?
> >>>>    >
> >>>>    > Dear all,
> >>>>    >
> >>>>    >
> >>>>    >
> >>>>    > I am an R beginner, and I am looking for a way to do the same thing
> >>>>    > for all levels of a column in a table.
> >>>>    >
> >>>>    > Basically, I have a bunch of protein sequences composed of different
> >>>>    > amino acid residues, and each residue is represented by an uppercase
> >>>>    > letter. I want to calculate the ratio of different amino acid residues
> >>>>    > at each position of the proteins. Here is an example table:
> >>>>    >
> >>>>    > Proteins  Time_zero  1  2  3  4  5  6  7  8
> >>>>    > p1        0.0050723  L  E  Y  I  I  P  D  A
> >>>>    > p2        0.0002731  T  E  N  L  V  P  G  A
> >>>>    > p3        9.757E-05  L  M  Y  Q  I  P  E  C
> >>>>    > p4        0.0002077  R  E  Y  L  I  S  E  A
> >>>>    >
> >>>>    >
> >>>>    >
> >>>>    > If I name this table as myfile.txt, I have the following scripts to
> >>>>    > calculate the ratio of each amino acid residue at position 1:
> >>>>    >
> >>>>    > # showing levels of the 3rd column, which means the types of residues
> >>>>    >
> >>>>    > >myfile[,3]
> >>>>    >
> >>>>    >
> >>>>    >
> >>>>    > # calculating the ratio of L
> >>>>    >
> >>>>    > >list=c(which(myfile[,3]=="L"))
> >>>>    >
> >>>>    > >time0total=sum(myfile[,2])
> >>>>    >
> >>>>    > >AA_L=0
> >>>>    >
> >>>>    > >for (i in 1:length(list)){AA_L=sum(myfile[list[[i]],2]+AA_L)}
> >>>>    >
> >>>>    > >ratio_L=AA_L/time0total
> >>>>    >
> >>>>    >
> >>>>    >
> >>>>    > So how can I write a script to do the same thing for the other two
> >>>>    > levels (T and R) in column 3, and also do this for every column that
> >>>>    > contains amino acid residues?
> >>>>    >
> >>>>    >
> >>>>    >
> >>>>    > Many thanks for any help you could give me on this topic! :)
> >>>>    >
> >>>>    >
> >>>>    >
> >>>>    > Regards,
> >>>>    >
> >>>>    > Zhao
> >>>>    > --
> >>>>    > Zhao JIN
> >>>>    > Ph.D. Candidate
> >>>>    > Ruth Ley Lab
> >>>>    > 467 Biotech
> >>>>    > Field of Microbiology, Cornell University
> >>>>    > Lab: 607.255.4954
> >>>>    > Cell: 412.889.3675
> >>>>    >
> >>>>
> >>>>      >       [[alternative HTML version deleted]]
> >>>>      >
> >>>>      > ______________________________________________
> >>>>      > [4]R-help@r-project.org mailing list
> >>>>      > [5]https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>      > PLEASE do read the posting guide
> >>>>      > [6]http://www.R-project.org/posting-guide.html
> >>>>      > and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>>
> >>>> References
> >>>>
> >>>>    1. mailto:jrkrid...@inbox.com
> >>>>    2. mailto:z...@cornell.edu
> >>>>    3. mailto:r-help@r-project.org
> >>>>    4. mailto:R-help@r-project.org
> >>>>    5. https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>    6. http://www.R-project.org/posting-guide.html
> >>>>    7. http://www.inbox.com/marineaquarium
> >>>>    8. http://www.inbox.com/earth
> >>>>    9. http://www.inbox.com/earth
> >>>
> >>>
> >>>
> >>> --
> >>>
> >>> Bert Gunter
> >>> Genentech Nonclinical Biostatistics
> >>>
> >>> Internal Contact Info:
> >>> Phone: 467-7374
> >>> Website:
> >>>
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



-- 
Zhao JIN
Ph.D. Candidate
Ruth Ley Lab
467 Biotech
Field of Microbiology, Cornell University
Lab: 607.255.4954
Cell: 412.889.3675
