On 05/21/2013 06:37 PM, Michael Lawrence wrote:
It's clear we're not going to agree. I will volunteer to maintain these,
OK? I will maintain them anyway for my own use.

That's fine with me. I didn't mean we shouldn't have those wrappers (and
I think Martin or Val are going to come back here with a proposal).
I just wanted to make it clear that the seqlevels setter is not that
hideous thing

seqlevels(x, new2old = match("chr1", seqlevels(x)), force = TRUE) <- "chr1"

Sorry if I kept beating again and again the same dead horse...

Cheers,
H.


Michael


On Tue, May 21, 2013 at 2:34 PM, Hervé Pagès <hpa...@fhcrc.org
<mailto:hpa...@fhcrc.org>> wrote:



    On 05/21/2013 08:00 AM, Michael Lawrence wrote:




        On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpa...@fhcrc.org
        <mailto:hpa...@fhcrc.org>
        <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>> wrote:

             Michael,


             On 05/20/2013 08:44 PM, Michael Lawrence wrote:




                 On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès
        <hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
                 <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>
                 <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
        <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>>> wrote:

                      Hi,


                      On 05/20/2013 03:15 PM, Michael Lawrence wrote:

                          On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
                          <mtmor...@fhcrc.org
        <mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org
        <mailto:mtmor...@fhcrc.org>>
                 <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
        <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>>> wrote:

                              Hi Michael --


                              On 5/20/2013 5:56 AM, Michael Lawrence wrote:

                                  While it's cool that seqlevels<- has
        become so
                 flexible,
                                  I still claim
                                  that
                                  rename/keep/drop would be a lot easier
        to read,
                 because
                                  they describe the
                                  high-level operation, and there's no
        reason to
                 deprecate
                                  them. They're
                                  also
                                  easier to remember. For example, for
        dropping with
                                  seqlevels<-, the user
                                  needs
                                  to remember that "force" is necessary
        to drop. Much
                                  easier to just say
                                  "dropSeqlevels(), please". And
        reimplementing
                                  keepSeqlevels is still too
                                  complicated. Is it such a maintenance
        burden to
                 have
                                  these simple
                                  wrappers sit
                                  on top of the low-level manipulators?

                                  Another suggestion: renameSeqlevels
        should not
                 require a
                                  named vector (in
                                  cases
                                  where it is unnamed, require that it is
                 parallel to the
                                  existing
                                  seqlevels, as
                                  with seqlevels<-).


                              I didn't really indicate what drove my
        desire to see
                              keepSeqlevels
                              deprecated. keepSeqlevels, seqlevels<-, and
                 isActiveSeq were
                              developed more
                              or less independently, and have different
                 contracts. The
                              different
                              contracts are confusing to the user, as is
        creating
                 a usable
                              help system
                              when there are multiple end points for
        what is a common
                              operation. The help
                              pages of each were inadequate in different
        ways.
                 And because
                              the code was
                              developed independently, support for different
                 objects was
                              inconsistent. So
                              actually, yes, the maintenance (and use)
        burden for the
                              previous state of
                              affairs was substantial.

                              On the other hand, I agree that
        keepSeqlevels is
                 convenient
                              as a simple
                              wrapper around seqlevels<-, in the same
        way that
                 setNames
                              and names<- are
                              both useful.

                              So we could iterate to

                                  keepSeqlevels <-
                                      function(x, value, ...)
                                  {
                                      seqlevels(x, force=TRUE) <- value
                                      x
                                  }

                              but I'd be less enthusiastic about
        maintaining the
                 original
                              contract of
                              keepSeqlevels, where keepSeqlevels(gr1,
        gr2), would
                 have
                              worked for two
                              GRanges objects.


                      Why would this be called keepSeqlevels() if,
        depending on
                 how it's used,
                      it will either add, drop, rename, or permute the
        seqlevels?
                      Couldn't this be called setSeqlevels?


                 I thought that new2old was necessary for permuting.


             The seqlevels setter has no 'new2old' arg.


                 You're right that
                 adding should be disallowed for keepSeqlevels(). Adding
                 seqlevels is not
                 a common operation. The two common operations are:
                 * Permuting, usually because the data were imported in
        non-canonical
                 order (seqnameStyle was designed to address this, no?).


             Permuting is straightforward with seqlevels<-:

                > seqlevels(gr)
                [1] "chr1" "chr2" "chr3"

                > seqlevels(gr) <- rev(seqlevels(gr))

                > seqlevels(gr)
                [1] "chr3" "chr2" "chr1"


                 * Subsetting, either by keeping or dropping.


             Also straightforward:

                > seqlevels(gr, force=TRUE) <- "chr2"

                > seqlevels(gr)
                [1] "chr2"


                 Two main reasons: taking a
                 small slice of the data for prototyping, or removing
        problematic
                 chromosomes (sex, circular). This goes back to bsapply
        and the
                 exclude
                 argument.


             There are other important use cases:

                - Dropping seqlevels that are *not* in use. This happens
        for example
                  when subsetting a BAM file with Rsamtools::filterBam
        to keep only
                  the alignments located on a given chromosome. The
        entire sequence
                  dictionary (in the header) is propagated to the
        resulting BAM file
                  so when you read the file with readGAlignments() you
        end up with a
                  bunch of useless seqlevels that you generally start by
        dropping.
                  You don't want to drop any alignment so force=FALSE is
        your friend.


        This is sort of tangential to the discussion, but do you really
        want to
        do this? I would preserve the universe as given by the BAM.

                - An even more common operation is renaming: 90% of the
        times that's
                  what I use seqlevels<- for, and that's what I tell
        people to use
                  when they need to rename their chromosomes. Also
        straightforward:

                    > seqlevels(gr)
                    [1] "chr1" "chr2" "chr3" "chrM"

                    > seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
                    > seqlevels(gr)
                    [1] "1" "2" "3" "M"

                    > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT"
                    > seqlevels(gr)
                    [1] "1"  "2"  "3"  "MT"

                  Note that this is simpler to use than
        renameSeqlevels() which
                  always required you to build the entire right value as
        a named
                  vector.


        Sure, renaming is a common use case. Not sure how I forgot that.

        As I wrote earlier in the thread, renameSeqlevels should be
        changed so
        as not to require naming of the vector.

             So the added-value of keepSeqlevels() seems to boil down to:

                (a) it always uses force=TRUE

                (b) it preserves the original object and returns the
        modified object

             If you want to restrict the use of keepSeqlevels() to
        permuting and
             dropping, you'll also have to disallow renaming (in
        addition to adding).
             Then its name will more or less reflect what it does (if
        "keep" means
             "permute" or "drop"). The final result will be something
        that does less
             than setSeqlevels() and that is not necessarily easier to
        read: both
             will set on the object whatever vector of seqlevels they
        are supplied,
             except that one will fail when doing so would actually
        result in adding
             or renaming seqlevels.


        So it looks like seqlevels<- is pretty powerful. Now I see that if I
        scroll down to the examples in the man page, I find the actual
        documentation. It's cool to learn that we can replace seqlevels
        on TxDb
        objects.


    Has been working on TranscriptDb objects for more than


        My argument has always been that since these low-level
        operations are so powerful, it's nice to have high-level
        operations to
        clarify the code. We have to think hard to know what the RHS
        will do in
        that replacement.


    When you do something like seqlevels(gr) <- new_seqlevels, you're saying
    "can you please replace the current seqlevels by those new ones".
    That's all. The request/contract is simple and all the complexity is
    taken care of internally by seqlevels<-. It might be that, depending on
    what's currently on 'gr', this will actually result in dropping, adding,
    permuting. Or *a combination of the 3*. Like here:


       > seqlevels(gr)
       [1] "chr1" "chr2" "chr3"

       > seqlevels(gr) <- c("chr4", "chr3", "chr1")
       > seqlevels(gr)
       [1] "chr4" "chr3" "chr1"

       # one seqlevel was drop, one was added, and the 2 seqlevels that were
       # preserved were reordered.

    When you want to subset a vector to keep all elements with an even
    index you do x[c(FALSE, TRUE)]. Is that keeping or dropping elements?
    Do we need to have a subsetting operator for keeping elements, another
    one for dropping elements, and another one for permuting them (like
    in x[sample(length(x))])?

    H.

        It's probably one of the most complex replacement
        operations; far more complex than the typical one, including
        levels<- on
        factor. There's nothing wrong with that; it's just complexity
        that we
        should be able to hide.

        Michael

             H.


                 Michael


                      H.



                          Well, I wasn't even aware of that feature, so
        you've
                 made your
                          point about
                          the documentation ;) Sounds like a good
        compromise to me.

                          Thanks for understanding,
                          Michael



                              Martin


                                  Michael






                                  On Sat, May 18, 2013 at 6:00 PM,
        Martin Morgan
                                  <mtmor...@fhcrc.org
        <mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org
        <mailto:mtmor...@fhcrc.org>>
                 <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
        <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>>
                                  <mailto:mtmor...@fhcrc.org
        <mailto:mtmor...@fhcrc.org>
                 <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>
        <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
                 <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>>>>

                                  wrote:

                                        On 05/18/2013 05:42 PM, Martin
        Morgan wrote:

                                            Some of the most common
        operations are
                                  straight-forward, e.g.,

                                                  > gr =
        GRanges(paste0("chr",
                 c(1:22,
                                  "X", "Y")), IRanges(1,
                                  100))
                                                  > seqlevels(gr) =
        sub("chr", "ch",
                                  seqlevels(gr))


                                        To get a more comprehensive
        example I
                 should have
                                  followed my own
                                  advice and
                                        grabbed from the help page!

                                              ## Rename:
                                              seqlevels(txdb) <-
        sub("chr", "",
                                  seqlevels(txdb))
                                              seqlevels(txdb)

                                              seqlevels(txdb) <-
        paste0("CH",
                 seqlevels(txdb))
                                              seqlevels(txdb)


                 seqlevels(txdb)[seqlevels(__**______txdb) ==


                                  "CHM"] <- "M"


                                              seqlevels(txdb)


                                        --
                                        Computational Biology / Fred
        Hutchinson
                 Cancer
                                  Research Center
                                        1100 Fairview Ave. N.
                                        PO Box 19024 Seattle, WA 98109

                                        Location: Arnold Building M1 B861
                                        Phone: (206) 667-2793
        <tel:%28206%29%20667-2793>
                 <tel:%28206%29%20667-2793> <tel:%28206%29%20667-2793>
                                  <tel:%28206%29%20667-2793>




                              --
                              Dr. Martin Morgan, PhD

                              Fred Hutchinson Cancer Research Center
                              1100 Fairview Ave. N.
                              PO Box 19024 Seattle, WA 98109


                                   [[alternative HTML version deleted]]


          _____________________________________________________
        Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
        <mailto:Bioc-devel@r-project.__org
        <mailto:Bioc-devel@r-project.org>>
                 <mailto:Bioc-devel@r-project.
        <mailto:Bioc-devel@r-project.>____org
                 <mailto:Bioc-devel@r-project.__org
        <mailto:Bioc-devel@r-project.org>>>
                          mailing list
        https://stat.ethz.ch/mailman/______listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/____listinfo/bioc-devel>
                 <https://stat.ethz.ch/mailman/____listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>>



          <https://stat.ethz.ch/mailman/____listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
                 <https://stat.ethz.ch/mailman/__listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/listinfo/bioc-devel>>>


                      --
                      Hervé Pagès

                      Program in Computational Biology
                      Division of Public Health Sciences

                      Fred Hutchinson Cancer Research Center
                      1100 Fairview Ave. N, M1-B514
                      P.O. Box 19024
                      Seattle, WA 98109-1024

                      E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
        <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>
                 <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
        <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>>

                      Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
        <tel:%28206%29%20667-5791>
                 <tel:%28206%29%20667-5791>
                      Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
        <tel:%28206%29%20667-1319>
                 <tel:%28206%29%20667-1319>



             --
             Hervé Pagès

             Program in Computational Biology
             Division of Public Health Sciences
             Fred Hutchinson Cancer Research Center
             1100 Fairview Ave. N, M1-B514
             P.O. Box 19024
             Seattle, WA 98109-1024

             E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
        <mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>
             Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
        <tel:%28206%29%20667-5791>
             Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
        <tel:%28206%29%20667-1319>



    --
    Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
    Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
    Fax: (206) 667-1319 <tel:%28206%29%20667-1319>



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to