Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

Hervé Pagès Mon, 08 Sep 2014 13:01:33 -0700

On 09/08/2014 11:41 AM, Michael Lawrence wrote:

I might have requested the genome annotation, but I'm pretty sure it
wasn't me who decide on tracking it on a per-sequence basis.


OK, maybe. Don't trust my memory too much on this. No regrets though.
I think it was the right thing to do ;-) Just because the SAM/BAM
format does it is a good enough reason for us to do it too. According
to the SAM Spec, the header can look like:

  @HD   VN:1.3  SO:coordinate
  @SQ   SN:chr1_hg19    LN:45   AS:hg19
  @SQ   SN:chr1_mm10    LN:42   AS:mm10

The only problem is that seqinfo(BamFile("test.bam")) seems to ignore
the AS tag (genome assembly identifier) at the moment:

  > seqinfo(BamFile("test.bam"))
  Seqinfo of length 2
  seqnames seqlengths isCircular genome
  chr1_hg19        45         NA   <NA>
  chr1_mm10        42         NA   <NA>

Hopefully that can be addressed. But that's a separate issue...

Cheers,
H.

I could
imagine use cases for that though, e.g., when diagnosing sequencing
contamination (like human vs. mouse). But most other tools and file
formats expect a single genome per "track", so, for example, rtracklayer
has an internal function singleGenome() to take care of this.

On Mon, Sep 8, 2014 at 10:50 AM, Hervé Pagès <hpa...@fhcrc.org
<mailto:hpa...@fhcrc.org>> wrote:

    Hi Vince,

    Yes it would make sense to have the "show" method report the genome
    when genome(x) contains a unique non-NA value. I think the main
    use case for having the genome defined at the sequence level instead
    of the whole object level is metagenomics. Maybe Michael has some other
    good use cases to share since IIRC he requested the addition of the
    genome field a couple of years ago and made the case for having it
    defined at the sequence level.

    Cheers,
    H.


    On 09/08/2014 07:21 AM, Vincent Carey wrote:

        For GRanges x, my naive expectation is that genome(x) returns a
        length-

        one tag identifying the genome to which chromosomal coordinates

        correspond.  The genome() method seems to have sequence-specific

        semantics, which makes sense, but when we identify sequence

        with chromosome, it seems too complicated.  Is there a use case for

        a GRanges with sequences from several different genomes?


        One reason I am inquiring is that I feel it would be nice to
        have the
        GRanges show() method report, prominently, the genome in use (or NA

        if unspecified).  This could be accomplished by reporting
        unique(genome(x)), and perhaps that would be satisfactory.

        after example(genome) :

            seqinfo(txdb)


        Seqinfo of length 15

        seqnames seqlengths isCircular genome

        CH2L       23011544      FALSE    dm3

        CH2R       21146708      FALSE    dm3

        CH3L       24543557      FALSE    dm3

        CH3R       27905053      FALSE    dm3

        CH4         1351857      FALSE    dm3

        ...             ...        ...    ...

        CH3LHet     2555491      FALSE    dm3

        CH3RHet     2517507      FALSE    dm3

        CHXHet       204112      FALSE    dm3

        CHYHet       347038      FALSE    dm3

        CHUextra   29004656      FALSE    dm3

            genome(seqinfo(txdb))


              CH2L     CH2R     CH3L     CH3R      CH4      CHX
        CHU        M

             "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
        "dm3"    "dm3"

           CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet CHUextra

             "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"

                 [[alternative HTML version deleted]]

        _________________________________________________
        Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
        mailing list
        https://stat.ethz.ch/mailman/__listinfo/bioc-devel
        <https://stat.ethz.ch/mailman/listinfo/bioc-devel>


    --
    Hervé Pagès

    Program in Computational Biology
    Division of Public Health Sciences
    Fred Hutchinson Cancer Research Center
    1100 Fairview Ave. N, M1-B514
    P.O. Box 19024
    Seattle, WA 98109-1024

    E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
    Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
    Fax: (206) 667-1319 <tel:%28206%29%20667-1319>


    _________________________________________________
    Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing list
    https://stat.ethz.ch/mailman/__listinfo/bioc-devel
    <https://stat.ethz.ch/mailman/listinfo/bioc-devel>


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

Reply via email to