Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

Hervé Pagès Tue, 09 Sep 2014 11:50:02 -0700

OK so let's go for a 1 line summarization of the seqinfo.
Vince is that OK if we keep this at the bottom of the object?
That way it will always be visible, even when the object requires
more than 1 screen to display (e.g. when it has a lot of metadata
cols). Will look something like:


  > gr
  GRanges with 3 ranges and 0 metadata columns:
        seqnames               ranges strand
           <Rle>            <IRanges>  <Rle>
    [1]    chr14 [19069583, 19069654]      +
    [2]    chr14 [19363738, 19363809]      +
    [3]    chr14 [19363755, 19363826]      -
    [4]    chr14 [19369799, 19369870]      +

seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); noseqlengths


Thanks,
H.

On 09/09/2014 06:38 AM, Michael Lawrence wrote:

Agreed, that looks a lot nicer.

On Tue, Sep 9, 2014 at 4:42 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:

On 09/09/2014 04:02 AM, Michael Lawrence wrote:

I'm in favor of this display. The seqinfo output at the bottom has always
been annoying (over-emphasized).


the fact that the lengths are 'NA' can be a helpful prompt to do something
about it, e.g., add seqinfo when inputting the data. Also they are helpful
when one is told that seqlengths are incompatible during, e.g.,
findOverlaps. But I like the idea of less but more informative display of
seqinfo, along the lines suggested by Vince.

seqinfo: 60 seqlevels (2 circular) on 2 genomes (hg19, mm10); 60 'NA'
seqlengths

Martin

On Mon, Sep 8, 2014 at 10:08 PM, Vincent Carey <
st...@channing.harvard.edu>
wrote:


On Tue, Sep 9, 2014 at 12:30 AM, Hervé Pagès <hpa...@fhcrc.org> wrote:

  On 09/08/2014 06:42 PM, Michael Lawrence wrote:


  Instead of printing out multiple lines of a table that is rarely of

interest, could we develop Peter's idea toward something like:

hg19:chr1  hg19:chr2 ...
[lengths ...]

Not sure what condensed notation would be useful for circularity.

I don't know either. I'm worried that this would make the seqinfo
stuff look like a named vector and that the user would expect
hg19:chr1, hg19:chr2, etc... to be valid names.

With the table-like layout, some screen real estate can always be
saved by printing less lines:


  What I had in mind was



     > gr

    GRanges with 3 ranges and 0 metadata columns:

         genome: hg19


           seqnames               ranges strand

             <Rle>            <IRanges>  <Rle>
      [1]    chr14 [19069583, 19069654]      +
      [2]    chr14 [19363738, 19363809]      +
      [3]    chr14 [19363755, 19363826]      -
      [4]    chr14 [19369799, 19369870]      +


you could then probably dispense with the seqlengths.  i have
never found them too useful except as a key to the  genome.

if there are multiple genomes, we have something like

genomes: hg19, mm9

the point is to make it prominent, particularly at a time of transition.



  --- seqinfo: 60 seqlevels (2 circulars) on 2 genomes (hg19, mm10) ---

      seqlevels                                seqlengths isCircular
genome
      chr1                                      249250621       <NA>
  hg19
      chr10                                     135534747       <NA>
  hg19
      ...                                             ...        ...
...
      chrX                                      155270560       <NA>
  hg19
      chrY                                       59373566       <NA>
  hg19

I agree that the exact content of the seqinfo table itself is rarely
of interest so printing only 3 or 4 lines is OK. IMO it's important
to make the user aware of the existence of this hidden table and to
display it like what it really is (i.e. a table). Also displaying the
column names is a well established tradition and serves the purpose
of providing a quick summary of the accessors that are available to
access those fields.

H.

On Mon, Sep 8, 2014 at 5:21 PM, Peter Hickey <hic...@wehi.edu.au
<mailto:hic...@wehi.edu.au>> wrote:

      Perhaps it might be useful to have some way of highlighting if any
      of the chromosomes are circular or highlighting if there are
      multiple genomes present? Otherwise this information might be
hidden
      in the "…"

      Cheers,
      Pete


      On 09/09/2014, at 9:44 AM, Hervé Pagès <hpa...@fhcrc.org
      <mailto:hpa...@fhcrc.org>> wrote:

       > On 09/08/2014 02:28 PM, Peter Hickey wrote:
       >> Just a vote for still allowing for multiple genomes in a
Seqinfo
      object (in a GRanges object). My use case is in
bisulfite-sequencing
      experiments where there is often a spike-in of a lambda phage
genome
      along with the genome of interest (human or mouse). It's often
      useful to keep all data from a single library together in the same
      objet but process according to genome(x) for each seqlevel.
       >
       > Note taken. Thanks Pete! It's always great to know about
concrete
use
       > cases.
       >
       >>
       >> FWIW, I like Vincent's proposal of
selectSome(unique(genome(x)))
      in the show method.
       >
       > Or what about displaying the genome next to the seqlevel it's
       > associated with? Like e.g.:
       >
       >  > gr
       >  GRanges with 3 ranges and 0 metadata columns:
       >        seqnames               ranges strand
       >           <Rle>            <IRanges>  <Rle>
       >    [1]    chr14 [19069583, 19069654]      +
       >    [2]    chr14 [19363738, 19363809]      +
       >    [3]    chr14 [19363755, 19363826]      -
       >    [4]    chr14 [19369799, 19369870]      +
       >    ---
       >    seqinfo:
       >      seqlevels             seqlengths isCircular genome
       >      chr1                   249250621       <NA>   hg19
       >      chr10                  135534747       <NA>   hg19
       >      chr11                  135006516       <NA>   hg19
       >      ...                          ...        ...    ...
       >      chrUn_gl000249             38502       <NA>   hg19
       >      chrX                   155270560       <NA>   hg19
       >      chrY                    59373566       <NA>   hg19
       >
       > That way, we also raise awareness about the isCircular field.
       > The current choice to only display the seqlengths pre-dates the
       > existence of the seqinfo slot but might be a little bit
misleading
       > those days since it only exposes some arbitrary seqinfo fields.
       >
       > H.
       >
       >>
       >> Cheers,
       >> Pete
       >>
       >>
       >>> I might have requested the genome annotation, but I'm pretty
      sure it wasn't
       >>> me who decide on tracking it on a per-sequence basis. I could
      imagine use
       >>> cases for that though, e.g., when diagnosing sequencing
      contamination (like
       >>> human vs. mouse). But most other tools and file formats
expect
      a single
       >>> genome per "track", so, for example, rtracklayer has an
      internal function
       >>> singleGenome() to take care of this.
       >>>
       >>> On Mon, Sep 8, 2014 at 10:50 AM, Herv? Pag?s <
hpa...@fhcrc.org
      <mailto:hpa...@fhcrc.org>> wrote:
       >>>
       >>>> Hi Vince,
       >>>>
       >>>> Yes it would make sense to have the "show" method report the
      genome
       >>>> when genome(x) contains a unique non-NA value. I think the
main
       >>>> use case for having the genome defined at the sequence level
      instead
       >>>> of the whole object level is metagenomics. Maybe Michael has
      some other
       >>>> good use cases to share since IIRC he requested the addition
      of the
       >>>> genome field a couple of years ago and made the case for
having it
       >>>> defined at the sequence level.
       >>>>
       >>>> Cheers,
       >>>> H.
       >>>>
       >>>>
       >>>> On 09/08/2014 07:21 AM, Vincent Carey wrote:
       >>>>
       >>>>> For GRanges x, my naive expectation is that genome(x)
returns
      a length-
       >>>>>
       >>>>> one tag identifying the genome to which chromosomal
coordinates
       >>>>>
       >>>>> correspond.  The genome() method seems to have
sequence-specific
       >>>>>
       >>>>> semantics, which makes sense, but when we identify sequence
       >>>>>
       >>>>> with chromosome, it seems too complicated.  Is there a use
      case for
       >>>>>
       >>>>> a GRanges with sequences from several different genomes?
       >>>>>
       >>>>>
       >>>>> One reason I am inquiring is that I feel it would be nice
to
      have the
       >>>>> GRanges show() method report, prominently, the genome in
use
      (or NA
       >>>>>
       >>>>> if unspecified).  This could be accomplished by reporting
       >>>>> unique(genome(x)), and perhaps that would be satisfactory.
       >>>>>
       >>>>> after example(genome) :
       >>>>>
       >>>>> seqinfo(txdb)
       >>>>>>
       >>>>>
       >>>>> Seqinfo of length 15
       >>>>>
       >>>>> seqnames seqlengths isCircular genome
       >>>>>
       >>>>> CH2L       23011544      FALSE    dm3
       >>>>>
       >>>>> CH2R       21146708      FALSE    dm3
       >>>>>
       >>>>> CH3L       24543557      FALSE    dm3
       >>>>>
       >>>>> CH3R       27905053      FALSE    dm3
       >>>>>
       >>>>> CH4         1351857      FALSE    dm3
       >>>>>
       >>>>> ...             ...        ...    ...
       >>>>>
       >>>>> CH3LHet     2555491      FALSE    dm3
       >>>>>
       >>>>> CH3RHet     2517507      FALSE    dm3
       >>>>>
       >>>>> CHXHet       204112      FALSE    dm3
       >>>>>
       >>>>> CHYHet       347038      FALSE    dm3
       >>>>>
       >>>>> CHUextra   29004656      FALSE    dm3
       >>>>>
       >>>>> genome(seqinfo(txdb))
       >>>>>>
       >>>>>
       >>>>>     CH2L     CH2R     CH3L     CH3R      CH4      CHX
      CHU        M
       >>>>>
       >>>>>    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
      "dm3"    "dm3"
       >>>>>
       >>>>>  CH2LHet  CH2RHet  CH3LHet  CH3RHet   CHXHet   CHYHet
CHUextra
       >>>>>
       >>>>>    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"    "dm3"
"dm3"
       >>>>>
       >>>>>        [[alternative HTML version deleted]]
       >>>>>
       >>>>> _______________________________________________
       >>>>> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
      mailing list
       >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
       >>>>>
       >>>>>
       >>>> --
       >>>> Herv? Pag?s
       >>>>
       >>>> Program in Computational Biology
       >>>> Division of Public Health Sciences
       >>>> Fred Hutchinson Cancer Research Center
       >>>> 1100 Fairview Ave. N, M1-B514
       >>>> P.O. Box 19024
       >>>> Seattle, WA 98109-1024
       >>>>
       >>>> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
       >>>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
       >>>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
       >>>>
       >>>>
       >>>> _______________________________________________
       >>>> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
      mailing list
       >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
       >>>>
       >>
       >> --------------------------------
       >> Peter Hickey,
       >> PhD Student/Research Assistant,
       >> Bioinformatics Division,
       >> Walter and Eliza Hall Institute of Medical Research,
       >> 1G Royal Parade, Parkville, Vic 3052, Australia.
       >> Ph: +613 9345 2324 <tel:%2B613%209345%202324>
       >>
       >> hic...@wehi.edu.au <mailto:hic...@wehi.edu.au>
       >> http://www.wehi.edu.au
       >>
       >>
      ____________________________________________________________
__________
       >> The information in this email is confidential and
      intend...{{dropped:6}}
      >>
      >> _______________________________________________
      >>Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
mailing list
      >>https://stat.ethz.ch/mailman/listinfo/bioc-devel
      >>
      >
      > --
       > Hervé Pagès
      >
      > Program in Computational Biology
      > Division of Public Health Sciences
      > Fred Hutchinson Cancer Research Center
      > 1100 Fairview Ave. N, M1-B514
      > P.O. Box 19024
      > Seattle, WA 98109-1024
      >
      > E-mail:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
      > Phone:(206) 667-5791 <tel:%28206%29%20667-5791>
      > Fax:(206) 667-1319 <tel:%28206%29%20667-1319>

      --------------------------------
      Peter Hickey,
      PhD Student/Research Assistant,
      Bioinformatics Division,
      Walter and Eliza Hall Institute of Medical Research,
      1G Royal Parade, Parkville, Vic 3052, Australia.
      Ph: +613 9345 2324 <tel:%2B613%209345%202324>

      hic...@wehi.edu.au <mailto:hic...@wehi.edu.au>
      http://www.wehi.edu.au


      ____________________________________________________________
__________
      The information in this email is confidential and
intend...{{dropped:8}}


      _______________________________________________
      Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
mailing
list
      https://stat.ethz.ch/mailman/listinfo/bioc-devel



  --

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

         [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] should genome() be so complicated?/add genome report to GRanges show method

Reply via email to