Hi all,

Starting with BioC 2.14, the BSgenome data packages have been
modified as follows:

 1. Packages like BSgenome.Hsapiens.UCSC.hg19 that used to contain
    masked sequences no longer contain the masks, only the
    "naked" sequences.
    If you want the masked sequences, you need to use one of the
    new BSgenome.*.masked packages (e.g.
    BSgenome.Hsapiens.UCSC.hg19.masked).
    These new packages are lightweight packages that contain only
    the masks and re-use the sequences from the corresponding naked
    BSgenome package. Use them like a regular BSgenome package.
    In fact, BSgenome.Hsapiens.UCSC.hg19.masked is equivalent to
    the BSgenome.Hsapiens.UCSC.hg19 package from BioC <= 2.13.

    See available.genomes() for the list of BSgenome packages currently
    available. Note that not all naked BSgenome packages have a
    corresponding BSgenome.*.masked package.

 2. The sequences are now stored in a way that allows fast random
    access. As a consequence, getSeq() is faster when extracting small
    portions of the genome. Currently, it's also slower when loading a
    full chromosome, but this might be addressed in the future.

 3. The upstream sequences are deprecated. A better way to get them is
    to call genes() on a TranscriptDb object, then flank() on the
    result, followed by getSeq() on the BSgenome object. The
    deprecation message shows how to do this.

 4. For consistency with other annotation packages, the main object in
    a BSgenome package is now named after the package itself, e.g.
    BSgenome.Hsapiens.UCSC.hg19. The old object (e.g. Hsapiens) is still
    available but will be deprecated at some point.
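
To make the four points above concrete, here is a rough sketch of how
they might look in a session. It assumes the hg19 naked and masked
packages are installed, and it uses TxDb.Hsapiens.UCSC.hg19.knownGene
as the source of gene models; that annotation package, the example
region on chr1, and the 1000-base flank width are my own choices for
illustration, not something the changes above require.

```r
library(BSgenome.Hsapiens.UCSC.hg19)         # naked sequences only
library(BSgenome.Hsapiens.UCSC.hg19.masked)  # masks; re-uses the naked sequences
library(TxDb.Hsapiens.UCSC.hg19.knownGene)   # assumption: any hg19 gene model source would do

# Point 4: the main object is named after the package itself.
genome <- BSgenome.Hsapiens.UCSC.hg19
masked <- BSgenome.Hsapiens.UCSC.hg19.masked

# Point 1: the naked package should give plain sequences, the masked
# package the same sequences with the masks on top.
class(genome$chrM)
class(masked$chrM)

# Point 2: extract a small portion of the genome with getSeq().
# The coordinates are arbitrary and only serve as an example.
region <- GRanges("chr1", IRanges(start = 1000001, width = 50))
small <- getSeq(genome, region)

# Point 3: upstream sequences via genes() + flank() + getSeq().
gn <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
up1000 <- flank(gn[1:5], width = 1000)  # strand-aware: upstream of each gene
up1000 <- trim(up1000)                  # clip anything running off a sequence end
up1000seqs <- getSeq(genome, up1000)
```

Only the first five genes are used to keep the example quick; drop the
subsetting to get the upstream sequences for all genes.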

The devel repositories also contain a new BSgenome package for the
latest release of the Human genome:

  BSgenome.Hsapiens.NCBI.GRCh38
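
A quick way to check whether this package (or a masked counterpart of
any naked package) is offered by the repositories your session points
at is to look it up in available.genomes(). This is only a sketch: it
needs network access, and the pkg variable is just a name picked for
the example.

```r
library(BSgenome)

# Hypothetical variable name; the package name is the one announced above.
pkg <- "BSgenome.Hsapiens.NCBI.GRCh38"

avail <- available.genomes()   # queries the repositories, so needs network access
is_listed <- pkg %in% avail

# List the BSgenome.*.masked packages currently on offer.
masked_pkgs <- grep("\\.masked$", avail, value = TRUE)
```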

Please let me know if you have questions or concerns about this.

Thanks,
H.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
