Thanks for the explanation, Hervé.

Best wishes
Leonard

On Tue, Aug 18, 2020 at 10:06 AM Hervé Pagès <hpa...@fredhutch.org> wrote:

> On 8/18/20 01:40, Kasper Daniel Hansen wrote:
> > In light of this, could we get a version of GRCh37 with only a single
> > mitochondrial genome?
>
> You mean a BSgenome.Hsapiens.NCBI.GRCh37.p13 package? So it would
> contain the same sequences as BSgenome.Hsapiens.UCSC.hg19 but without
> the hg19:chrM sequence?
>
> Certainly doable but note that by using BSgenome.Hsapiens.UCSC.hg38 you
> stay away from this mess. I'm not sure that adding yet another BSgenome
> package would make the situation less confusing.
>
> > On Fri, Aug 14, 2020 at 6:17 PM Hervé Pagès <hpa...@fredhutch.org> wrote:
> >
> > > Hi Felix,
> > >
> > > On 8/13/20 21:43, Felix Ernst wrote:
> > > > Hi Leonard, Hi Herve,
> > > >
> > > > I followed your conversation, since I have noticed the same
> > > > problem. Thanks, Herve, for the explanation of the recent changes
> > > > on hg19.
> > > >
> > > > The GRCh37.p13 report states in its last line:
> > > >
> > > > MT assembled-molecule MT Mitochondrion J01415.2 = NC_012920.1 non-nuclear 16569 chrM
> > > >
> > > > Since the last name is called "UCSC-style-name", wouldn't that
> > > > mean that chrM has to be renamed to MT and not chrMT?
> > >
> > > This is a mistake in the sequence report for GRCh37.p13.
> > > GRCh37.p13:MT is the same as hg19:chrMT, not hg19:chrM.
> > >
> > > hg19:chrM and hg19:chrMT are **not** the same sequences. The former
> > > is NC_001807 and has length 16571 and the latter is NC_012920.1 and
> > > has length 16569.
> > >
> > > Yes, seqlevelsStyle() is sorting out all this mess for you ;-)
> > >
> > > Cheers,
> > > H.
> > >
> > > > Thanks again for the explanation.
> > > >
> > > > Cheers,
> > > > Felix
> > > >
> > > > -----Original Message-----
> > > > From: Bioc-devel <bioc-devel-boun...@r-project.org> On Behalf Of Hervé Pagès
> > > > Sent: Friday, 14 August 2020 01:08
> > > > To: Leonard Goldstein <goldstein.leon...@gene.com>; bioc-devel@r-project.org
> > > > Cc: charlotte.sone...@fmi.ch
> > > > Subject: Re: [Bioc-devel] BSgenome changes
> > > >
> > > > Hi Leonard,
> > > >
> > > > On 8/12/20 15:22, Leonard Goldstein via Bioc-devel wrote:
> > > > > Dear Bioc team,
> > > > >
> > > > > I'm following up on this recent GitHub issue
> > > > > <https://github.com/ldg21/SGSeq/issues/5>. Please see the issue
> > > > > for more details and code examples.
> > > > >
> > > > > It looks like changes in Bioc devel result in two copies of the
> > > > > mitochondrial chromosome for BSgenome.Hsapiens.UCSC.hg19 -- one
> > > > > named chrM like in previous package versions (length 16571) and
> > > > > one named chrMT (length 16569).
> > > > >
> > > > > When using seqlevelsStyle() to change chromosome names from UCSC
> > > > > to NCBI format, this results in new behavior -- in the past chrM
> > > > > was simply renamed MT, now the different sequence chrMT is used.
> > > > > Is this intended?
> > > >
> > > > Absolutely intended.
> > > >
> > > > There is a long story behind the unfortunate fate of the
> > > > mitochondrial chromosome in hg19. I'll try to keep it short.
> > > >
> > > > When the UCSC folks released the hg19 browser more than 10 years
> > > > ago, they based it on assembly GRCh37:
> > > >
> > > >   https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13
> > > >
> > > > See sequence report for GRCh37:
> > > >
> > > >   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.13_GRCh37/GCF_000001405.13_GRCh37_assembly_report.txt
> > > >
> > > > For some mysterious reason GRCh37 didn't include the mitochondrial
> > > > chromosome so the UCSC folks decided to use mitochondrial sequence
> > > > NC_001807 and called it chrM.
> > > >
> > > > However, UCSC has recently decided to base hg19 on GRCh37.p13
> > > > instead of GRCh37. A rather surprising move after many years of
> > > > hg19 being based on the latter.
> > > >
> > > >   https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.25/
> > > >
> > > > See sequence report for GRCh37.p13:
> > > >
> > > >   https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_assembly_report.txt
> > > >
> > > > Note that GRCh37.p13 does include the mitochondrial chromosome.
> > > > It's called MT in the official sequence report above and chrMT in
> > > > hg19.
> > > >
> > > > At the same time the UCSC folks decided to keep chrM so now hg19
> > > > contains 2 mitochondrial sequences: chrM and chrMT. Previously it
> > > > had only one: chrM.
> > > >
> > > > So what you see in BioC devel in BSgenome.Hsapiens.UCSC.hg19 and
> > > > with seqlevelsStyle(genome) is only reflecting this. In particular
> > > > seqlevelsStyle(genome) <- "NCBI" now does the following:
> > > >
> > > >   - Rename chrMT -> MT.
> > > >
> > > >   - chrM does NOT get renamed. There is no point in renaming this
> > > >     sequence because it has no equivalent in GRCh37.p13.
> > > >
> > > > Hope this helps,
> > > >
> > > > H.
> > > >
> > > > > Leonard
> > > > >
> > > > > [[alternative HTML version deleted]]
> > > > >
> > > > > _______________________________________________
> > > > > Bioc-devel@r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > > >
> > > > --
> > > > Hervé Pagès
> > > >
> > > > Program in Computational Biology
> > > > Division of Public Health Sciences
> > > > Fred Hutchinson Cancer Research Center
> > > > 1100 Fairview Ave. N, M1-B514
> > > > P.O. Box 19024
> > > > Seattle, WA 98109-1024
> > > >
> > > > E-mail: hpa...@fredhutch.org
> > > > Phone: (206) 667-5791
> > > > Fax: (206) 667-1319
> >
> > --
> > Best,
> > Kasper
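
The renaming rule described in this thread for seqlevelsStyle(genome) <- "NCBI" on hg19 can be illustrated with a small toy model. The sketch below is plain Python, not Bioconductor code. The accessions and lengths are the ones quoted in the thread; the dictionary names and the to_ncbi_style() helper are hypothetical and only mimic the described behavior (chrMT becomes MT, while chrM keeps its name because it has no equivalent in GRCh37.p13).

```python
# Toy model of the UCSC -> NCBI renaming discussed in this thread.
# Accessions and lengths are quoted from the thread; all names below are
# illustrative and are NOT part of GenomeInfoDb or BSgenome.

# The two mitochondrial sequences now present in hg19.
HG19_MITO = {
    "chrM": {"accession": "NC_001807", "length": 16571},
    "chrMT": {"accession": "NC_012920.1", "length": 16569},
}

# Earlier behavior: hg19 had only chrM, which was renamed to MT.
OLD_UCSC_TO_NCBI = {"chrM": "MT"}
# Current behavior: only chrMT has an equivalent in GRCh37.p13.
NEW_UCSC_TO_NCBI = {"chrMT": "MT"}


def to_ncbi_style(seqnames, mapping):
    """Return {new_name: old_name}, renaming only names found in mapping."""
    return {mapping.get(name, name): name for name in seqnames}


before = to_ncbi_style(["chrM"], OLD_UCSC_TO_NCBI)
after = to_ncbi_style(["chrM", "chrMT"], NEW_UCSC_TO_NCBI)

# "MT" points at a different underlying sequence before and after the change.
print(before["MT"], HG19_MITO[before["MT"]]["length"])  # chrM 16571
print(after["MT"], HG19_MITO[after["MT"]]["length"])    # chrMT 16569
print(sorted(after))                                    # ['MT', 'chrM']
```

The point of the model is that the name MT now refers to NC_012920.1 (16569 bp) rather than NC_001807 (16571 bp), so positions computed against the old MT may not carry over to the new one.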