When I hit this I usually just do: seqlevels(gr) <- paste0("chr", c(1:22, "X", "Y"))
It would be nice if there were a function for sorting chromosome names according to a specified naming convention. Like sortSeqlevels(gr, UCSCNaming()) or the alternative replacement syntax: seqinfo(gr) <- sort(seqinfo(gr), UCSCNaming()). Although sort is only single-dispatch. Michael On Tue, Sep 17, 2013 at 8:59 AM, Vincent Carey <st...@channing.harvard.edu>wrote: > I am replying to this as Michael mentions that this is a powerful > operation. Here's my > problem that I believe is related, but I do not see a straightforward > solution > > > bhmm19uni > GRanges with 559456 ranges and 4 metadata columns: > seqnames ranges strand | name > score > <Rle> <IRanges> <Rle> | <character> > <numeric> > 1 chr1 [ 67229025, 67341224] * | 13_Heterochrom/lo > 0 > 2 chr1 [203057955, 203060154] * | 12_Repressed > 0 > 3 chr1 [ 8458427, 8486226] * | 13_Heterochrom/lo > 0 > 4 chr1 [ 16904427, 16904826] * | 5_Strong_Enhancer > 0 > 5 chr1 [ 25289427, 25293426] * | 11_Weak_Txn > 0 > ... ... ... ... ... ... > ... > 570766 chr22 [51100931, 51101330] * | 2_Weak_Promoter > 0 > 570767 chr22 [51101331, 51101530] * | 6_Weak_Enhancer > 0 > 570768 chr22 [51101531, 51101730] * | 2_Weak_Promoter > 0 > 570769 chr22 [51101731, 51178130] * | 13_Heterochrom/lo > 0 > 570770 chr22 [51178131, 51178330] * | 15_Repetitive/CNV > 0 > thick numcode > <IRanges> <character> > 1 [ 67001613, 67113812] 13 > 2 [201324578, 201326777] 12 > 3 [ 8381014, 8408813] 13 > 4 [ 16777014, 16777413] 5 > 5 [ 25162014, 25166013] 11 > ... ... ... > 570766 [49447797, 49448196] 2 > 570767 [49448197, 49448396] 6 > 570768 [49448397, 49448596] 2 > 570769 [49448597, 49524996] 13 > 570770 [49524997, 49525196] 15 > --- > seqlengths: > chr1 chr10 chr11 chr5 ... chr2 chr21 > chr4 > 249250621 135534747 135006516 180915260 ... 243199373 48129895 > 191154276 > > If I try to make a karyogram with ggbio, the chromosomes are in an > unnatural order. What is the right approach to getting the seqinfo in a > natural order with respect to chromosome names and lengths? Reassigning > seqlevels seems dangerous but I haven't experimented fully. It must be a > common use case but I do not see it in doc. > > > > On Tue, May 21, 2013 at 11:00 AM, Michael Lawrence < > lawrence.mich...@gene.com> wrote: > >> On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpa...@fhcrc.org> wrote: >> >> > Michael, >> > >> > >> > On 05/20/2013 08:44 PM, Michael Lawrence wrote: >> > >> >> >> >> >> >> >> >> On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès <hpa...@fhcrc.org >> >> <mailto:hpa...@fhcrc.org>> wrote: >> >> >> >> Hi, >> >> >> >> >> >> On 05/20/2013 03:15 PM, Michael Lawrence wrote: >> >> >> >> On Mon, May 20, 2013 at 3:11 PM, Martin Morgan >> >> <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>> wrote: >> >> >> >> Hi Michael -- >> >> >> >> >> >> On 5/20/2013 5:56 AM, Michael Lawrence wrote: >> >> >> >> While it's cool that seqlevels<- has become so >> flexible, >> >> I still claim >> >> that >> >> rename/keep/drop would be a lot easier to read, because >> >> they describe the >> >> high-level operation, and there's no reason to >> deprecate >> >> them. They're >> >> also >> >> easier to remember. For example, for dropping with >> >> seqlevels<-, the user >> >> needs >> >> to remember that "force" is necessary to drop. Much >> >> easier to just say >> >> "dropSeqlevels(), please". And reimplementing >> >> keepSeqlevels is still too >> >> complicated. Is it such a maintenance burden to have >> >> these simple >> >> wrappers sit >> >> on top of the low-level manipulators? >> >> >> >> Another suggestion: renameSeqlevels should not require >> a >> >> named vector (in >> >> cases >> >> where it is unnamed, require that it is parallel to the >> >> existing >> >> seqlevels, as >> >> with seqlevels<-). >> >> >> >> >> >> I didn't really indicate what drove my desire to see >> >> keepSeqlevels >> >> deprecated. keepSeqlevels, seqlevels<-, and isActiveSeq >> were >> >> developed more >> >> or less independently, and have different contracts. The >> >> different >> >> contracts are confusing to the user, as is creating a >> usable >> >> help system >> >> when there are multiple end points for what is a common >> >> operation. The help >> >> pages of each were inadequate in different ways. And >> because >> >> the code was >> >> developed independently, support for different objects was >> >> inconsistent. So >> >> actually, yes, the maintenance (and use) burden for the >> >> previous state of >> >> affairs was substantial. >> >> >> >> On the other hand, I agree that keepSeqlevels is convenient >> >> as a simple >> >> wrapper around seqlevels<-, in the same way that setNames >> >> and names<- are >> >> both useful. >> >> >> >> So we could iterate to >> >> >> >> keepSeqlevels <- >> >> function(x, value, ...) >> >> { >> >> seqlevels(x, force=TRUE) <- value >> >> x >> >> } >> >> >> >> but I'd be less enthusiastic about maintaining the original >> >> contract of >> >> keepSeqlevels, where keepSeqlevels(gr1, gr2), would have >> >> worked for two >> >> GRanges objects. >> >> >> >> >> >> Why would this be called keepSeqlevels() if, depending on how it's >> >> used, >> >> it will either add, drop, rename, or permute the seqlevels? >> >> Couldn't this be called setSeqlevels? >> >> >> >> >> >> I thought that new2old was necessary for permuting. >> >> >> > >> > The seqlevels setter has no 'new2old' arg. >> > >> > >> > You're right that >> >> adding should be disallowed for keepSeqlevels(). Adding seqlevels is >> not >> >> a common operation. The two common operations are: >> >> * Permuting, usually because the data were imported in non-canonical >> >> order (seqnameStyle was designed to address this, no?). >> >> >> > >> > Permuting is straightforward with seqlevels<-: >> > >> > > seqlevels(gr) >> > [1] "chr1" "chr2" "chr3" >> > >> > > seqlevels(gr) <- rev(seqlevels(gr)) >> > >> > > seqlevels(gr) >> > [1] "chr3" "chr2" "chr1" >> > >> > >> > * Subsetting, either by keeping or dropping. >> >> >> > >> > Also straightforward: >> > >> > > seqlevels(gr, force=TRUE) <- "chr2" >> > >> > > seqlevels(gr) >> > [1] "chr2" >> > >> > >> > Two main reasons: taking a >> >> small slice of the data for prototyping, or removing problematic >> >> chromosomes (sex, circular). This goes back to bsapply and the exclude >> >> argument. >> >> >> > >> > There are other important use cases: >> > >> > - Dropping seqlevels that are *not* in use. This happens for example >> > when subsetting a BAM file with Rsamtools::filterBam to keep only >> > the alignments located on a given chromosome. The entire sequence >> > dictionary (in the header) is propagated to the resulting BAM file >> > so when you read the file with readGAlignments() you end up with a >> > bunch of useless seqlevels that you generally start by dropping. >> > You don't want to drop any alignment so force=FALSE is your friend. >> > >> > >> This is sort of tangential to the discussion, but do you really want to do >> this? I would preserve the universe as given by the BAM. >> >> >> > - An even more common operation is renaming: 90% of the times that's >> > what I use seqlevels<- for, and that's what I tell people to use >> > when they need to rename their chromosomes. Also straightforward: >> > >> > > seqlevels(gr) >> > [1] "chr1" "chr2" "chr3" "chrM" >> > >> > > seqlevels(gr) <- sub("^chr", "", seqlevels(gr)) >> > > seqlevels(gr) >> > [1] "1" "2" "3" "M" >> > >> > > seqlevels(gr)[seqlevels(gr) == "M"] <- "MT" >> > > seqlevels(gr) >> > [1] "1" "2" "3" "MT" >> > >> > Note that this is simpler to use than renameSeqlevels() which >> > always required you to build the entire right value as a named >> > vector. >> > >> > >> Sure, renaming is a common use case. Not sure how I forgot that. >> >> As I wrote earlier in the thread, renameSeqlevels should be changed so as >> not to require naming of the vector. >> >> >> > So the added-value of keepSeqlevels() seems to boil down to: >> > >> > (a) it always uses force=TRUE >> > >> > (b) it preserves the original object and returns the modified object >> > >> > If you want to restrict the use of keepSeqlevels() to permuting and >> > dropping, you'll also have to disallow renaming (in addition to adding). >> > Then its name will more or less reflect what it does (if "keep" means >> > "permute" or "drop"). The final result will be something that does less >> > than setSeqlevels() and that is not necessarily easier to read: both >> > will set on the object whatever vector of seqlevels they are supplied, >> > except that one will fail when doing so would actually result in adding >> > or renaming seqlevels. >> > >> > >> So it looks like seqlevels<- is pretty powerful. Now I see that if I >> scroll >> down to the examples in the man page, I find the actual documentation. >> It's >> cool to learn that we can replace seqlevels on TxDb objects. My argument >> has always been that since these low-level operations are so powerful, >> it's >> nice to have high-level operations to clarify the code. We have to think >> hard to know what the RHS will do in that replacement. It's probably one >> of >> the most complex replacement operations; far more complex than the typical >> one, including levels<- on factor. There's nothing wrong with that; it's >> just complexity that we should be able to hide. >> >> Michael >> >> H. >> > >> > >> >> Michael >> >> >> >> >> >> H. >> >> >> >> >> >> >> >> Well, I wasn't even aware of that feature, so you've made your >> >> point about >> >> the documentation ;) Sounds like a good compromise to me. >> >> >> >> Thanks for understanding, >> >> Michael >> >> >> >> >> >> >> >> Martin >> >> >> >> >> >> Michael >> >> >> >> >> >> >> >> >> >> >> >> >> >> On Sat, May 18, 2013 at 6:00 PM, Martin Morgan >> >> <mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org> >> >> <mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org >> >>> >> >> >> >> wrote: >> >> >> >> On 05/18/2013 05:42 PM, Martin Morgan wrote: >> >> >> >> Some of the most common operations are >> >> straight-forward, e.g., >> >> >> >> > gr = GRanges(paste0("chr", c(1:22, >> >> "X", "Y")), IRanges(1, >> >> 100)) >> >> > seqlevels(gr) = sub("chr", "ch", >> >> seqlevels(gr)) >> >> >> >> >> >> To get a more comprehensive example I should have >> >> followed my own >> >> advice and >> >> grabbed from the help page! >> >> >> >> ## Rename: >> >> seqlevels(txdb) <- sub("chr", "", >> >> seqlevels(txdb)) >> >> seqlevels(txdb) >> >> >> >> seqlevels(txdb) <- paste0("CH", >> >> seqlevels(txdb)) >> >> seqlevels(txdb) >> >> >> >> seqlevels(txdb)[seqlevels(__****__txdb) == >> >> >> >> >> "CHM"] <- "M" >> >> >> >> >> >> seqlevels(txdb) >> >> >> >> >> >> -- >> >> Computational Biology / Fred Hutchinson Cancer >> >> Research Center >> >> 1100 Fairview Ave. N. >> >> PO Box 19024 Seattle, WA 98109 >> >> >> >> Location: Arnold Building M1 B861 >> >> Phone: (206) 667-2793 <tel:%28206%29%20667-2793> >> >> <tel:%28206%29%20667-2793> >> >> >> >> >> >> >> >> >> >> -- >> >> Dr. Martin Morgan, PhD >> >> >> >> Fred Hutchinson Cancer Research Center >> >> 1100 Fairview Ave. N. >> >> PO Box 19024 Seattle, WA 98109 >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> ______________________________**___________________ >> >> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.**org< >> Bioc-devel@r-project.org> >> >> > >> >> mailing list >> >> https://stat.ethz.ch/mailman/_**_listinfo/bioc-devel< >> https://stat.ethz.ch/mailman/__listinfo/bioc-devel> >> >> >> >> <https://stat.ethz.ch/mailman/**listinfo/bioc-devel< >> https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> >> >> > >> >> >> >> >> >> -- >> >> Hervé Pagès >> >> >> >> Program in Computational Biology >> >> Division of Public Health Sciences >> >> >> >> Fred Hutchinson Cancer Research Center >> >> 1100 Fairview Ave. N, M1-B514 >> >> P.O. Box 19024 >> >> Seattle, WA 98109-1024 >> >> >> >> E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org> >> >> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> >> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> >> >> >> >> >> >> > -- >> > Hervé Pagès >> > >> > Program in Computational Biology >> > Division of Public Health Sciences >> > Fred Hutchinson Cancer Research Center >> > 1100 Fairview Ave. N, M1-B514 >> > P.O. Box 19024 >> > Seattle, WA 98109-1024 >> > >> > E-mail: hpa...@fhcrc.org >> > Phone: (206) 667-5791 >> > Fax: (206) 667-1319 >> > >> >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> > [[alternative HTML version deleted]]
_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel