Same view here, with all the different data types that are around in the Bioconductor world these days it seems to me that a consistent behaviour is preferable. Florian
On 9/24/13 8:22 AM, "Hervé Pagès" <hpa...@fhcrc.org> wrote: >Hi Florian, Marc, > >On 09/18/2013 11:55 PM, Hahne, Florian wrote: >> Hi Marc, Herve, >> I also noticed this behaviour: >> >> library(TxDb.Hsapiens.UCSC.hg19.knownGene) >> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene >> seqlevels(txdb, force=TRUE) <- c("chr1", "ch2") >> oldLevs <- seqlevels(txdb) >> seqlevels(txdb, force=TRUE) <- "chr1" >> >> seqlevels(txdb, force=TRUE) <- oldLevs >> Error in .seqinfo.TranscriptDbReplace(x, new2old = new2old, force = >>force, >> : >> The replacement value must be either a 1 to 1 replacement or a >>subset of >> the original set when replacing the 'seqinfo' of a TranscriptDb object >> >> >> But: >> >> restoreSeqlevels(txdb) >> seqlevels(txdb, force=TRUE) <- oldLevs >> >> >> This is probably intentional, but I found it to be rather confusing. If >> one should be able to use the seqlevels replacement method to control >> active and inactive chromosomes in the TranscriptDb object, then having >> this mandatory step of restoring all chromosomes to the active state >> followed by another restriction to be quite cumbersome. At least a >> somewhat more useful error message would help in that case. Or couldn't >> the replacement method always call restoreSeqlevels internally. > >Somewhat related to this, I also found the following issue: > > > library(TxDb.Hsapiens.UCSC.hg19.knownGene) > > txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene > > seqlevels(txdb, force=TRUE) <- "chrM" > > seqlengths(txdb) > chrM > 16571 > > seqlevels(txdb) <- "chr1" > > seqlengths(txdb) > chr1 > 16571 > >The 2nd call to the seqlevels() setter actually performed a >*renaming* operation (a clue for this is that I didn't have to >use force=TRUE). Renaming the seqlevels of a TranscriptDb object >is something I think we want to support, e.g. when there is the >need to use a different naming convention (like in >seqlevels(txdb) <- "M"). This is exactly the syntax that one >would use to rename the seqlevels of a GRanges object: > > > gr <- GRanges("chrM", IRanges(1:2, 10)) > > seqlevels(gr) <- "chr1" > > gr > GRanges with 2 ranges and 0 metadata columns: > seqnames ranges strand > <Rle> <IRanges> <Rle> > [1] chr1 [1, 10] * > [2] chr1 [2, 10] * > --- > seqlengths: > chr1 > NA > >There is nothing that prevents the user from doing this kind of silly >renaming except that, in the case of the TranscriptDb object, this is >almost certainly not what the user intended to do. What s/he really >wanted was selecting chr1 instead of chrM, which can be achieved with: > > > restoreSeqlevels(txdb) > > seqlevels(txdb, force=TRUE) <- "chr1" > > seqlengths(txdb) > chr1 > 249250621 > >Yes having to restore all chromosomes to the active state before one >can make a new selection is cumbersome so we probably need to revisit >this. > >> >> It would also help to point out somewhere (I may have missed it) that >> TranscriptDb seqlevels (or rather the internal GRanges) are actually >>pass >> by reference: >> >> txdb2 <- txdb >> seqlevels(txdb) >> seqlevels(txdb2) >> >> restoreSeqlevels(txdb) >> seqlevels(txdb2) > >That's because TranscriptDb is a reference class. IMHO it shouldn't. > >> >> Also Herve: what is the definition of a "used" seqlevel? Does that mean >> that a factor level is defined, but no range in the GRanges object is >> assigned this level? > >Yes. Like here: > > > gr <- GRanges("chrM", IRanges(1:2, 10), seqlengths=c(chr1=5000, >chrM=25)) > > gr > GRanges with 2 ranges and 0 metadata columns: > seqnames ranges strand > <Rle> <IRanges> <Rle> > [1] chrM [1, 10] * > [2] chrM [2, 10] * > --- > seqlengths: > chr1 chrM > 5000 25 > >Only chrM is in use: > > > seqlevelsInUse(gr) > [1] "chrM" > >The idiom to drop seqlevels that are not in use is: > > > seqlevels(gr) <- seqlevelsInUse(gr) > > gr > GRanges with 2 ranges and 0 metadata columns: > seqnames ranges strand > <Rle> <IRanges> <Rle> > [1] chrM [1, 10] * > [2] chrM [2, 10] * > --- > seqlengths: > chrM > 25 > >Of course, in that case, force=TRUE is not needed because the operation >is guaranteed to not "shrink" the GRanges object. > >> Why would those exists in a TranscriptDb object at >> all? > >The notion of "seqlevels in use" is not formally defined for a >TranscriptDb object: > > > seqlevelsInUse(txdb) > Error in (function (classes, fdef, mtable) : > unable to find an inherited method for function ŒseqlevelsInUse¹ >for signature Œ"TranscriptDb"¹ > >but it could be: a seqlevel or chromosome could be considered to be >in use if there is at least 1 feature (transcript, exon or CDS) in >the db that is on it. I think for most TranscriptDb objects, all the >seqlevels are actually in use. However it's conceivable that a >TranscriptDb object could have some extra seqlevels that are not in >use (the seqlevels + their lengths + their circularity flags are stored >in a separate table, the 'chrominfo' table). > >Anyway, when I added the seqlevelsInUse() generic a few months ago, >I didn't write a method for TranscriptDb objects because I couldn't >think of a use case for it. And also because without any change to >the current db schema, this would be a costly operation as it would >need to scan the entire database, which cannot be done with a single >query. > >Now with the seqlevels() setter allowing to reduce the seqlevels >of a TranscriptDb object, we face the following choices: (a) >implement the seqlevelsInUse,TranscriptDb method and make the >'force' arg of the seqlevels() setter behave consistently with >that, or (b) not implement the seqlevelsInUse,TranscriptDb method >and make the 'force' arg meaningless. As I said earlier, my preference >goes for (b). > >H. > >> >> Florian >> >> >> >> >> >> On 9/18/13 9:08 PM, "Hervé Pagès" <hpa...@fhcrc.org> wrote: >> >>> Note that currently it's kind of inconsistent anyway because it doesn't >>> look at the seqlevels that are in use. For example: >>> >>> library(TxDb.Hsapiens.UCSC.hg19.knownGene) >>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene >>> >>> Trying to drop chrUn_gl000249 (which I know is not in use): >>> >>> > seqlevels(txdb) <- setdiff(seqlevels(txdb), "chrUn_gl000249") >>> Error in .seqinfo.TranscriptDbReplace(x, new2old = new2old, force = >>> force, : >>> You need to use force=TRUE if you want to drop seqlevels. >>> >>> 'force=TRUE' should only be required when trying to drop seqlevels >>> that are in use. >>> >>> So the choice are more between: (a) consistent, (b) convenient, >>> or (c) leave it as it is (which is neither convenient or consistent). >>> >>> I vote for (b). >>> >>> H. >>> >>> >>> On 09/18/2013 11:31 AM, Marc Carlson wrote: >>>> I actually considered this, but I opted to do it this way just for the >>>> sake of being consistent (which was my whole mission for implementing >>>> seqlevels in here in the 1st place). Now I could make it more >>>> convenient here and break consistency with how it is used elsewhere, >>>>but >>>> what do people prefer? >>>> >>>> Consistent or convenient? >>>> >>>> >>>> Marc >>>> >>>> >>>> >>>> On 09/18/2013 10:40 AM, Hervé Pagès wrote: >>>>> Hi Marc, >>>>> >>>>> Wouldn't it make sense to just ignore the 'force' arg when >>>>> dropping the seqlevels of a TranscriptDb? >>>>> >>>>> The 'force' argument is FALSE by default and this prevents >>>>> seqlevels<- to shrink GRanges or other vector-like objects >>>>> when the user tries to drop seqlevels that are in use. >>>>> Internally seqlevels<- calls seqlevelsInUse() to get the >>>>> seqlevels currently in use and see if they intersect with >>>>> the seqlevels to drop. >>>>> >>>>> In the TranscriptDb situation, people always have to use >>>>> 'force=TRUE' to drop seqlevels, regardless of whether the >>>>> levels to drop are in use or not (the seqlevelsInUse() >>>>> getter not being defined for TranscriptDb objects, I suspect >>>>> seqlevels<- doesn't look at this). >>>>> >>>>> So maybe 'force' could just be ignored for TranscriptDb objects? >>>>> That would make seqlevels<- a little bit more user-friendly on >>>>> those objects. >>>>> >>>>> Thanks, >>>>> H. >>>>> >>>>> >>>>> On 09/13/2013 10:38 AM, Marc Carlson wrote: >>>>>> Hi Florian, >>>>>> >>>>>> Yes we are trying to make things more uniform. seqlevels() lets you >>>>>> rename as well as deactivate chromosomes you want to ignore, so it >>>>>>was >>>>>> really redundant with isActiveSeq(). So we are moving away from >>>>>> isActiveSeq() just so that users have less to learn about. The >>>>>>reason >>>>>> why isActiveSeq was different from seqlevels was just because it was >>>>>> born for a TranscriptDb (which is based on an annotation database) >>>>>> instead of being born on a GRanges object. So seqlevels was the >>>>>>more >>>>>> general tool. >>>>>> >>>>>> Marc >>>>>> >>>>>> >>>>>> >>>>>> On 09/13/2013 07:24 AM, Hahne, Florian wrote: >>>>>>> Hi Marc, >>>>>>> I saw these warnings in Gviz, but they stem from GenomicFeatures >>>>>>> >>>>>>> Warning messages: >>>>>>> 1: 'isActiveSeq' is deprecated. >>>>>>> Use 'seqlevels' instead. >>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated"). >>>>>>> 2: 'isActiveSeq' is deprecated. >>>>>>> Use 'seqlevels' instead. >>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated"). >>>>>>> 3: 'isActiveSeq<-' is deprecated. >>>>>>> Use 'seqlevels' instead. >>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated"). >>>>>>> 4: 'isActiveSeq<-' is deprecated. >>>>>>> Use 'seqlevels' instead. >>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated"). >>>>>>> 5: 'isActiveSeq' is deprecated. >>>>>>> Use 'seqlevels' instead. >>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated"). >>>>>>> 6: 'isActiveSeq<-' is deprecated. >>>>>>> Use 'seqlevels' instead. >>>>>>> See help("Deprecated") and help("GenomicFeatures-deprecated"). >>>>>>> >>>>>>> So has the whole idea of active chromosomes in the data base been >>>>>>> dropped? I could not find anything in the change notes. Do I get it >>>>>>> right that you can now do >>>>>>> seqlevels(txdb, force=TRUE) <- "chr1" >>>>>>> if you just want the first chromosome to be active? >>>>>>> >>>>>>> Florian >>>>>>> >>>>>> >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> Bioc-devel@r-project.org mailing list >>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>>>> >>>>> >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpa...@fhcrc.org >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >> > >-- >Hervé Pagès > >Program in Computational Biology >Division of Public Health Sciences >Fred Hutchinson Cancer Research Center >1100 Fairview Ave. N, M1-B514 >P.O. Box 19024 >Seattle, WA 98109-1024 > >E-mail: hpa...@fhcrc.org >Phone: (206) 667-5791 >Fax: (206) 667-1319 _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel