Michael
On Tue, May 21, 2013 at 2:34 PM, Hervé Pagès <hpa...@fhcrc.org
<mailto:hpa...@fhcrc.org>> wrote:
On 05/21/2013 08:00 AM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 10:36 PM, Hervé Pagès <hpa...@fhcrc.org
<mailto:hpa...@fhcrc.org>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>> wrote:
Michael,
On 05/20/2013 08:44 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 4:13 PM, Hervé Pagès
<hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>>> wrote:
Hi,
On 05/20/2013 03:15 PM, Michael Lawrence wrote:
On Mon, May 20, 2013 at 3:11 PM, Martin Morgan
<mtmor...@fhcrc.org
<mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org
<mailto:mtmor...@fhcrc.org>>
<mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
<mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>>> wrote:
Hi Michael --
On 5/20/2013 5:56 AM, Michael Lawrence wrote:
While it's cool that seqlevels<- has
become so
flexible,
I still claim
that
rename/keep/drop would be a lot easier
to read,
because
they describe the
high-level operation, and there's no
reason to
deprecate
them. They're
also
easier to remember. For example, for
dropping with
seqlevels<-, the user
needs
to remember that "force" is necessary
to drop. Much
easier to just say
"dropSeqlevels(), please". And
reimplementing
keepSeqlevels is still too
complicated. Is it such a maintenance
burden to
have
these simple
wrappers sit
on top of the low-level manipulators?
Another suggestion: renameSeqlevels
should not
require a
named vector (in
cases
where it is unnamed, require that it is
parallel to the
existing
seqlevels, as
with seqlevels<-).
I didn't really indicate what drove my
desire to see
keepSeqlevels
deprecated. keepSeqlevels, seqlevels<-, and
isActiveSeq were
developed more
or less independently, and have different
contracts. The
different
contracts are confusing to the user, as is
creating
a usable
help system
when there are multiple end points for
what is a common
operation. The help
pages of each were inadequate in different
ways.
And because
the code was
developed independently, support for different
objects was
inconsistent. So
actually, yes, the maintenance (and use)
burden for the
previous state of
affairs was substantial.
On the other hand, I agree that
keepSeqlevels is
convenient
as a simple
wrapper around seqlevels<-, in the same
way that
setNames
and names<- are
both useful.
So we could iterate to
keepSeqlevels <-
function(x, value, ...)
{
seqlevels(x, force=TRUE) <- value
x
}
but I'd be less enthusiastic about
maintaining the
original
contract of
keepSeqlevels, where keepSeqlevels(gr1,
gr2), would
have
worked for two
GRanges objects.
Why would this be called keepSeqlevels() if,
depending on
how it's used,
it will either add, drop, rename, or permute the
seqlevels?
Couldn't this be called setSeqlevels?
I thought that new2old was necessary for permuting.
The seqlevels setter has no 'new2old' arg.
You're right that
adding should be disallowed for keepSeqlevels(). Adding
seqlevels is not
a common operation. The two common operations are:
* Permuting, usually because the data were imported in
non-canonical
order (seqnameStyle was designed to address this, no?).
Permuting is straightforward with seqlevels<-:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3"
> seqlevels(gr) <- rev(seqlevels(gr))
> seqlevels(gr)
[1] "chr3" "chr2" "chr1"
* Subsetting, either by keeping or dropping.
Also straightforward:
> seqlevels(gr, force=TRUE) <- "chr2"
> seqlevels(gr)
[1] "chr2"
Two main reasons: taking a
small slice of the data for prototyping, or removing
problematic
chromosomes (sex, circular). This goes back to bsapply
and the
exclude
argument.
There are other important use cases:
- Dropping seqlevels that are *not* in use. This happens
for example
when subsetting a BAM file with Rsamtools::filterBam
to keep only
the alignments located on a given chromosome. The
entire sequence
dictionary (in the header) is propagated to the
resulting BAM file
so when you read the file with readGAlignments() you
end up with a
bunch of useless seqlevels that you generally start by
dropping.
You don't want to drop any alignment so force=FALSE is
your friend.
This is sort of tangential to the discussion, but do you really
want to
do this? I would preserve the universe as given by the BAM.
- An even more common operation is renaming: 90% of the
times that's
what I use seqlevels<- for, and that's what I tell
people to use
when they need to rename their chromosomes. Also
straightforward:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3" "chrM"
> seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
> seqlevels(gr)
[1] "1" "2" "3" "M"
> seqlevels(gr)[seqlevels(gr) == "M"] <- "MT"
> seqlevels(gr)
[1] "1" "2" "3" "MT"
Note that this is simpler to use than
renameSeqlevels() which
always required you to build the entire right value as
a named
vector.
Sure, renaming is a common use case. Not sure how I forgot that.
As I wrote earlier in the thread, renameSeqlevels should be
changed so
as not to require naming of the vector.
So the added-value of keepSeqlevels() seems to boil down to:
(a) it always uses force=TRUE
(b) it preserves the original object and returns the
modified object
If you want to restrict the use of keepSeqlevels() to
permuting and
dropping, you'll also have to disallow renaming (in
addition to adding).
Then its name will more or less reflect what it does (if
"keep" means
"permute" or "drop"). The final result will be something
that does less
than setSeqlevels() and that is not necessarily easier to
read: both
will set on the object whatever vector of seqlevels they
are supplied,
except that one will fail when doing so would actually
result in adding
or renaming seqlevels.
So it looks like seqlevels<- is pretty powerful. Now I see that if I
scroll down to the examples in the man page, I find the actual
documentation. It's cool to learn that we can replace seqlevels
on TxDb
objects.
Has been working on TranscriptDb objects for more than
My argument has always been that since these low-level
operations are so powerful, it's nice to have high-level
operations to
clarify the code. We have to think hard to know what the RHS
will do in
that replacement.
When you do something like seqlevels(gr) <- new_seqlevels, you're saying
"can you please replace the current seqlevels by those new ones".
That's all. The request/contract is simple and all the complexity is
taken care of internally by seqlevels<-. It might be that, depending on
what's currently on 'gr', this will actually result in dropping, adding,
permuting. Or *a combination of the 3*. Like here:
> seqlevels(gr)
[1] "chr1" "chr2" "chr3"
> seqlevels(gr) <- c("chr4", "chr3", "chr1")
> seqlevels(gr)
[1] "chr4" "chr3" "chr1"
# one seqlevel was drop, one was added, and the 2 seqlevels that were
# preserved were reordered.
When you want to subset a vector to keep all elements with an even
index you do x[c(FALSE, TRUE)]. Is that keeping or dropping elements?
Do we need to have a subsetting operator for keeping elements, another
one for dropping elements, and another one for permuting them (like
in x[sample(length(x))])?
H.
It's probably one of the most complex replacement
operations; far more complex than the typical one, including
levels<- on
factor. There's nothing wrong with that; it's just complexity
that we
should be able to hide.
Michael
H.
Michael
H.
Well, I wasn't even aware of that feature, so
you've
made your
point about
the documentation ;) Sounds like a good
compromise to me.
Thanks for understanding,
Michael
Martin
Michael
On Sat, May 18, 2013 at 6:00 PM,
Martin Morgan
<mtmor...@fhcrc.org
<mailto:mtmor...@fhcrc.org> <mailto:mtmor...@fhcrc.org
<mailto:mtmor...@fhcrc.org>>
<mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
<mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>>
<mailto:mtmor...@fhcrc.org
<mailto:mtmor...@fhcrc.org>
<mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>
<mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>
<mailto:mtmor...@fhcrc.org <mailto:mtmor...@fhcrc.org>>>>>
wrote:
On 05/18/2013 05:42 PM, Martin
Morgan wrote:
Some of the most common
operations are
straight-forward, e.g.,
> gr =
GRanges(paste0("chr",
c(1:22,
"X", "Y")), IRanges(1,
100))
> seqlevels(gr) =
sub("chr", "ch",
seqlevels(gr))
To get a more comprehensive
example I
should have
followed my own
advice and
grabbed from the help page!
## Rename:
seqlevels(txdb) <-
sub("chr", "",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb) <-
paste0("CH",
seqlevels(txdb))
seqlevels(txdb)
seqlevels(txdb)[seqlevels(__**______txdb) ==
"CHM"] <- "M"
seqlevels(txdb)
--
Computational Biology / Fred
Hutchinson
Cancer
Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
<tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793> <tel:%28206%29%20667-2793>
<tel:%28206%29%20667-2793>
--
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
[[alternative HTML version deleted]]
_____________________________________________________
Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
<mailto:Bioc-devel@r-project.__org
<mailto:Bioc-devel@r-project.org>>
<mailto:Bioc-devel@r-project.
<mailto:Bioc-devel@r-project.>____org
<mailto:Bioc-devel@r-project.__org
<mailto:Bioc-devel@r-project.org>>>
mailing list
https://stat.ethz.ch/mailman/______listinfo/bioc-devel
<https://stat.ethz.ch/mailman/____listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>>
<https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>