Hi Michael, thanks for sharing your opinion. Comments below.

On 01/28/2015 06:22 PM, Michael Lawrence wrote:
[...]

Is your concern here scalability, ease of use, or what? If scalability,
we should probably start thinking about a more efficient representation
for repeated vectors, kind of like Rle, except for rep(,each=FALSE). It
would just %% the index. I think this would be generally useful and so
may be of more value than a more complex VRanges. After all, it is the
(totally justifiable) complexity of VCF that motivated VRanges in the
first place.
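To make the idea concrete, here is a minimal base-R sketch of such a "recycled" vector. It is only an illustration under stated assumptions: the class name `recycled` and its constructor are hypothetical and do not exist in S4Vectors or VariantAnnotation, and only positive numeric indices are handled. The values are stored once, together with the logical length, and every index is mapped back onto the stored values with %%.

```r
# Hypothetical helper, not part of any Bioconductor package: keeps a single
# copy of 'values' and pretends to be a vector of length 'length.out'.
recycled <- function(values, length.out) {
  stopifnot(length(values) > 0L, length.out >= 0)
  structure(list(values = values, length.out = length.out),
            class = "recycled")
}

# Report the logical (expanded) length, not the stored one.
length.recycled <- function(x) x$length.out

# Positive numeric indices only: wrap each index onto the stored values.
`[.recycled` <- function(x, i) {
  n <- length(x$values)
  # 1-based modulo: 1..n map to themselves, n + 1 maps back to 1, and so on
  x$values[((i - 1L) %% n) + 1L]
}

# One annotation per variant, shared by 3 samples stacked one after another.
ann <- c("missense", "synonymous", "intronic")
x <- recycled(ann, length.out = 3 * length(ann))

x[c(1, 4, 7)]                       # the first variant, once per sample
expanded <- x[seq_len(length(x))]   # behaves like rep(ann, times = 3)
```

This assumes the samples are stacked block by block, so that the annotation column is exactly rep(ann, times = nsamples); any other ordering would need a different index mapping.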

I'm concerned about scalability with multi-sample VCFs when adding annotations. What you propose, using Rle-like vectors to store identical values from different samples together, sounds good to me, and I'm also in favor of keeping data structures as simple as possible. For the time being I'll use 'VRanges' as they are now and explore how bad it gets when scaling in samples and annotations, to justify doing something about it along the lines you suggest.

[...]

I am not sure if coercion via as() would make sense here, since there is
no obvious reason why the split would be by sample. Why not just use
split(vr, sampleNames(vr))? That should work already.

I see your point that splitting a VRanges could be motivated by something other than sample and, as you suggest, 'split()' does the work very fast. Actually, by passing its result to the VRangesList constructor I get what I was looking for:

do.call("VRangesList", split(vr, sampleNames(vr)))
VRangesList of length 3
names(3): sample1 sample2 sample3


Although I realize now that the Rle-like strategy you propose would then not be usable when splitting by sample.

cheers,

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel