Re: [Bioc-devel] VRanges with multiple samples

Robert Castelo Thu, 29 Jan 2015 09:38:33 -0800

hi Michael, thanks for sharing your opinion, comments below,


On 01/28/2015 06:22 PM, Michael Lawrence wrote:
[...]

Is your concern here scalability, ease of use, or what? If scalability,
we should probably start thinking about a more efficient representation
for repeated vectors, kind of like Rle, except for rep(,each=FALSE). It
would just %% the index. I think this would be generally useful and so
may be of more value than a more complex VRanges. After all, it is the
(totally justifiable) complexity of VCF that motivated VRanges in the
first place.

i'm concerned about the scalability with multisample VCFs when adding
annotations. What you propose about using Rle-like vectors to store
identical values from different samples together sounds good to me and
I'm also in favor of keeping data structures as simple as possible.
Maybe for the time being I'll try to use 'VRanges' just as they are now
and I'll try to explore how bad it gets when scaling in samples and
annotations to justify doing something about it along the lines you
suggest.
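
To make the idea concrete, here is a minimal base R sketch of such a
"recycled" vector: it keeps a single copy of the per-variant values and
maps any index back onto that copy with %%. The names recycled(),
recycled_length() and recycled_get() are hypothetical helpers made up for
illustration and are not part of any Bioconductor package; the annotation
values are invented too.

```r
# Hypothetical helpers, not an existing Bioconductor API.
# Store one copy of the values plus the number of repetitions (samples).
recycled <- function(values, times) {
  list(values = values, times = times)
}

# Logical length: as if the values had been expanded with rep(values, times).
recycled_length <- function(x) length(x$values) * x$times

# Element access: map a 1-based index back onto the single stored copy.
recycled_get <- function(x, i) {
  stopifnot(all(i >= 1L), all(i <= recycled_length(x)))
  x$values[(i - 1L) %% length(x$values) + 1L]
}

ann <- c("missense", "synonymous", "intronic")  # made-up annotations
r <- recycled(ann, times = 3L)                  # 3 variants x 3 samples
expanded <- rep(ann, times = 3L)                # what a plain vector stores

# The compact form answers every index the same way as the expanded one.
identical(recycled_get(r, seq_len(recycled_length(r))), expanded)
```

The stored copy stays at the size of one sample however many samples are
added, which is where the saving would come from.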


[...]

I am not sure if coercion via as() would make sense here, since there is
no obvious reason why the split would be by sample. Why not just use
split(vr, sampleNames(vr))? That should work already.

i see your point that splitting a VRanges could be motivated by
something other than sample and, as you suggest, 'split()' does the work
very fast. actually, by invoking the VRangesList constructor i get what i
was looking for:


do.call("VRangesList", split(vr, sampleNames(vr)))
VRangesList of length 3
names(3): sample1 sample2 sample3

although i realize now that the rle-like strategy you propose would
not be usable when splitting by sample.
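
One way to read this limitation, sketched with plain base R vectors rather
than VRanges and with invented annotation values: if the long vector is the
per-variant annotation repeated once per sample, then splitting by sample
gives every list element its own full copy, so no repetition is left inside
an element for a recycled representation to exploit. The variable names
below are assumptions chosen for illustration.

```r
ann     <- c("missense", "synonymous", "intronic")  # made-up annotations
samples <- c("sample1", "sample2", "sample3")

# Long layout: the annotation vector repeated once per sample.
long_ann    <- rep(ann, times = length(samples))
long_sample <- rep(samples, each = length(ann))

# Splitting by sample hands each list element one full copy of 'ann'.
by_sample <- split(long_ann, long_sample)

# Check that every per-sample element equals the original annotation vector.
all(vapply(by_sample, identical, logical(1), ann))
```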


cheers,

robert.

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
