Re: CuteSV (Was: PyEnsembl - how does that help us?)

Steffen Möller Sun, 23 May 2021 05:27:19 -0700


On 23.05.21 at 00:02, Nilesh Patra wrote:
>
> On 5/23/21 2:54 AM, Andreas Tille wrote:
>> On Sat, May 22, 2021 at 09:10:46AM +0200, Andreas Tille wrote:
>>> On Fri, May 21, 2021 at 09:26:48PM +0200, Steffen Möller wrote:
>>>> If someone needs a stimulus to package something - cuteSV
>>>> (https://github.com/tjiangHIT/cuteSV), please.
>>> I gave it a kickstart while sitting in the train (which will be
>>> offline soon).  Everybody can feel free to add own ID to Uploaders
>>> and finalise.  There is no build time test running now and no
>>> autopkgtest.  Data to test / benchmark are included - so this
>>> should be feasible.
>> I just packaged the precondition python3-cigar and uploaded to new.
> I wrote a sample autopkgtest for cigar (basically used the same thingy in the
> readme) and did a few minor changes.
>
> I have no idea about autopkgtests for cutesv - I lack the prerequisites
> here and probably only Steffen can help here.
>
> PS: Please check and upload vbz-compression whenever you have time (after two 
> days as you wrote would be fine anyway)
> I'll be inactive/be away for a couple of days (wish to take a break :-))


Thank you both, you are amazing!

CuteSV is part of
https://github.com/nanoporetech/pipeline-structural-variation, which I
plan to run when the first Nanopore reads surface in my inbox next week.
Running it requires comparing against a reference genome, which we do
not have in Debian. So, yes, we should think of some tests, but we
should also find a way to perform such tests for other packages.

This leads to a follow-up question - we could have a "test
package" that offers a fraction of the human genome, like the Y
chromosome and a second one - chromosome 22 maybe. That would not be too
big and we could test with it. It would also be a bit meaningless,
though. And for testing we do not need anything to be human (or real) in
the first place. We could generate our own mini-genome or instead (which
I would prefer) go for something small that is real, like yeast (for
eukaryotes) or E. coli (for bacteria); we ignore archaea. And then there
is https://www.ncbi.nlm.nih.gov/nuccore/CP014940 , i.e. the data from
C. Venter's
https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell,
which may be interesting to distribute with an Open Source
distribution.
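
To make the "generate our own mini-genome" option concrete, here is a
minimal sketch in Python. Everything in it is an assumption for
illustration only: the function names, the sequence name "mini_chr1",
the length and the GC content are made up, and none of it is part of
getData or any existing package. A fixed seed keeps the output
reproducible, which a test would need.

```python
import random


def make_mini_genome(length, seed=0, gc_content=0.4):
    """Return a reproducible pseudo-random DNA string (hypothetical helper)."""
    rng = random.Random(seed)  # private generator, so the seed fully fixes the result
    at = (1.0 - gc_content) / 2  # weight for each of A and T
    gc = gc_content / 2          # weight for each of C and G
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with lines of at most `width` bases."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; a real test genome would need a sensible size.
    print(to_fasta("mini_chr1", make_mini_genome(1000, seed=42)), end="")
```

Such a random sequence has no genes or repeats, so it would only
exercise file handling and basic alignment, which is one reason to
prefer a small real genome.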

While something novel is still found even for these genomes, whose
genomic DNA has long been known, we do not do much harm by distributing
them. Professional researchers will update them anyway. The
same holds for the human genome, but it is a bit larger and we should
probably gain experience with the smaller genomes first.

I'll let this sink in for a while longer and then likely extend getData
to deal with these genomes and auto-generate native Debian packages with it.

Ok - back to some real work and I'll have a closer look at that pipeline.

Best,
Steffen
