On 5/23/21 5:56 PM, Steffen Möller wrote:
>
> On 23.05.21 at 00:02, Nilesh Patra wrote:
>>
>> On 5/23/21 2:54 AM, Andreas Tille wrote:
>>> On Sat, May 22, 2021 at 09:10:46AM +0200, Andreas Tille wrote:
>>>> On Fri, May 21, 2021 at 09:26:48PM +0200, Steffen Möller wrote:
>>>>> If someone needs a stimulus to package something - cuteSV
>>>>> (https://github.com/tjiangHIT/cuteSV), please.
>>>> I gave it a kickstart while sitting in the train (which will be
>>>> offline soon). Everybody can feel free to add own ID to Uploaders
>>>> and finalise. There is no build time test running now and no
>>>> autopkgtest. Data to test / benchmark are included - so this
>>>> should be feasible.
>>> I just packaged the precondition python3-cigar and uploaded to NEW.
>> I wrote a sample autopkgtest for cigar (basically used the same thing as in
>> the README)
>> and made a few minor changes.
>>
>> I have no idea about autopkgtests for cutesv - I lack the prerequisites
>> here and probably only Steffen can help here.
>>
>> PS: Please check and upload vbz-compression whenever you have time (after
>> two days as you wrote would be fine anyway)
>> I'll be inactive/be away for a couple of days (wish to take a break :-))
>
> Thank you both, you are amazing!
>
> CuteSV is part of the
> https://github.com/nanoporetech/pipeline-structural-variation that I
> plan to run when first Nanopore reads surface in my inbox next week. You
> compare against a reference genome to run this, which we do not have in
> Debian, so, yes, we should think of some tests, but we should also find
> a way to perform such tests for other packages.
>
> This kind of leads to a follow-up question - we could have a "test
> package" that offers a fraction of the human genome, like the Y
> chromosome and a second - chromosome 22 maybe. That would not be too big
> and we can test with it. It would also be a bit meaningless, though. And
> for testing we do not need anything to be human (or real) in the first
> place.
> We could generate our own mini-genome or instead (which I would
> prefer) go for something small that is real, like yeast (for
> eukaryotes), E. coli (for bacteria), we ignore archaea, and then .. there
> is https://www.ncbi.nlm.nih.gov/nuccore/CP014940 , i.e. that data from C.
> Venter's
> https://www.jcvi.org/research/first-minimal-synthetic-bacterial-cell,
> which may be interesting to be distributed with an Open Source
> distribution.
Sounds good, but please take these factors into consideration:

* The debci machines typically have ~40 GiB of disk space. If the data you refer to here is even a few GiB, all packages using it for tests will turn into RC bugs.
* If the size of the data is _not_ large enough to cause an RC bug, but is still *huge* and used in a large number of tests, it'll be a pain for us to maintain it ourselves, and also not the best for end users who might want to download the test data.

I had listed more reasons in a previous mail when a discussion regarding "centralised test data" was going on; please take a look here too: https://lists.debian.org/debian-med/2020/09/msg00365.html

> While there is always something novel found also for these genomes for
> which the genomic DNA is long known, we do not much harm by distributing
> such genomes. Professional researchers will update them, anyway. The
> same holds for the human genome, but it is a bit larger and we should
> possibly make our experiences with the smaller genomes, first.

If the genomes are small enough that analysing them produces output sequences which aren't too large, it can be done.

> I'll let this think in for another while and then likely extend getData
> to deal with these genomes and auto-generate native Debian packages with it.
>
> Ok - back to some real work and I'll have a closer look at that pipeline.

*Thumbs up*

Nilesh