Greetings,

So, following the discussion in the Jitsi chat[0], I'm trying to summarize what I'm seeing about VirusSeeker. There has been a good share of guesswork, since I have no medical background, especially before I found the documentation. So take it with a grain (or more like a bag) of salt. ;)

[0] https://salsa.debian.org/med-team/community/2020-covid19-hackathon/-/blob/master/jitsi/20190409_1700_jitsi.log

(TL;DR: jump to "In conclusion"...)

The download[1] section gathers four packages, which currently are:

  * VirusSeeker_Virome_pipeline_v0.063_database20160824.tgz
  * VirusSeeker_Discovery_pipeline_v0.03_DB20160824.tgz
  * VirusSeeker_pipeline_and_documentation.zip
  * I1164_12629_Harvard_SIV_196_06_2_24_12_rawData.tgz

[1] https://wupathlabs.wustl.edu/virusseeker/download/

The *Virome* package seems to be the core of VirusSeeker, for virome composition analysis. I was not sure about the purpose of the *Discovery* archive, but after reaching the documentation, it seems to be the same thing, tuned for the task of virus discovery. There is a lot of duplicated Perl code between the two archives. In both cases, I suppose the dependencies[2] described on the website apply.

[2] https://wupathlabs.wustl.edu/virusseeker/installation/install-prerequisite-software/

The *pipeline and documentation* archive might serve as a basis for -doc packages, I suppose. The documentation mostly consists of a set of HTML pages. At first I was very worried by the presence of .docx files in the sample outputs, but they actually seem to be mere exports to MS Word 2007+ of .txt files, which are present next to the .docx. The documentation index mentions a VirusHunter software, but the link returns an HTTP 404 error, so I guess it might need a refresh. (Interestingly, or not, I found out that this documentation archive also embeds bit-for-bit copies of the two previous packages.)

The *rawData* archive contains gz-compressed FASTQ sample data, which might be of use for testing, I guess, although it weighs some 500M. I'm not sure whether this is a concern for the build and test processes.

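The bit-for-bit observation above can be double-checked with a few lines of Python. This is only a minimal sketch: the function names `sha256_of` and `find_identical_copies` are made up for illustration, and it assumes the documentation zip has already been unpacked into a directory next to the downloaded .tgz archives.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_copies(reference, search_root):
    """List files under search_root whose content matches reference exactly.

    Both names are hypothetical helpers, not part of VirusSeeker.
    """
    reference = Path(reference)
    wanted_size = reference.stat().st_size
    wanted_digest = sha256_of(reference)
    matches = []
    for candidate in sorted(Path(search_root).rglob("*")):
        if not candidate.is_file():
            continue
        # Cheap size check first, so large non-matching files are not hashed.
        if candidate.stat().st_size != wanted_size:
            continue
        if sha256_of(candidate) == wanted_digest:
            matches.append(candidate)
    return matches
```

Calling `find_identical_copies("VirusSeeker_Virome_pipeline_v0.063_database20160824.tgz", "unpacked_doc/")`, where `unpacked_doc/` is a placeholder for wherever the zip was extracted, would list any embedded copy; an empty list means no identical file was found.
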
On a side note, there is a page[3] referring to all sorts of different locations for getting the NCBI NT/NR/taxonomy and virus nucleotide and protein databases. The configuration of VirusSeeker is also described in this chapter. For the moment, configuring consists of hopping into the Perl scripts and setting paths in Perl variables accordingly. I don't know whether those databases would require additional packaging (if allowed at all), or some kind of mechanism to pull them onto the system. But I'm under the impression that they might be necessary for the software to be useful, and they are spread over a whole set of different locations.

[3] https://wupathlabs.wustl.edu/virusseeker/installation/install-databases/

Finally, to be noted: in addition to the dependencies page, there is a System Requirements[4] page which explains the need for a clustered infrastructure, and a VirusSeeker Install page[5] explaining how to adjust the content of the various Perl scripts to make them use a distributed batch job infrastructure; the reference seemingly being Slurm WLM here.

[4] https://wupathlabs.wustl.edu/virusseeker/system-requirements/
[5] https://wupathlabs.wustl.edu/virusseeker/installation/install-virusseeker/

In conclusion, my impression is that:

 1. in light of my earlier mail[6], Virome and Discovery are pending the inclusion of dependencies into Debian, at least "prinseq-lite" and a hypothetical "libstatistics-pac-perl" (I began to have a look at these two packages, FWIW);
 2. I believe that Pipeline and Documentation might already be usable for producing documentation;
 3. the rawData might also be used for some heavyweight sample data package, I guess, if it makes sense;
 4. on a side note, I'm under the impression that there is some work needed to get the configurability up to Debian standards; setting variables in Perl scripts may not be well handled by debian/config.

[6] https://lists.debian.org/debian-med/2020/04/msg00121.html

Did I manage to make it sound like a plan?
:)

Kind Regards,
--
Étienne Mollier <etienne.moll...@mailoo.org>
Fingerprint: 5ab1 4edf 63bb ccff 8b54 2fa9 59da 56fe fff3 882d
Help find cures against the Covid-19!
Give CPU cycles:
  * Rosetta@home: https://boinc.bakerlab.org/rosetta/
  * Folding@home: https://foldingathome.org/