Hi Steffen,

On Thu, Jun 04, 2020 at 01:49:33AM +0200, Steffen Möller wrote:
> Jun created this table
> https://docs.google.com/spreadsheets/d/1tApLhVqxRZ2VOuMH_aPUgFENQJfbLlB_PFH_Ah_q7hM/edit?usp=sharing
> that lists a set of workflows and its dependencies.

I admit this kind of list is what I was always looking for. As a non-user, it gives me a great todo list: over the last three weeks I have made my daily pick from it and filled the new queue with what was missing in Debian. I'm also trying to keep our corresponding UDD query[1] constantly up to date.

Please note that Debian packages sometimes have different names (for example, HTSfilter is available in Debian as r-bioc-htsfilter). This brings up a question: the UDD query re-sorts the list alphabetically (= different from the original document) and by Debian package name (= even more different from the original document). Would it be more helpful if the query preserved the original names and sorting?

(Remark: I previously asked some questions about this document here but never got an answer. It would help if you, namely Jun and Steffen, who are actively working with this document, would answer them, since I'm doing this for **your** comfort.)

I also noticed that some pretty generic software is listed there, for instance gzip_reader, and this one in particular for no good reason: we have libgzstream-dev, and gzip_reader[2] just wraps some example usage around it. That's neither a sensible package nor anything I would like to see on our todo list. If it is really needed by some package, it could be added as a patch or so.

> Some are trickier to
> package than others, but if I read this right, then artic, scrnaseq and
> smartseq2 just wait for nextflow and pigx-rnaseq waits for tests to work
> :o/ Shovill just needs a package for itself.

Yes, we have some stuff in Salsa that needs more work, and some is not even on Salsa since it was never on any packaging request list. I keep working down this (nicely and productively growing) list to get at least everything into Salsa and to make our covid-19 task reflect the list with valuable information.

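Regarding the question above of whether the query should preserve the original names and sorting, here is a minimal sketch in Python of what that could look like. The override table, the lowercase fallback and the function names are assumptions for illustration only; this is not the actual UDD query.

```python
# Hypothetical sketch: keep the spreadsheet's order and names, and show the
# Debian package name next to each entry. Nothing here is taken from the
# real UDD query; all names below are illustrative assumptions.

# Spreadsheet name -> Debian package name, listed only where they differ.
NAME_OVERRIDES = {
    "HTSfilter": "r-bioc-htsfilter",
}


def debian_name(upstream):
    """Return the Debian package name for a spreadsheet entry.

    Assumption: entries without an override use the lowercased name.
    """
    return NAME_OVERRIDES.get(upstream, upstream.lower())


def keep_original_order(spreadsheet_names):
    """Pair each spreadsheet name with its Debian name, in spreadsheet order."""
    return [(name, debian_name(name)) for name in spreadsheet_names]


rows = keep_original_order(["nextflow", "HTSfilter", "Shovill"])
for upstream, package in rows:
    print(f"{upstream}\t{package}")
```

With matching lines and both names side by side, a missing or renamed entry would show up as a simple difference between the two lists.
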
> Some dependencies that we are missing are also not distributed with
> Conda. A weird example is the pip package "capsule" as a dependency of
> nextflow. Conda however distributes nextflow, so ... what are they doing?

As I said in the t-coffee example: the build-time test had been broken for years and we finally got around to fixing it. (Probably just the test was broken, not t-coffee itself, but who knows?) So I assume conda is not running the provided tests in the first place, and we should keep up our effort to do serious testing at build time and in CI.

> I started to really like that spreadsheet - have also added bcbio to the
> list of pending workflow engines. I don't really know where this
> spreadsheet could go. It is useful for us now, but I wonder what kind of
> questions it can help answering

For me it's a very helpful todo list. Translated into our UDD query, it's a pretty nice tool. I just wonder how we can make the translation of the list into the query a bit more reliable. That is the reason for my question above: maybe it would be easier to keep both in sync if lines and names matched.

> - is it a similarity score for workflows
> implementations? A flexibility score for workflows? An indication for
> packages to be substituteable? Something to rank significance for all
> those packages left uncited in scientific publications? Once all the
> packages are in Debian, we can auto-create that matrix. But now? Any
> idea on the bio.tools front about this?

Maybe you should put those in CC from whom you expect an answer. I personally cannot answer this.

Kind regards

      Andreas.

[1] https://salsa.debian.org/blends-team/med/-/blob/master/covid-19_doc/bio_covid-19_dependencies_query
[2] https://github.com/gatoravi/gzip_reader

-- 
http://fam-tille.de