Hi Steffen,

On Thu, Jun 04, 2020 at 01:49:33AM +0200, Steffen Möller wrote:
> Jun created this table
> https://docs.google.com/spreadsheets/d/1tApLhVqxRZ2VOuMH_aPUgFENQJfbLlB_PFH_Ah_q7hM/edit?usp=sharing
> that lists a set of workflows and its dependencies.

I admit this kind of list is what I was always looking for.  It gives
me, as a non-user, a great todo list.  Over the last three weeks I took
my daily pick from it and filled the new queue with the packages that
were missing in Debian.  I'm also trying to keep our corresponding UDD
query[1] up to date.  Please note that Debian packages sometimes have
different names (for example, HTSfilter is available in Debian as
r-bioc-htsfilter).

This brings up a question:  The UDD query re-sorts this list
alphabetically (= different from the original document) and by Debian
package name (= even more different from the original document).
Would it be more helpful if the query preserved the original names
and ordering?
(Remark: I previously asked some questions about this document here
but never got an answer.  It would help if you, namely Jun and Steffen,
who are actively working with this document, would answer, since I'm
doing this for **your** comfort.)

I also noticed that some pretty generic software is listed there, for
instance gzip_reader, and this particular one for no good reason:  we
have libgzstream-dev, and gzip_reader[2] just wraps some example usage
around it.  That's neither a sensible package nor anything I would
like to see on our todo list.  If some package really needs it, it
could be added there as a patch or similar.

> Some are trickier to
> package than others, but if I read this right, then artic, scrnaseq and
> smartseq2 just wait for nextflow and pigx-rnaseq waits for tests to work
> :o/ Shovill just needs a package for itself.

Yes, we have some stuff on Salsa that needs more work, and some is not
even on Salsa since it was never on any packaging request list.  I keep
working down this (nicely and productively growing) list to get at
least everything into Salsa and to make our covid-19 task reflect the
list with valuable information.
 
> Some dependencies that we are missing are also not distributed with
> Conda. A weird example is the pip package "capsule" as a dependency of
> nextflow. Conda however distributes nextflow, so ... what are they doing?

As I said in the t-coffee example:  The build-time test had been broken
for years and we have finally fixed it.  (Probably just the test was
broken, not t-coffee itself - but who knows?)  So I assume Conda does
not run the provided tests in the first place, and we should keep up
our effort to do serious testing at build time and in CI.
 
> I started to really like that spreadsheet - have also added bcbio to the
> list of pending workflow engines. I don't really know where this
> spreadsheet could go. It is useful for us now, but I wonder what kind of
> questions it can help answering

For me it's a very helpful todo list.  Translated into our UDD query
it's a pretty nice tool.  I just wonder how we can make the translation
of the list into the query a bit more reliable - hence my question
above:  maybe if lines and names matched, it would be easier to keep
both in sync.
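
A minimal sketch of what keeping both in sync could look like, assuming
the spreadsheet names are exported as a plain list and a hand-maintained
mapping to Debian package names exists.  The function names and the
mapping are hypothetical and not part of the actual UDD query; only the
HTSfilter entry is taken from this mail.

```python
# Hypothetical mapping from spreadsheet (upstream) names to Debian
# package names.  Only the HTSfilter entry comes from this thread.
UPSTREAM_TO_DEBIAN = {
    "HTSfilter": "r-bioc-htsfilter",
}


def rows_in_sheet_order(sheet_names, name_map):
    """Pair each spreadsheet name with its Debian name, keeping sheet order.

    The Debian name is None when no mapping is known yet.
    """
    return [(name, name_map.get(name)) for name in sheet_names]


def missing_from_map(sheet_names, name_map):
    """List spreadsheet names that have no Debian name yet, in sheet order."""
    return [name for name in sheet_names if name not in name_map]


if __name__ == "__main__":
    # Assumed excerpt of the spreadsheet, in its original order.
    sheet = ["shovill", "HTSfilter"]
    for upstream, debian in rows_in_sheet_order(sheet, UPSTREAM_TO_DEBIAN):
        print(f"{upstream:15} {debian or '(no Debian name known)'}")
```

Keeping the original name as the key and the Debian name as a second
column would let both documents be compared line by line.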

> - is it a similarity score for workflows
> implementations? A flexibility score for workflows? An indication for
> packages to be substituteable? Something to rank significance for all
> those packages left uncited in scientific publications? Once all the
> packages are in Debian, we can auto-create that matrix. But now? Any
> idea on the bio.tools front about this?

Maybe you could put those you expect an answer from in CC.  I
personally cannot answer this.

Kind regards

      Andreas.

[1] 
https://salsa.debian.org/blends-team/med/-/blob/master/covid-19_doc/bio_covid-19_dependencies_query
[2] https://github.com/gatoravi/gzip_reader

-- 
http://fam-tille.de
