Re: "Entry: NA" in debian/upstream/metadata

2021-03-03 Thread Andreas Tille
Hi Andrius,

On Wed, Mar 03, 2021 at 09:54:21AM +0200, Andrius Merkys wrote:
> > The latter might be useful for contributors who aren't used to all those
> > IDs, to make them more visible (including where the gaps are). But on
> > the other hand, if those are well present in an upstream/metadata
> > template and very clear in the documentation of upstream/metadata, then
> > it is not necessary and I'd then tend to like your suggestion Andrius.
> 
> To me, three flavors of "unknown" looks like an overkill. Most of the
> metadata in Debian does not even have the two flavors of "unknown":
> missing Bug-Submit field in d/u/metadata, Homepage in d/control and
> Upstream-Contact in d/copyright means that this piece of information is
> either nonexistent or simply not entered (for example, due to the lack
> of time). Thus I am not sure whether the added value is worth the
> infrastructure/effort here. But again, this is solely my opinion,
> certainly not aimed at reflecting those of the people who enter and use
> the data in d/u/metadata.

I wrote the UDD importer for the metadata files and thus look at the
data as a "consumer" of the provided information.  From this side those
different meanings of unknown are all turned into "ignore this value".
So in this respect differentiating between those unknowns is basically
helpful for those who edit the metadata files.  Flagging something as "I
was here and have checked" is probably kind of helpful.  However, it
might perfectly be that some registry will include that specific
software later and re-checking makes sense.

For this reason I was recommending not to make those simple things too
complex, since making them complex just drains time from the people who
are working on it, with no visible effect for the users.
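
A minimal sketch of the consumer-side view described above, in which every flavour of "unknown" collapses into "ignore this value". This is not the actual UDD importer code: the function names, the set of markers and the example registry ID are assumptions made only for illustration.

```python
# Hypothetical sketch; NOT the actual UDD importer.
from typing import Optional

# Spellings of "unknown" mentioned in this thread (assumed set).
UNKNOWN_MARKERS = {"", "NA", "N/A"}


def normalize_entry(value: Optional[str]) -> Optional[str]:
    """Return a usable registry ID, or None if the value should be ignored."""
    if value is None:  # field missing, or an explicit YAML null
        return None
    text = str(value).strip()
    if text in UNKNOWN_MARKERS:
        return None
    return text


def usable_registry_entries(registry: list) -> dict:
    """Map registry name to ID, keeping only entries that carry a real ID."""
    result = {}
    for item in registry:
        entry = normalize_entry(item.get("Entry"))
        if entry is not None:
            result[item["Name"]] = entry
    return result


if __name__ == "__main__":
    # "example-tool" is a made-up placeholder ID.
    registry = [
        {"Name": "bio.tools", "Entry": "example-tool"},
        {"Name": "OMICtools", "Entry": "NA"},
        {"Name": "SciCrunch", "Entry": ""},
    ]
    print(usable_registry_entries(registry))
```

Under these assumptions, whichever spelling the editors agree on only matters to the people editing the files; the consumer sees the same result either way.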
 
> If three flavors option would be preferred, I would also suggest adding
> date fields for each entry to signal at which point in time the registry
> was inspected.

As I wrote above, a later addition of some software to some registry can
spoil the different meanings of unknown.  This could be cured by such a
date field, but I don't think its value outweighs the time drained from
the people maintaining that extra field.  Thus I do not think we
should do this.

Thanks a lot for your work on this

 Andreas.

-- 
http://fam-tille.de



2021 Europe* BioHackathon, applications due 2021-04-01

2021-03-03 Thread Michael R. Crusoe
The applications for the 2021 Europe* BioHackathon just opened and are due
on April 1st.

Perhaps packaging TensorFlow and the bioinformatic tools that need it would
be a good topic?

* Not limited to those in Europe

https://elixir-europe.org/BIoHackathon2021-project-submission

Given the special emphasis on industry partners, this would be a nice time
to sync up with the TensorFlow team at Google.

More details: https://biohackathon-europe.org/

I'm a member of ELIXIR Netherlands, so I would be happy to help anyone with
their application.

(I'm not subscribed to debian-ai@ , please keep me CC'd)

-- 
Michael R. Crusoe


MRtrix3

2021-03-03 Thread Julien Lamy

Hi all,
I've updated the Salsa repository for MRtrix3 so that the package now 
builds correctly. There are still a couple of issues:


Lintian complains about unnecessary-team-upload. Is this just the matter 
of Andreas' entry in the changelog?


All binaries are installed in /usr/lib/mrtrix3/bin. Is there a reason I 
missed or may I move them to /usr/bin?


Cheers,
--
Julien



Re: 2021 Europe* BioHackathon, applications due 2021-04-01

2021-03-03 Thread Andreas Tille
Hi Michael,

On Wed, Mar 03, 2021 at 10:27:51AM +0100, Michael R. Crusoe wrote:
> The applications for the 2021 Europe* BioHackathon just opened and are due
> on April 1st.
> 
> Perhaps packaging TensorFlow and the bioinformatic tools that need it would
> be a good topic?

I like that idea.  However, I'm not sure how much time I will be able to
spend on it myself.

> I'm a member of ELIXIR Netherlands, so I would be happy to help anyone with
> their application.

Does someone need to apply for the BioHackathon to join or can someone
simply sit down and contribute to the problems?

Kind regards

 Andreas.


-- 
http://fam-tille.de



Re: MRtrix3

2021-03-03 Thread Andreas Tille
Hi Julien,

On Wed, Mar 03, 2021 at 12:40:11PM +0100, Julien Lamy wrote:
> Hi all,
> I've updated the Salsa repository for MRtrix3 so that the package now builds
> correctly.

Thanks a lot.

> There are still a couple of issues :
> 
> Lintian complains about unnecessary-team-upload. Is this just the matter of
> Andreas' entry in the changelog?

Yep. "Team upload" is only needed to silence lintian when someone not
listed in Uploaders owns the changelog entry.  I've just removed that.
 
> All binaries are installed in /usr/lib/mrtrix3/bin. Is there a reason I
> missed or may I move them to /usr/bin?

I have no idea.  Yaroslav?

Kind regards

  Andreas.

-- 
http://fam-tille.de



Re: MRtrix3

2021-03-03 Thread Nilesh Patra
Hi

On Wed, 3 Mar, 2021, 5:10 pm Julien Lamy wrote:

> Hi all,
> I've updated the Salsa repository for MRtrix3 so that the package now
> builds correctly. There are still a couple of issues :
>
> Lintian complains about unnecessary-team-upload.


Please add your work to the changelog and make yourself the changelog
holder for this revision. Do

$ gbp dch

and trim anything else as needed.

> Is this just the matter
> of Andreas' entry in the changelog?

Yeah, people listed in the Uploaders field aren't supposed to add "Team Upload".

> All binaries are installed in /usr/lib/mrtrix3/bin.

They are being symlinked to /usr/bin, see here [1].

> Is there a reason I
> missed or may I move them to /usr/bin?

I do not understand the reason for setting up symlinks here either.
In principle they can and should be installed in /usr/bin.
But probably only @Yaroslav (in CC) could answer the question of why
they aren't and are symlinked instead.

I'd suggest that for "now" you just leave it as is, i.e. do not change any
install paths.

[1]:
https://salsa.debian.org/med-team/mrtrix3/-/blob/master/debian/mrtrix3.links

Nilesh


Re: MRtrix3

2021-03-03 Thread Julien Lamy

Hi Nilesh,

On 03/03/2021 at 13:31, Nilesh Patra wrote:
> Hi
>
> On Wed, 3 Mar, 2021, 5:10 pm Julien Lamy wrote:
>
>> All binaries are installed in /usr/lib/mrtrix3/bin.
>
> They are being symlinked to usr/bin see here[1]
>
>> Is there a reason I
>> missed or may I move them to /usr/bin?
>
> I do not understand the reason for setting up symlinked here either.
> In principle they can and should be installed in usr/bin
> But probably only @Yaroslav (in CC) could answer that question as to why
> they aren't and are symlinked instead.
>
> I'd suggest that for "now" just leave it as is i.e. do not change any
> install paths


mrtrix3.links is actually overwritten by the override_dh_link rule, 
leaving only mrview being symlinked. I'll fix that for now and wait for 
Yaroslav's input regarding the linking in general.




Re: MRtrix3

2021-03-03 Thread Julien Lamy

On 03/03/2021 at 14:50, Julien Lamy wrote:
> Hi Nilesh,
>
> On 03/03/2021 at 13:31, Nilesh Patra wrote:
>> Hi
>>
>> On Wed, 3 Mar, 2021, 5:10 pm Julien Lamy wrote:
>>
>>> All binaries are installed in /usr/lib/mrtrix3/bin.
>>
>> They are being symlinked to usr/bin see here[1]
>>
>>> Is there a reason I
>>> missed or may I move them to /usr/bin?
>>
>> I do not understand the reason for setting up symlinked here either.
>> In principle they can and should be installed in usr/bin
>> But probably only @Yaroslav (in CC) could answer that question as to
>> why they aren't and are symlinked instead.
>>
>> I'd suggest that for "now" just leave it as is i.e. do not change any
>> install paths
>
> mrtrix3.links is actually overwritten by the override_dh_link rule,
> leaving only mrview being symlinked. I'll fix that for now and wait for
> Yaroslav's input regarding the linking in general.




The rationale is actually explained in README.Debian: some file names
are too generic and present in other packages. I've re-run the search
for conflicts (for f in *; do apt-file search -x bin/$f$; done), and the
only conflict as of today in Sid is usr/bin/dirsplit (also in genisoimage).


In MRtrix 3.0.2, most of the current bin files have rather specific 
names, although there are still things like "for_each" and "notfound".


Would the following solution be acceptable per Debian policy:
- Keep dirsplit and its dependency gen_scheme installed in 
usr/lib/mrtrix3/bin (current situation)
- Install everything else in usr/bin and resolve conflicts if/when they 
happen




Re: "Entry: NA" in debian/upstream/metadata

2021-03-03 Thread Matus Kalas

Hey all again, and thanks for your thoughts Andrius and Andreas!

On 2021-03-03 09:36, Andreas Tille wrote:
> Hi Andrius,
>
> On 2021-03-03 08:54, Andrius Merkys wrote:
>> Dear Matus,
>>
>> On 2021-03-02 19:56, Matus Kalas wrote:
>>> I'd suggest hearing from the folks who have done the most of the work
>>> with manually including those IDs, and letting them approve/decide.
>>
>> Absolutely!

Steffen et al., your opinions on this matter?




>>> I can imagine that for purely practical reasons in the process of the
>>> manual curation, it might make sense to allow explicitly:
>>>  - Name: OMICtools
>>>    Entry: N/A    (Meaning: I have checked and there was no record)
>>>  - Name: bio.tools
>>>    Entry: ""     (Meaning: I or someone else should check this out;
>>>                  or perhaps: I checked but wasn't conclusive yet)
>>>
>>> The latter might be useful for contributors who aren't used to all those
>>> IDs, to make them more visible (including where the gaps are). But on
>>> the other hand, if those are well present in an upstream/metadata
>>> template and very clear in the documentation of upstream/metadata, then
>>> it is not necessary and I'd then tend to like your suggestion Andrius.


>> To me, three flavors of "unknown" looks like an overkill. Most of the
>> metadata in Debian does not even have the two flavors of "unknown":
>> missing Bug-Submit field in d/u/metadata, Homepage in d/control and
>> Upstream-Contact in d/copyright means that this piece of information is
>> either nonexistent or simply not entered (for example, due to the lack
>> of time). Thus I am not sure whether the added value is worth the
>> infrastructure/effort here. But again, this is solely my opinion,
>> certainly not aimed at reflecting those of the people who enter and use
>> the data in d/u/metadata.


> I wrote the UDD importer for the metadata files and thus look at the
> data as a "consumer" of the provided information.  From this side those
> different meanings of unknown are all turned into "ignore this value".
> So in this respect differentiating between those unknowns is basically
> helpful for those who edit the metadata files.  Flagging something as "I
> was here and have checked" is probably kind of helpful.  However, it
> might perfectly be that some registry will include that specific
> software later and re-checking makes sense.
>
> For this reason I was recommending to not make those simple things to
> complex since making it complex just drains time from the people who are
> working on it with no visible effect to the users.



>> If three flavors option would be preferred, I would also suggest adding
>> date fields for each entry to signal at which point in time the registry
>> was inspected.
>
> As I wrote above later addition of some software to some registry can
> spoil the different meanings of unknown.  This could be cured by such a
> date field but I don't think it is of any better value than draining
> time from people maintaining that extra field.  Thus I do not think we
> should do this.


We definitely don't need a date, git blame does that. Also in the form 
of the Blame button in Salsa. Without a possibility for inconsistency.




> Thanks a lot for your work on this
>
>  Andreas.
>
> --
> http://fam-tille.de
>>
>> Best,
>> Andrius

There is one closely related issue, which we just briefly touched upon 
with Steffen and Hervé in a telcon: What to do with those "NA" packages 
that are missing in e.g. bio.tools?


The registration in bio.tools (and surely also SciCrunch) could be
automated, but there are at least a couple of things needing human
curation:


  - Which src packages represent one tool (often e.g. libs | language
bindings form separate Debian pkgs). How to mark this and where? Is
there an existing Debian mechanism? Or do we need to abuse the
d/u/metadata "Entry" for that, before they're added? (3rd or 4th flavour
of info then 😀 ; btw. git branches could help here 😉 ; and not in google
spreadsheet perhaps 😜 as it has to be machine-readable)


  - Choosing an available, reasonable biotoolsID and tool name. Ideally
tool name and biotoolsID are identical, with the ID being all lowercase and
having spaces removed/replaced.


  - Any other things needing human curation?
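
The ID convention in the second point (tool name in lowercase, spaces removed or replaced) can be sketched as below. This is only an illustration: the function name and the `space_replacement` parameter are hypothetical, the tool names are made up, and the real bio.tools rules for valid or available IDs are not checked here.

```python
# Hypothetical helper; it does not consult bio.tools and does not check
# whether the resulting ID is valid or still available there.
def suggest_biotools_id(tool_name: str, space_replacement: str = "") -> str:
    """Lowercase the tool name and remove (or replace) whitespace."""
    words = tool_name.split()  # splits on any run of whitespace
    return space_replacement.join(words).lower()


if __name__ == "__main__":
    print(suggest_biotools_id("Example Tool"))       # spaces removed
    print(suggest_biotools_id("Example Tool", "_"))  # spaces replaced
```

A human would still have to confirm that the suggested ID is free and sensible, which is the curation step the list above refers to.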



Thank you all, I'm very happy seeing this progressing!
Matus


P.S.: Could you please leave all the contents in when replying to the
thread, so that others can reply to previously mentioned points without
having to read every single email in the thread and possibly breaking
its linearity? I agree that it's not ecological to broadcast the same
text all around the globe again and again, but there are other solutions
than emails that handle that without compromising. Many thanks!




Re: "Entry: NA" in debian/upstream/metadata

2021-03-03 Thread Steffen Möller


On 03.03.21 at 17:39, Matus Kalas wrote:
> Hey all again, and thanks for your thoughts Andrius and Andreas!
>
> On 2021-03-03 09:36, Andreas Tille wrote:
>> Hi Andrius,
>>
>> On 2021-03-03 08:54, Andrius Merkys wrote:
>>> Dear Matus,
>>>
>>> On 2021-03-02 19:56, Matus Kalas wrote:
>>>> I'd suggest hearing from the folks who have done the most of the work
>>>> with manually including those IDs, and letting them approve/decide.
>>>
>>> Absolutely!
>
> Steffen et al., your opninions on this matter?

Sorry for being late on this.

So, "NA" indeed means something like "hey, I checked but this was not
found". This information should not be lost.

An empty entry, as if from a template, does not have the same meaning.
Whether it is NA (which is how R expects it, and which I found likely to
be easier to parse) or N/A: I would not bother to make all these changes
and would just leave it. Indeed, in the Excel sheet I am using N/A.

As it happens, we had a quick thought exchange on zoom today and I tend
to think that the general idea is that these NAs have to disappear, i.e.
add these entries to bio.tools.



>
>>>
>>>> I can imagine that for purely practical reasons in the process of the
>>>> manual curation, it might make sense to allow explicitly:
>>>>  - Name: OMICtools
>>>>    Entry: N/A    (Meaning: I have checked and there was no record)
>>>>  - Name: bio.tools
>>>>    Entry: ""     (Meaning: I or someone else should check this out;
>>>>                  or perhaps: I checked but wasn't conclusive yet)
>>>>
>>>> The latter might be useful for contributors who aren't used to all those
>>>> IDs, to make them more visible (including where the gaps are). But on
>>>> the other hand, if those are well present in an upstream/metadata
>>>> template and very clear in the documentation of upstream/metadata, then
>>>> it is not necessary and I'd then tend to like your suggestion Andrius.
>>>
>>> To me, three flavors of "unknown" looks like an overkill. Most of the
>>> metadata in Debian does not even have the two flavors of "unknown":
>>> missing Bug-Submit field in d/u/metadata, Homepage in d/control and
>>> Upstream-Contact in d/copyright means that this piece of information is
>>> either nonexistent or simply not entered (for example, due to the lack
>>> of time). Thus I am not sure whether the added value is worth the
>>> infrastructure/effort here. But again, this is solely my opinion,
>>> certainly not aimed at reflecting those of the people who enter and use
>>> the data in d/u/metadata.

Hm.  I see the following:

 * empty - nobody cared, yet
 * "N/A" or "NA" or "" or "" the latter two I would prefer but
do not really care, may be too difficult in YAML since < is a special
character - checked but not found
 * "" - bio.tools decided against referencing that package. We
are likely to see a few of these in near future.

>>
>> 
>>>
>>> If three flavors option would be preferred, I would also suggest adding
>>> date fields for each entry to signal at which point in time the
>>> registry
>>> was inspected.
>>
>> As I wrote above later addition of some software to some registry can
>> spoil the different meanings of unknown.  This could be cured by such a
>> date field but I don't think it is of any better value than draining
>> time from people maintaining that extra field.  Thus I do not think we
>> should do this.
>
> We definitely don't need a date, git blame does that. Also in the form
> of the Blame button in Salsa. Without a possibility for inconsistency.

This may be material for another paper: Means to synchronize between
volunteer databases.

 * Provenance is accepted
 * data transfer status - this is not yet happening in routine but this
is what we are doing here.

@Andrius - If I do not need to be involved and if no information is
lost, then I promise to be very happy with whatever you come up with,
whatever this may be. The chance to have a reference named "NA", though,
especially with all caps, that is darn close to zero and I wish you
would invest/sink your valuable time into something else.

Best,

Steffen


>> -- 
>> http://fam-tille.de
>>>
>>> Best,
>>> Andrius
>
> There is one closely related issue, which we just briefly touched upon
> with Steffen and Hervé in a telcon: What to do with those "NA"
> packages that are missing in e.g. bio.tools?
>
> The regitration in bio.tools (and surely also SciCrunch) could be
> automated, but there are at least a couple of things needing human
> curation:
>
>   - Which src packages represent one tool (often e.g. libs | language
> bindings form separate Debian pkgs). How to mark this and where? Is
> there an exisiting Debian mechanism? Or do we need to abuse the
> d/u/metadata "Entry" for that, before they're added? (3rd or 4th
> flavour of info then 😀 ; btw. git branches could help here 😉 ; and
> not in google spreadsheet perhaps 😜 as it has to be machine-readable)
>
>   - Choosing an available, reasonable biotoolsID and tool name.
> Ideally tool name and biotoolsID are identica

Re: 2021 Europe* BioHackathon, applications due 2021-04-01

2021-03-03 Thread Steffen Möller
I would like to revive my Debian package for the BOINC server side that
is prepared for AutoDock. This is just something I never got around to
since ... since 2015 ... gosh. There is now also a GPU variant of
AutoDock that needs some attention - many small bits that need to be
done to complete the workflow for end users. Anyone else on this list up
for it?

On 03.03.21 at 10:27, Michael R. Crusoe wrote:
> The applications for the 2021 Europe* BioHackathon just opened and are
> due on April 1st.
>
> Perhaps packaging TensorFlow and the bioinformatic tools that need it
> would be a good topic?
>
> * Not limited to those in Europe
>
> https://elixir-europe.org/BIoHackathon2021-project-submission
> 
>
> Given the special emphasis on industry partners, this would be a nice
> time to sync up with the TensorFlow team at Google.
>
> More details: https://biohackathon-europe.org/
> 
>
> I'm a member of ELIXIR Netherlands, so I would be happy to help anyone
> with their application.
>
> (I'm not subscribed to debian-ai@ , please keep me CC'd)
>
> --
> Michael R. Crusoe



Re: "Entry: NA" in debian/upstream/metadata

2021-03-03 Thread Andreas Tille
On Wed, Mar 03, 2021 at 06:58:12PM +0100, Steffen Möller wrote:
> So, "NA" indeed means like "hey, I checked but this was not found". This
> information should not be lost.
> 
> An empty entry, as if from a template, does not have the same meaning.
> If NA (which is how R expects it  and I found it likely to be easier to
> parse) or N/A - I would not be bother to do all these changes and would
> just leave it. Indeed, on the Excel sheet I am using N/A.

I definitely do care, and I veto renaming the now implemented 'NA' to
simply 'N/A', since there is no advantage at all.  However, I can follow
the argument of Andrius that an empty string "" (not just nothing as in
our empty boilerplate) could serve the given purpose.
 
>  * empty - nobody cared, yet

I'd prefer if this were removed altogether.

>  * "N/A" or "NA" or "" or "" the latter two I would prefer but
> do not really care,

As I said above:  I care.  Just stick to what we have and don't burn
developer time on renaming for fun.

> @Andrius - If I do not need to be involved and if no information is
> lost, then I promise to be very happy with whatever you come up with,
> whatever this may be. The chance to have a reference named "NA", though,
> especially with all caps, that is darn close to zero and I wish you
> would invest/sink your valuable time into something else.

+1

Kind regards

  Andreas.

-- 
http://fam-tille.de



Re: MRtrix3

2021-03-03 Thread Andreas Tille
On Wed, Mar 03, 2021 at 04:45:02PM +0100, Julien Lamy wrote:
> 
> Would the following solution be acceptable per Debian policy:
> - Keep dirsplit and its dependency gen_scheme installed in
> usr/lib/mrtrix3/bin (current situation)
> - Install everything else in usr/bin and resolve conflicts if/when they
> happen

I do not see any actual advantage between symlinks in /usr/bin and
the binaries somewhere else or the binaries directly in /usr/bin.
I do not mind actually - just do whatever works.

Just a slight hint that there is a Blends-workaround for name
space conflicts which is described for instance here:

https://salsa.debian.org/med-team/plink/-/blob/master/debian/README.Debian

Kind regards

   Andreas.

-- 
http://fam-tille.de



Re: guix-based installation of pigx-rnaseq - works

2021-03-03 Thread Steffen Möller
Hi Simon,

On 02.03.21 at 14:52, zimoun wrote:
> Hi,
>
> On Sat, 27 Feb 2021 at 19:55, Steffen Möller wrote:
>
>>>> We keep mentioning conda also the time, while guix is somewhat left
>>>> aside, but https://guix.gnu.org/packages/ is truly impressive. Somewhat
>>>> annoying, if we ever decide to also reference guix packages in
>>>> d/u/metadata, then there are only versioned web pages like
>>>> https://guix.gnu.org/de/packages/bc-1.07.1/, so we would only have
>>>> moving targets point to.
>>> What do you mean exactly?
>> In d/u/metadata we give references to the bio.tools registry,
>> SciCrunch's RRIDs and conda. But we should also give one to guix if we
>> know about an "ortholog" package. These references point up as URLs on
>> our task pages for the various blends. There is however no page for a
>> package in guix that is not versioned if I get this right.
> Just to understand, since I am not totally familiar with the debian/
> folder, you are speaking about this file, for instance,
>
> https://salsa.debian.org/med-team/pigx-rnaseq/-/blob/master/debian/upstream/metadata
yes
> What would be nice for Debian instead of
> ?
The updates of Debian's packages and those of guix are not synchronized.
And once something is in a regular Debian release, the Debian package
likely points to an older version of the same in guix. It would hence be
preferable to have an unversioned page to point to - which then lists
all the versions available for that package.
> And where could I find a an example on how the others do?

https://tracker.debian.org/pkg/vim

https://anaconda.org/conda-forge/vim

>> The fun part is, and this is where this shot on guix comes in, that we
>> can compare a Debian based implementation with an image that upstream
>> provides. Or compare with conda/brew/...guix. We are currently team
>> building and collect ideas what we want to achieve.
> I am not sure how the image upstream provides is built, 
> probably with
> "guix pack -f docker":
>
> 

Ah, there it is. Many thanks. I knew I had seen this somewhere :)

>> Hm. Not so sure. My interpretation was some general incompatibility of
>> above docker-approach with GUIX, so I happily adopt the "guix pack".
> There is another level to build Docker images using Guix: "guix system
> docker-image" but it is more complicated and probably not the correct
> path to distribute packages in a container.
>
> 

Hm. Would be interesting to learn about differences between different
ways to come up with an image.

Best,

Steffen




Re: guix-based installation of pigx-rnaseq - works

2021-03-03 Thread Pjotr Prins
On Tue, Feb 23, 2021 at 09:28:48PM +0100, Steffen Möller wrote:
> Hello,
> 
> sudo apt-get install guix
> guix install pigx-rnaseq

:)

> indeed installs the pigx-rnaseq library with its R dependencies. It
> installs it all in /gnu, which is somewhat inconvenient, as in "my root
> partition complained", and it takes painstakingly long, but it works. 
> 
> The system finds STAR and other executables in the regular system path,
> only pigx-rnaseq was installed:
> 
> $ guix package -l
> Generation 1    Feb 23 2021 18:27:42    (current)
>   pigx-rnaseq    0.0.10    out   
> /gnu/store/nlknrjmm2knbr8i5m5qj94788arfb14n-pigx-rnaseq-0.0.10
> 
> The full install "pigx" I have not tried, yet. Will do.
> 
> We keep mentioning conda also the time, while guix is somewhat left
> aside, but https://guix.gnu.org/packages/ is truly impressive. Somewhat
> annoying, if we ever decide to also reference guix packages in
> d/u/metadata, then there are only versioned web pages like
> https://guix.gnu.org/de/packages/bc-1.07.1/, so we would only have
> moving targets point to.
> 
> But otherwise, Debian's guix package has formidably done its job. It is
> about time we get the conda packages into a similar state, just maybe
> the / partition should be spared by default, but then again, it was only
> 6.5GB.
> 
> I had tried in vain to create a dockerfile with this setup which gets a
> permission error in the moment that the installation of the packages starts:
> 
> FROM debian:unstable
> ENV TERM=xterm
> RUN apt-get update -qq
> RUN apt-get install -y guix
> #RUN (/usr/bin/guix-daemon --build-users-group=_guixbuild & ) && guix
> install pigx-rnaseq
> #RUN (/usr/bin/guix-daemon & ) && guix install pigx-rnaseq # too big to fail
> RUN (/usr/bin/guix-daemon & ) && guix install vim # still takes long and
> fails

The guix-daemon needs full privileges by default, but it does not need
to run inside a container. Just build a docker or guix container and
you'll get a minimal dependency tree inside the container. I wrote
some stuff up here:

  https://github.com/pjotrp/guix-notes/blob/master/CONTAINERS.org

mostly because I don't trust my memory ;)

> Should this trigger any idea among those reading this - some RTFM plus a
> small pointer would be much appreciated.

Note that Guix is a rolling distribution. It can do that because
versions do not interfere with each other. You can have an unlimited
number of glibcs, pythons etc. 

The reason it installs software in /gnu/store is because of
reproducibility. All paths are hard coded in binaries and libs. There
is no search path to look up libs, for example. Install a package once
and you'll get identical paths between machines, containers etc. 

I love Debian, but Guix packages and containers are the cat's whiskers!
It is brilliant to have them both. I have no need for conda.

Pj.



Outreachy project: complete workflows of tools

2021-03-03 Thread Tassia Camoes Araujo
Hi Debian-Med team!

-- this should be a reply to the sprint report, sorry I was not on the
list at the time --

One of the topics we discussed was to have an additional Outreachy
intern work on creating educational material, such as video
tutorials, to showcase workflows in the field using Debian-Med tools. We
brainstormed about QIIME, microbiome, but the idea would be to leave the
student free to pick a workflow she/he has some experience with.

Reading the list archive I saw a message by Tony Travis as a great
example of solving biological problems with our tools: "I use the
software you've packaged for my own work on the molecular genetics of
drought-tolerance and Nitrogen-use-efficiency in Rice and for studies of
the micro-virome of Tsetse flies and Mosquitoes as well as for the
cancer genomics work on the cluster at the Mario Negri Institute in
Milan."

I volunteer to help to shape the proposal, and whatever else you need
(with my limited knowledge in the field) to make this come true.

Cheers,

Tassia.



Re: Outreachy project: complete workflows of tools

2021-03-03 Thread Tony Travis

On 03/03/2021 23:21, Tassia Camoes Araujo wrote:
> [...]
> Reading the list archive I saw a message by Tony Travis as a great
> example of solving biological problems with our tools: "I use the
> software you've packaged for my own work on the molecular genetics of
> drought-tolerance and Nitrogen-use-efficiency in Rice and for studies of
> the micro-virome of Tsetse flies and Mosquitoes as well as for the
> cancer genomics work on the cluster at the Mario Negri Institute in
> Milan."


Hi, Tassia.

I've worked with the Debian-Med team for many years, starting with my 
contribution to the Bio-Linux project:



http://environmentalomics.org/bio-linux/


I'm now 'last man standing' on the Bio-Linux project, which is now 
entirely supported by the Debian-Med team. I install 'Bio-Linux' from a 
customised Ubuntu-MATE 20.04 LTS 'live' USB-stick with "med-bio" and 
"med-bio-dev" plus "x2goserver" and Bioconda.


One of the most important aspects of the Bio-Linux project was good 
documentation and tutorials about the software included. We used 
Bio-Linux 'live' instances booted from USB-sticks or VM's to teach 
bioinformatics tutorial courses and I continue to do this from time to 
time. In particular, I am currently employed part-time on a UK/India 
SANH project at the University of Aberdeen to teach GWAS and other 
methods and I'm helping researchers in India install "med-bio" from 
Debian-Med under Ubuntu-MATE 20.04 LTS on their computers for GWAS.


Thanks for your interest!

  Tony.

--
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548    http://minke-informatics.co.uk
mob. +44(0)7985 078324    mailto:tony.tra...@minke-informatics.co.uk



Bug#984477: libbio-variation-perl: missing Breaks+Replaces: bioperl (<< 1.7.3)

2021-03-03 Thread Andreas Beckmann
Package: libbio-variation-perl
Version: 1.7.5-1
Severity: serious
User: debian...@lists.debian.org
Usertags: piuparts

Hi,

during a test with piuparts I noticed your package fails to upgrade from
'buster'.
It installed fine in 'buster', then the upgrade to 'bullseye' fails
because it tries to overwrite other packages' files without declaring a
Breaks+Replaces relation.

See policy 7.6 at
https://www.debian.org/doc/debian-policy/ch-relationships.html#overwriting-files-and-replacing-packages-replaces

From the attached log (scroll to the bottom...):

  Selecting previously unselected package libbio-variation-perl.
  Preparing to unpack .../048-libbio-variation-perl_1.7.5-1_all.deb ...
  Unpacking libbio-variation-perl (1.7.5-1) ...
  dpkg: error processing archive /tmp/apt-dpkg-install-eyjeT8/048-libbio-variation-perl_1.7.5-1_all.deb (--unpack):
   trying to overwrite '/usr/bin/bp_flanks', which is also in package bioperl 1.7.2-3
  dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)

IIRC bioperl 1.7.3 is the version that got split up into several packages,
therefore the existing
  Breaks+Replaces: libbio-perl-perl (<= 1.7.2)
are also insufficiently versioned (should be (<< 1.7.3) as well).
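
To illustrate why "(<= 1.7.2)" is too narrow, here is a simplified Python sketch of Debian's version ordering as described in Debian Policy. It is an illustration only, with hypothetical function names; on a real system `dpkg --compare-versions` is the authoritative check. The point it shows: a version with a Debian revision, such as the 1.7.2-3 seen in the log above, sorts after plain 1.7.2, so it is not matched by "(<= 1.7.2)" but is matched by "(<< 1.7.3)".

```python
# Simplified sketch of Debian version comparison; not dpkg itself.
import re


def _order(ch: str) -> int:
    """Sort weight of one character in a non-digit run."""
    if ch == "~":
        return -1            # tilde sorts before everything, even the end
    if ch.isalpha():
        return ord(ch)       # letters sort before other characters
    return ord(ch) + 256


def _compare_part(a: str, b: str) -> int:
    """Compare two upstream-version or revision strings: -1, 0 or 1."""
    while a or b:
        # Compare the leading non-digit runs character by character.
        na = re.match(r"\D*", a).group()
        nb = re.match(r"\D*", b).group()
        a, b = a[len(na):], b[len(nb):]
        for i in range(max(len(na), len(nb))):
            oa = _order(na[i]) if i < len(na) else 0
            ob = _order(nb[i]) if i < len(nb) else 0
            if oa != ob:
                return -1 if oa < ob else 1
        # Then compare the leading digit runs numerically.
        da = re.match(r"\d*", a).group()
        db = re.match(r"\d*", b).group()
        a, b = a[len(da):], b[len(db):]
        va, vb = int(da or 0), int(db or 0)
        if va != vb:
            return -1 if va < vb else 1
    return 0


def _split(version: str):
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch = 0
    if ":" in version:
        epoch_text, version = version.split(":", 1)
        epoch = int(epoch_text)
    upstream, sep, revision = version.rpartition("-")
    if not sep:              # no hyphen: the whole string is the upstream part
        upstream, revision = version, ""
    return epoch, upstream, revision


def compare_versions(a: str, b: str) -> int:
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


def satisfies(version: str, op: str, bound: str) -> bool:
    """Evaluate a relation such as '(<< 1.7.3)' for the given version."""
    c = compare_versions(version, bound)
    return {"<<": c < 0, "<=": c <= 0, "=": c == 0,
            ">=": c >= 0, ">>": c > 0}[op]
```

With this sketch, `satisfies("1.7.2-3", "<=", "1.7.2")` is False while `satisfies("1.7.2-3", "<<", "1.7.3")` is True, which matches the suggestion to use (<< 1.7.3).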

cheers,

Andreas


bioperl-run_1.7.3-6.log.gz
Description: application/gzip


Re: "Entry: NA" in debian/upstream/metadata

2021-03-03 Thread Charles Plessy
On Wed, Mar 03, 2021 at 08:08:08PM +0100, Andreas Tille wrote:
> On Wed, Mar 03, 2021 at 06:58:12PM +0100, Steffen Möller wrote:
> > So, "NA" indeed means like "hey, I checked but this was not found". This
> > information should not be lost.
> > 
> > An empty entry, as if from a template, does not have the same meaning.
> > If NA (which is how R expects it and I found it likely to be easier to
> > parse) or N/A - I would not bother to do all these changes and would
> > just leave it. Indeed, on the Excel sheet I am using N/A.
> 
> I definitely bother / veto against renaming the now implemented 'NA' to
> simply 'N/A' since there is no advantage at all.  However, I can follow
> the argument of Andrius that an empty string "" (not just nothing as in
> our empty boilerplate) could serve the given purpose.

Hi all,

how about YAML's null value for entries for which it was confirmed that
the information does not exist?
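
A rough sketch of how a consumer could tell the cases apart once the file has been parsed. It assumes the YAML loader maps a null to Python's `None`; the function name `classify_entry` and the three labels are hypothetical, and treating `""` as a placeholder is only one of the readings discussed in this thread.

```python
# Hypothetical labels, not part of any specification.
HAS_ID = "has-id"
CHECKED_NO_RECORD = "checked-no-record"
NOT_CHECKED = "not-checked"


def classify_entry(registry: dict) -> str:
    """Classify one item of the Registry list after YAML parsing."""
    if "Entry" not in registry:
        return NOT_CHECKED            # key absent: nobody looked yet
    value = registry["Entry"]
    # A YAML null arrives as None. Note that a bare "Entry:" with nothing
    # after the colon is also read as null, so a template would have to
    # omit the key for null to keep a distinct meaning.
    if value is None or value == "NA":
        return CHECKED_NO_RECORD      # null proposal or the current 'NA'
    if value == "":
        return NOT_CHECKED            # assumption: "" is a placeholder
    return HAS_ID


print(classify_entry({"Name": "bio.tools", "Entry": None}))
print(classify_entry({"Name": "OMICtools", "Entry": "NA"}))
print(classify_entry({"Name": "SciCrunch"}))
```

With this mapping a null and the existing "NA" end up in the same bucket, so files already using "NA" would not need to change for a consumer written this way.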

Have a nice day,

Charles

-- 
Charles Plessy Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team http://www.debian.org/devel/debian-med
Tooting from work,   https://mastodon.technology/@charles_plessy
Tooting from home, https://framapiaf.org/@charles_plessy



Re: "Entry: NA" in debian/upstream/metadata

2021-03-03 Thread Andrius Merkys
Hi Andreas,

On 2021-03-03 10:36, Andreas Tille wrote:
> I wrote the UDD importer for the metadata files and thus look at the
> data as a "consumer" of the provided information.  From this side those
> different meanings of unknown are all turned into "ignore this value".
> So in this respect differentiating between those unknowns is basically
> helpful for those who edit the metadata files.  Flagging something as "I
> was here and have checked" is probably kind of helpful.  However, it
> might perfectly be that some registry will include that specific
> software later and re-checking makes sense.
> 
> For this reason I was recommending to not make those simple things too
> complex since making it complex just drains time from the people who are
> working on it with no visible effect to the users.
>  
>> If three flavors option would be preferred, I would also suggest adding
>> date fields for each entry to signal at which point in time the registry
>> was inspected.
> 
> As I wrote above later addition of some software to some registry can
> spoil the different meanings of unknown.  This could be cured by such a
> date field but I don't think it is of any better value than draining
> time from people maintaining that extra field.  Thus I do not think we
> should do this.

Thanks a lot for sharing your perspective. Personally, I am for keeping
the specification simple, but if certain conventions are helpful for
people maintaining the metadata, they surely should remain. However, all
such conventions should be described in DEP 12 to spare the confusion.

Best,
Andrius



Re: "Entry: NA" in debian/upstream/metadata

2021-03-03 Thread Andrius Merkys
On 2021-03-03 18:39, Matus Kalas wrote:
> Hey all again, and thanks for your thoughts Andrius and Andreas!
> 
> On 2021-03-03 09:36, Andreas Tille wrote:
>> Hi Andrius,
>>
>> On 2021-03-03 08:54, Andrius Merkys wrote:
>>> Dear Matus,
>>>
>>> On 2021-03-02 19:56, Matus Kalas wrote:
>>>> I'd suggest hearing from the folks who have done the most of the work
>>>> with manually including those IDs, and letting them approve/decide.
>>>
>>> Absolutely!
> 
> Steffen et al., your opinions on this matter?
> 
>>>
>>>> I can imagine that for purely practical reasons in the process of the
>>>> manual curation, it might make sense to allow explicitly:
>>>>  - Name: OMICtools
>>>>    Entry: N/A    (Meaning: I have checked and there was no record)
>>>>  - Name: bio.tools
>>>>    Entry: "" (Meaning: I or someone else should check this out;
>>>>    or perhaps: I checked but wasn't conclusive yet)
>>>>
>>>> The latter might be useful for contributors who aren't used to all those
>>>> IDs, to make them more visible (including where the gaps are). But on
>>>> the other hand, if those are well present in an upstream/metadata
>>>> template and very clear in the documentation of upstream/metadata, then
>>>> it is not necessary and I'd then tend to like your suggestion Andrius.
>>>
>>> To me, three flavors of "unknown" looks like an overkill. Most of the
>>> metadata in Debian does not even have the two flavors of "unknown":
>>> missing Bug-Submit field in d/u/metadata, Homepage in d/control and
>>> Upstream-Contact in d/copyright means that this piece of information is
>>> either nonexistent or simply not entered (for example, due to the lack
>>> of time). Thus I am not sure whether the added value is worth the
>>> infrastructure/effort here. But again, this is solely my opinion,
>>> certainly not aimed at reflecting those of the people who enter and use
>>> the data in d/u/metadata.
>>
>> I wrote the UDD importer for the metadata files and thus look at the
>> data as a "consumer" of the provided information.  From this side those
>> different meanings of unknown are all turned into "ignore this value".
>> So in this respect differentiating between those unknowns is basically
>> helpful for those who edit the metadata files.  Flagging something as "I
>> was here and have checked" is probably kind of helpful.  However, it
>> might perfectly be that some registry will include that specific
>> software later and re-checking makes sense.
>>
>> For this reason I was recommending to not make those simple things too
>> complex since making it complex just drains time from the people who are
>> working on it with no visible effect to the users.
>>
>>>
>>> If three flavors option would be preferred, I would also suggest adding
>>> date fields for each entry to signal at which point in time the registry
>>> was inspected.
>>
>> As I wrote above later addition of some software to some registry can
>> spoil the different meanings of unknown.  This could be cured by such a
>> date field but I don't think it is of any better value than draining
>> time from people maintaining that extra field.  Thus I do not think we
>> should do this.
> 
> We definitely don't need a date, git blame does that. Also in the form
> of the Blame button in Salsa. Without a possibility for inconsistency.

Agree.

>> Thanks a lot for your work on this
>>
>>  Andreas.
>>
>> -- 
>> http://fam-tille.de
>>>
>>> Best,
>>> Andrius
> 
> There is one closely related issue, which we just briefly touched upon
> with Steffen and Hervé in a telcon: What to do with those "NA" packages
> that are missing in e.g. bio.tools?
> 
> The registration in bio.tools (and surely also SciCrunch) could be
> automated, but there are at least a couple of things needing human
> curation:
> 
>   - Which src packages represent one tool (often e.g. libs | language
> bindings form separate Debian pkgs). How to mark this and where? Is
> there an exisiting Debian mechanism? Or do we need to abuse the
> d/u/metadata "Entry" for that, before they're added? (3rd or 4th flavour
> of info then 😀 ; btw. git branches could help here 😉 ; and not in
> google spreadsheet perhaps 😜 as it has to be machine-readable)

Maybe a separate field could be introduced for that? I would prefer
leaving "Entry" for IDs only, so that a URL inside the registry can
be constructed in a straightforward manner. Imposing internal structure
on fields (i.e., abusing "Entry") both hurts machine-readability and
risks namespace collisions. Should there be a need for free-form storage
of information, I would rather introduce a "Comment" field for each
entry, where a maintainer could store anything they believe is
important about that entry.
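
To illustrate the point, here is a small sketch assuming "Entry" only ever holds a plain ID (or the current "NA" marker) and that free-form notes live in a separate "Comment" field. The `URL_TEMPLATES` mapping and the `registry_url` helper are hypothetical, and the bio.tools URL pattern is an assumption for the example, not something taken from DEP 12.

```python
from typing import Optional

# Hypothetical per-registry URL patterns; only the ID is substituted.
URL_TEMPLATES = {
    "bio.tools": "https://bio.tools/{entry}",  # assumed pattern
}


def registry_url(record: dict) -> Optional[str]:
    """Build a registry URL from a record, or None if there is no usable ID."""
    template = URL_TEMPLATES.get(record.get("Name"))
    entry = record.get("Entry")
    if template is None or not entry or entry == "NA":
        return None
    return template.format(entry=entry)


record = {
    "Name": "bio.tools",
    "Entry": "clustalo",
    "Comment": "also covers the library package",  # free-form, never parsed
}
print(registry_url(record))
print(registry_url({"Name": "bio.tools", "Entry": "NA"}))
```

Because the ID is never mixed with annotations, the URL can be formed with a single substitution and the "Comment" text cannot collide with a real ID.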

>   - Choosing an available, reasonable biotoolsID and tool name. Ideally
> tool name and biotoolsID are identical with ID having all small case and
> spaces removed/replaced.
> 
>   - Any other things needing human curation?
> 
> 
> 
> Thank you all, I'

Re: "Entry: NA" in debian/upstream/metadata

2021-03-03 Thread Andrius Merkys
Hi Steffen,

On 2021-03-03 19:58, Steffen Möller wrote:
> 
> Am 03.03.21 um 17:39 schrieb Matus Kalas:
>> Hey all again, and thanks for your thoughts Andrius and Andreas!
>>
>> On 2021-03-03 09:36, Andreas Tille wrote:
>>> Hi Andrius,
>>>
>>> On 2021-03-03 08:54, Andrius Merkys wrote:
>>>> Dear Matus,
>>>>
>>>> On 2021-03-02 19:56, Matus Kalas wrote:
>>>>> I'd suggest hearing from the folks who have done the most of the work
>>>>> with manually including those IDs, and letting them approve/decide.
>>>>
>>>> Absolutely!
>>
>> Steffen et al., your opinions on this matter?
> 
> Sorry for being late on this.
> 
> So, "NA" indeed means like "hey, I checked but this was not found". This
> information should not be lost.
> 
> An empty entry, as if from a template, does not have the same meaning.
> If NA (which is how R expects it and I found it likely to be easier to
> parse) or N/A - I would not bother to do all these changes and would
> just leave it. Indeed, on the Excel sheet I am using N/A.
> 
> As it happens, we had a quick thought exchange on zoom today and I tend
> to think that the general idea is that these NAs have to disappear, i.e.
> add these entries to bio.tools.

Thank you for confirming the distinction between empty value and "NA".

>>>>> I can imagine that for purely practical reasons in the process of the
>>>>> manual curation, it might make sense to allow explicitly:
>>>>>  - Name: OMICtools
>>>>>    Entry: N/A    (Meaning: I have checked and there was no record)
>>>>>  - Name: bio.tools
>>>>>    Entry: "" (Meaning: I or someone else should check this out;
>>>>>    or perhaps: I checked but wasn't conclusive yet)
>>>>>
>>>>> The latter might be useful for contributors who aren't used to all those
>>>>> IDs, to make them more visible (including where the gaps are). But on
>>>>> the other hand, if those are well present in an upstream/metadata
>>>>> template and very clear in the documentation of upstream/metadata, then
>>>>> it is not necessary and I'd then tend to like your suggestion Andrius.
>>>>
>>>> To me, three flavors of "unknown" looks like an overkill. Most of the
>>>> metadata in Debian does not even have the two flavors of "unknown":
>>>> missing Bug-Submit field in d/u/metadata, Homepage in d/control and
>>>> Upstream-Contact in d/copyright means that this piece of information is
>>>> either nonexistent or simply not entered (for example, due to the lack
>>>> of time). Thus I am not sure whether the added value is worth the
>>>> infrastructure/effort here. But again, this is solely my opinion,
>>>> certainly not aimed at reflecting those of the people who enter and use
>>>> the data in d/u/metadata.
> 
> Hm.  I see the following:
> 
>  * empty - nobody cared, yet
>  * "N/A" or "NA" or "" or "" the latter two I would prefer but
> do not really care, may be too difficult in YAML since < is a special
> character - checked but not found
>  * "" - bio.tools decided against referencing that package. We
> are likely to see a few of these in near future.

Just a suggestion: maybe a "Status" field could be of use here? If more
special values of "Entry" are about to be introduced, it is better to
use a separate field to make this more machine-readable.

Suggested values for "Status":

* "confirmed" (default) - an entry in the registry is confirmed, and its
ID is stored in "Entry" field;
* "not-found" - the registry was checked for a match, but it was not
found at that point in time (here a timestamp field could be of value);
* "rejected" - the registry explicitly rejected an attempt to register
the package;
* "pending" - the package has been submitted to the registry, no response yet;
* ...
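
A minimal sketch of how such a field could be read, assuming the four values listed above and "confirmed" as the default when "Status" is absent. The `Status` enum and the `parse_registry` helper are hypothetical names used only for illustration.

```python
from enum import Enum
from typing import Optional, Tuple


class Status(Enum):
    CONFIRMED = "confirmed"
    NOT_FOUND = "not-found"
    REJECTED = "rejected"
    PENDING = "pending"


def parse_registry(record: dict) -> Tuple[Status, Optional[str]]:
    """Return (status, entry ID); the ID is only kept for confirmed entries."""
    # Status("...") raises ValueError for values outside the list above.
    status = Status(record.get("Status", "confirmed"))
    entry = record.get("Entry")
    if status is Status.CONFIRMED:
        if not entry:
            raise ValueError("a confirmed record needs an Entry ID")
        return (status, entry)
    return (status, None)


print(parse_registry({"Name": "bio.tools", "Entry": "clustalo"}))
print(parse_registry({"Name": "OMICtools", "Status": "not-found"}))
```

With a separate field, "Entry" never needs a special value: a consumer that only wants IDs can skip every record whose status is not "confirmed".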

>>> 

>>>> If three flavors option would be preferred, I would also suggest adding
>>>> date fields for each entry to signal at which point in time the registry
>>>> was inspected.
>>>
>>> As I wrote above later addition of some software to some registry can
>>> spoil the different meanings of unknown.  This could be cured by such a
>>> date field but I don't think it is of any better value than draining
>>> time from people maintaining that extra field.  Thus I do not think we
>>> should do this.
>>
>> We definitely don't need a date, git blame does that. Also in the form
>> of the Blame button in Salsa. Without a possibility for inconsistency.
> 
> This may be material for another paper: Means to synchronize between
> volunteer databases.
> 
>  * Provenance is accepted
>  * data transfer status - this is not yet happening in routine but this
> is what we are doing here.
> 
> @Andrius - If I do not need to be involved and if no information is
> lost, then I promise to be very happy with whatever you come up with,
> whatever this may be. The chance to have a reference named "NA", though,
> especially with all caps, that is darn close to zero and I wish you
> would invest/sink your valuable time into something else.

I do not want to interfere with the current practice nor cause l