
Matus Kalas Wed, 03 Mar 2021 08:40:06 -0800

Hey all again, and thanks for your thoughts Andrius and Andreas!


On 2021-03-03 09:36, Andreas Tille wrote:

Hi Andrius,

On 2021-03-03 08:54, Andrius Merkys wrote:

Dear Matus,

On 2021-03-02 19:56, Matus Kalas wrote:

I'd suggest hearing from the folks who have done the most of the work
with manually including those IDs, and letting them approve/decide.


Absolutely!


Steffen et al., your opinions on this matter?

I can imagine that for purely practical reasons in the process of the
manual curation, it might make sense to allow explicitly:
 - Name: OMICtools
   Entry: N/A (Meaning: I have checked and there was no record)
 - Name: bio.tools
   Entry: "" (Meaning: I or someone else should check this out;
   or perhaps: I checked but wasn't conclusive yet)
The latter might be useful for contributors who aren't used to all those
IDs, to make them more visible (including where the gaps are). But on
the other hand, if those are well present in an upstream/metadata
template and very clear in the documentation of upstream/metadata, then
it is not necessary and I'd then tend to like your suggestion, Andrius.
To me, three flavors of "unknown" looks like an overkill. Most of the
metadata in Debian does not even have the two flavors of "unknown":
missing Bug-Submit field in d/u/metadata, Homepage in d/control and
Upstream-Contact in d/copyright means that this piece of information is
either nonexistent or simply not entered (for example, due to the lack
of time). Thus I am not sure whether the added value is worth the
infrastructure/effort here. But again, this is solely my opinion,
certainly not aimed at reflecting those of the people who enter and use
the data in d/u/metadata.
I wrote the UDD importer for the metadata files and thus look at the
data as a "consumer" of the provided information.  From this side those
different meanings of unknown are all turned into "ignore this value".
So in this respect differentiating between those unknowns is basically
helpful for those who edit the metadata files. Flagging something as "I
was here and have checked" is probably kind of helpful.  However, it
might perfectly be that some registry will include that specific
software later and re-checking makes sense.
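
To illustrate the consumer's view described above, here is a minimal Python sketch of how the different flavors of "unknown" could all collapse into "ignore this value". It is not the actual UDD importer code: the function name, the set of placeholder spellings, and the "ExampleRegistry"/"example-id" values are assumptions made up for this example. The input is assumed to be the already parsed Registry list from d/u/metadata.

```python
# Hypothetical sketch; NOT the real UDD importer.
# The placeholder spellings treated as "unknown" are an assumption.
UNKNOWN_MARKERS = {"", "na", "n/a"}


def usable_registry_entries(registry):
    """Return {registry name: entry} for entries that carry a real ID."""
    usable = {}
    for item in registry or []:
        name = item.get("Name")
        entry = item.get("Entry")
        # A missing key, an empty string and "NA"/"N/A" are all ignored.
        if name is None or entry is None:
            continue
        entry = str(entry).strip()
        if entry.lower() in UNKNOWN_MARKERS:
            continue
        usable[name] = entry
    return usable


example = [
    {"Name": "OMICtools", "Entry": "N/A"},   # checked, no record
    {"Name": "bio.tools", "Entry": ""},      # not checked yet
    {"Name": "SciCrunch"},                   # field simply missing
    {"Name": "ExampleRegistry", "Entry": "example-id"},  # made-up value
]
print(usable_registry_entries(example))
```

Only the last item survives, so from this side the distinction between the flavors is invisible; it would only matter to the people editing the file.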

For this reason I was recommending not to make those simple things too
complex, since making it complex just drains time from the people who are
working on it with no visible effect to the users.
If the three flavors option would be preferred, I would also suggest adding
date fields for each entry to signal at which point in time the registry
was inspected.
As I wrote above later addition of some software to some registry can
spoil the different meanings of unknown.  This could be cured by such a
date field but I don't think it is of any better value than draining
time from people maintaining that extra field.  Thus I do not think we
should do this.

We definitely don't need a date, git blame does that. Also in the form
of the Blame button in Salsa. Without a possibility for inconsistency.


Thanks a lot for your work on this

     Andreas.

--
http://fam-tille.de


Best,
Andrius

There is one closely related issue, which we just briefly touched upon with Steffen and Hervé in a telcon: What to do with those "NA" packages that are missing in e.g. bio.tools?

The registration in bio.tools (and surely also SciCrunch) could be automated, but there are at least a couple of things needing human curation:

- Which src packages represent one tool (often e.g. libs | language bindings form separate Debian pkgs). How to mark this and where? Is there an existing Debian mechanism? Or do we need to abuse the d/u/metadata "Entry" for that, before they're added? (3rd or 4th flavour of info then 😀 ; btw. git branches could help here 😉 ; and not in google spreadsheet perhaps 😜 as it has to be machine-readable)

- Choosing an available, reasonable biotoolsID and tool name. Ideally tool name and biotoolsID are identical, with the ID being all lower case and spaces removed/replaced.


  - Any other things needing human curation?
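
As a rough illustration of the naming point in the list above, a candidate ID could be derived from the tool name like this. This is a minimal sketch under the assumption that lower-casing and removing or replacing whitespace is all that is wanted; the function name is made up, bio.tools may impose further constraints on IDs, and whether the candidate is still available has to be checked separately.

```python
import re


def suggest_biotools_id(tool_name, replacement=""):
    """Lower-case the tool name and remove (or replace) runs of whitespace.

    Hypothetical helper: it only proposes a candidate and does not
    check whether the ID is valid or still free in the registry.
    """
    return re.sub(r"\s+", replacement, tool_name.strip().lower())


print(suggest_biotools_id("My Example Tool"))        # myexampletool
print(suggest_biotools_id("My Example Tool", "_"))   # my_example_tool
```

A human would still have to confirm that the suggestion is reasonable and not already taken by another tool.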



Thank you all, I'm very happy seeing this progressing!
Matus

P.S.: Could you please leave all the contents in when replying to the thread, so that others can reply to previously mentioned points without having to read every single email in the thread and possibly breaking linearity of it? I agree that it's not ecological to broadcast the same text all around the globe again and again, but there are other solutions than emails that handle that without compromising. Many thanks!

Re: "Entry: NA" in debian/upstream/metadata
