On 03.03.21 at 17:39, Matus Kalas wrote:
> Hey all again, and thanks for your thoughts Andrius and Andreas!
>
> On 2021-03-03 09:36, Andreas Tille wrote:
>> Hi Andrius,
>>
>> On 2021-03-03 08:54, Andrius Merkys wrote:
>>> Dear Matus,
>>>
>>> On 2021-03-02 19:56, Matus Kalas wrote:
>>>> I'd suggest hearing from the folks who have done the most of the work
>>>> with manually including those IDs, and letting them approve/decide.
>>>
>>> Absolutely!
>
> Steffen et al., your opinions on this matter?

Sorry for being late on this. So, "NA" indeed means something like "hey,
I checked, but this was not found". This information should not be lost.
An empty entry, as if from a template, does not have the same meaning.
Whether it is NA (which is what R expects, and which I find likely to be
easier to parse) or N/A, I would not bother making all these changes and
would just leave it as it is. Indeed, in the Excel sheet I am using N/A.
As it happens, we had a quick exchange of thoughts on Zoom today, and I
tend to think the general idea is that these NAs should disappear, i.e.
the missing entries should be added to bio.tools.

>
>>>
>>>> I can imagine that for purely practical reasons in the process of the
>>>> manual curation, it might make sense to allow explicitly:
>>>>  - Name: OMICtools
>>>>    Entry: N/A  (Meaning: I have checked and there was no record)
>>>>  - Name: bio.tools
>>>>    Entry: ""   (Meaning: I or someone else should check this out;
>>>>                 or perhaps: I checked but wasn't conclusive yet)
>>>>
>>>> The latter might be useful for contributors who aren't used to all
>>>> those IDs, to make them more visible (including where the gaps are).
>>>> But on the other hand, if those are well present in an
>>>> upstream/metadata template and very clear in the documentation of
>>>> upstream/metadata, then it is not necessary and I'd then tend to like
>>>> your suggestion Andrius.
>>>
>>> To me, three flavors of "unknown" look like overkill. Most of the
>>> metadata in Debian does not even have the two flavors of "unknown":
>>> a missing Bug-Submit field in d/u/metadata, Homepage in d/control or
>>> Upstream-Contact in d/copyright means that this piece of information is
>>> either nonexistent or simply not entered (for example, due to the lack
>>> of time). Thus I am not sure whether the added value is worth the
>>> infrastructure/effort here. But again, this is solely my opinion,
>>> certainly not aimed at reflecting those of the people who enter and use
>>> the data in d/u/metadata.

Hm.
I see the following:
 * empty - nobody cared yet
 * "N/A" or "NA" or "<N/A>" or "<NA>" - checked but not found (I would
   prefer the latter two but do not really care; they may be too
   difficult in YAML since < is a special character)
 * "<rejected>" - bio.tools decided against referencing that package. We
   are likely to see a few of these in the near future.

>>
>> <all easy for Andreas>
>>>
>>> If the three-flavors option were preferred, I would also suggest adding
>>> date fields for each entry to signal at which point in time the
>>> registry was inspected.
>>
>> As I wrote above, later addition of some software to some registry can
>> spoil the different meanings of unknown. This could be cured by such a
>> date field, but I don't think it is of any better value than draining
>> time from people maintaining that extra field. Thus I do not think we
>> should do this.
>
> We definitely don't need a date, git blame does that. Also in the form
> of the Blame button in Salsa. Without a possibility for inconsistency.

This may be material for another paper: means to synchronize between
volunteer databases.
 * Provenance is accepted
 * Data transfer status - this is not yet routine, but it is what we are
   doing here.

@Andrius - If I do not need to be involved and no information is lost,
then I promise to be very happy with whatever you come up with. The
chance of a genuine reference being named "NA", though, especially in all
caps, is darn close to zero, and I wish you would invest (or sink) your
valuable time into something else.

Best,
Steffen

>> --
>> http://fam-tille.de
>>>
>>> Best,
>>> Andrius
>
> There is one closely related issue, which we just briefly touched upon
> with Steffen and Hervé in a telcon: What to do with those "NA"
> packages that are missing in e.g. bio.tools?
>
> The registration in bio.tools (and surely also SciCrunch) could be
> automated, but there are at least a couple of things needing human
> curation:
>
> - Which src packages represent one tool (often e.g. libs | language
>   bindings form separate Debian pkgs). How to mark this and where? Is
>   there an existing Debian mechanism? Or do we need to abuse the
>   d/u/metadata "Entry" for that, before they're added? (3rd or 4th
>   flavour of info then 😀 ; btw. git branches could help here 😉 ; and
>   not in a Google spreadsheet perhaps 😜 as it has to be machine-readable)
>
> - Choosing an available, reasonable biotoolsID and tool name.
>   Ideally tool name and biotoolsID are identical, with the ID all in
>   lower case and spaces removed/replaced.
>
> - Any other things needing human curation?
>
>
>
> Thank you all, I'm very happy seeing this progressing!
> Matus
>
>
> P.S.: Could you please leave all the contents in when replying to the
> thread, so that others can reply to previously mentioned points
> without having to read every single email in the thread and possibly
> breaking its linearity? I agree that it's not ecological to
> broadcast the same text all around the globe again and again, but
> there are other solutions than emails that handle that without
> compromising. Many thanks!
>
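To make the "flavours of unknown" from this thread a bit more concrete, here is a minimal Python sketch. It assumes the Registry list from debian/upstream/metadata has already been loaded (e.g. with a YAML parser) into dicts with "Name" and "Entry" keys, as in Matus's OMICtools/bio.tools example. The state labels, the function names classify_entry and candidate_biotools_id, and the exact ID normalisation (lower case, whitespace replaced by underscores) are hypothetical illustrations, not an agreed Debian convention and not the official bio.tools ID rules.

```python
import re
from typing import Optional

# Hypothetical labels for the states discussed in the thread.
NOT_CHECKED = "not checked"          # empty entry: nobody cared yet
NOT_FOUND = "checked, not found"     # N/A, NA, <N/A>, <NA>
REJECTED = "rejected"                # <rejected>: registry declined the package
PRESENT = "present"                  # anything else is treated as a real ID

# Exact, case-sensitive markers: a genuine ID spelled "NA" is assumed unlikely.
NOT_FOUND_MARKERS = {"N/A", "NA", "<N/A>", "<NA>"}


def classify_entry(entry: Optional[str]) -> str:
    """Map the value of a Registry "Entry" field to one of the states above.

    None covers a bare "Entry:" line, which YAML loads as null.
    """
    if entry is None or entry.strip() == "":
        return NOT_CHECKED
    value = entry.strip()
    if value in NOT_FOUND_MARKERS:
        return NOT_FOUND
    if value == "<rejected>":
        return REJECTED
    return PRESENT


def candidate_biotools_id(tool_name: str) -> str:
    """Suggest a biotoolsID from a tool name: lower case, whitespace -> "_".

    Assumption: this only mirrors the "all lower case, spaces
    removed/replaced" idea; a curator still has to check that the ID
    is available and reasonable.
    """
    collapsed = re.sub(r"\s+", "_", tool_name.strip())
    return collapsed.lower()


if __name__ == "__main__":
    # Hypothetical Registry list, as it might look after loading d/u/metadata.
    registry = [
        {"Name": "bio.tools", "Entry": ""},
        {"Name": "OMICtools", "Entry": "N/A"},
        {"Name": "SciCrunch", "Entry": "<rejected>"},
    ]
    for item in registry:
        print(item["Name"], "->", classify_entry(item["Entry"]))
    print(candidate_biotools_id("My Example Tool"))  # my_example_tool
```

Keeping the comparison exact and case-sensitive follows Steffen's point that a real reference named "NA" in all caps is very unlikely; the suggested ID is only a starting point for the human curation step Matus describes.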