Re: Help with autopkgtest for Seqsero and python-loompy

2021-03-07 Thread Shruti Sridhar
Hi,

Thank you so much! I haven't had a chance to work on loompy yet, but I have
added tests to Seqsero. Please review and sponsor.

Best,
Shruti



On Wed, Feb 24, 2021 at 2:33 PM Andreas Tille  wrote:

> On Wed, Feb 24, 2021 at 02:00:19PM +0530, Nilesh Patra wrote:
> > Protecting the master branch is an ideal thing to do since this doesn't
> > allow force pushing -- if it wasn't protected, it'd be easy for a
> > newcomer who's not very well versed with git to mistakenly mess things
> > up.
> > ...
> > That can have repercussions (see the reply above) -- I'd recommend not
> > doing that
> > ...
> > Probably the rationale is that someone set as "maintainer" can change
> > protected branch settings + CI vars et al. -- this could cause damage
> > if a spammer accidentally gets such access.
> > But I suppose we assume good intent most of the time, and it should
> > generally be safe. (It's quite OK in Shruti's case, definitely)
>
> So I think pushing to the master branch should be permitted for every
> "Developer", and it would be great if we could easily enforce this
> setting on all repositories.
>
> Kind regards
>
>  Andreas.
>
> --
> http://fam-tille.de
>
>


Re: Help with autopkgtest for Seqsero and python-loompy

2021-03-07 Thread Nilesh Patra
On Sun, 7 Mar, 2021, 11:58 pm Shruti Sridhar wrote:

> Hi,
>
> Thank you so much! I haven't had a chance to work on loompy yet, but I
> have added tests to Seqsero. Please review and sponsor.
>

Please remove the "ubuntu1" from the changelog header.

@Andreas, I'm not sure if it should be uploaded at this point since we're
hitting the freeze in 4 days. Thoughts?

Nilesh



Re: Help with autopkgtest for Seqsero and python-loompy

2021-03-07 Thread Andreas Tille
On Mon, Mar 08, 2021 at 12:30:52AM +0530, Nilesh Patra wrote:
> > Thank you so much! I haven't had a chance to work on loompy yet, but I
> > have added tests to Seqsero. Please review and sponsor.
> 
> Please remove the "ubuntu1" from the changelog header.

+1
 
> @Andreas, I'm not sure if it should be uploaded at this point since we're
> hitting the freeze in 4 days. Thoughts?

I think it's better not to upload to unstable. Either upload to
experimental or delay the upload until the next release cycle starts.

Kind regards

   Andreas.

-- 
http://fam-tille.de



Re: Outreachy project: complete workflows of tools

2021-03-07 Thread Tassia Camoes Araujo
Hi all,

On 2021-03-05 11:09, Tassia Camoes Araujo wrote:
> [...]
> Since deadline is approaching quickly, I suggest we move this off-list
> and start writing in a wiki or pad.

Here is my first draft, pasted below for your reference, but you can
edit the text directly in this pad:
https://storm.debian.net/shared/OCYsOOEqJ5-CcfjVxIiye-ex-4LheegyP1pkIFghMKa

Help is needed to write the Intern tasks section. Also, please check if
you have better scientific articles to recommend, and list a few tools to
be used in the starter tasks.

Please help shape this proposal so it ***really*** makes sense and
attracts interns ;-)

Cheers,

Tassia.

--
Project title
Validation of Debian Med tools for complete bioinformatics workflows 

Description
Debian Med is a "Debian Pure Blend" which aims to develop Debian into an
operating system particularly well suited for medical practice and
biomedical research. Data analysis in this field is typically implemented
as a workflow or pipeline, with multiple tools executed as a chain, each
processing input and producing output for the next tool in the chain.

This internship will focus on validating workflows that can be fully
executed within Debian. Deliverables will be educational materials, such
as video or written tutorials, showcasing how the various tools can be
chained together in order to solve a particular biological problem.

An example of a workflow would be an RNA-seq workflow that executes
FastQC, Trimmomatic, salmon and a custom R script using a single command
(extracted from [1]):

- FastQC, a program that checks NGS reads for common quality issues
- Trimmomatic, a program for cleaning NGS reads
- salmon, a program for estimating transcript abundance from NGS reads
- a custom R script that uses DESeq2 to perform differential expression
  analysis
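To make the idea of chaining concrete, here is a toy Python sketch (not one of the proposal's deliverables) in which each step takes the previous step's output as its input. The step functions `quality_check`, `trim` and `quantify` are hypothetical stand-ins; a real workflow would invoke FastQC, Trimmomatic, salmon and the DESeq2 R script instead.

```python
from typing import Callable, List

# A "step" takes the previous step's output and returns its own output.
Step = Callable[[str], str]


def run_pipeline(steps: List[Step], initial_input: str) -> str:
    """Run each step in order, feeding every output into the next step."""
    data = initial_input
    for step in steps:
        data = step(data)
    return data


# Hypothetical stand-ins for real tools (FastQC, Trimmomatic, salmon).
def quality_check(reads: str) -> str:
    return reads  # a QC step only reports; it passes the reads on unchanged


def trim(reads: str) -> str:
    return reads.strip("N")  # toy "cleaning": drop leading/trailing N bases


def quantify(reads: str) -> str:
    return f"count={len(reads)}"  # toy "abundance": just the read length


if __name__ == "__main__":
    print(run_pipeline([quality_check, trim, quantify], "NNACGTACGTNN"))
```

The point of the structure is that the order of steps is data (a list), so validating a workflow amounts to checking that every tool in the list is packaged and that each output format is accepted by the next tool.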

The intern will be free to choose tools/workflows of interest, and
guidance will be given in choosing one that is relevant to the research
community. A useful reading to start with, particularly if you are not in
the field of bioinformatics, is an article on open source tools and
toolkits for bioinformatics [2]. Typical workflows are described in
numerous peer-reviewed scientific works, such as those deciphering
transcriptomic data from vitamin D studies [3] and evaluating RT-qPCR
primer specificity [4].

(*** check if you have better articles to recommend ***)

[1] https://bioinformatics.stackexchange.com/questions/7347/what-is-the-difference-between-a-bioinformatics-pipeline-and-workflow
[2] https://www.researchgate.net/publication/6888681_Open_source_tools_and_toolkits_for_bioinformatics_Significance_and_where_are_we
[3] https://www.sciencedirect.com/science/article/pii/S0960076018306034#bib0030
[4] https://pubmed.ncbi.nlm.nih.gov/31945455/


How can applicants make a contribution (starter tasks)
(*** More ideas? Suggested: 10-20 small and 5-10 medium-sized tasks ***)

1. Create a short written tutorial using a particular bioinformatics tool
(*** please suggest a few tools ***)

2. Turn the written tutorial into a video

3. Identify bioinformatics workflows of interest to you (or to a
researcher you know)

4. Select and read a peer-reviewed article describing a workflow and
extract the tools it uses

5. Classify tools from a particular workflow as: FLOSS in Debian, FLOSS
not yet in Debian, or proprietary (is there a known alternative?)

6. Gather sample data to be used in demonstrations of workflows of
interest


Applicant skills (description - impact on selection - experience level)

- Writing skills - Required - Experimented
- Video editing skills - Preferred - Concepts
- Debian system knowledge - Preferred - Concepts
- Use of bioinformatics tools - Preferred - Concepts


Intern tasks

(*** HELP NEEDED ***)



Re: What date format for d/u/metadata ? Re: "Entry: NA" in debian/upstream/metadata

2021-03-07 Thread Andrius Merkys
Hi Steffen,

On 2021-03-05 21:19, Steffen Möller wrote:
> Am 05.03.21 um 16:13 schrieb Andrius Merkys:
>> On 2021-03-04 23:12, Steffen Möller wrote:
>>> Somewhere else the suggestion was made to also add a timestamp. This
>>> makes perfect sense for the NA/~, and in that case, if that date was
>>> specified, we know that unknown is a confirmed unknown. For entries that
>>> are found, we should possibly just rely on git blame in salsa.
>> Exactly. This was my point. Because if someone stumbles upon a timestamp
>> from 3+ years ago, one may check the registry to see if the entry is
>> still not there. If the entry is still missing, one would update the
>> timestamp to let everyone else know "hey, I have checked it, and it is
>> not there". Otherwise one's effort will be lost, and the next person who
>> sees a missing entry may waste their time looking for it again.
> 
> Since I was just active on pigx-rnaseq for the thread on guix, I came
> up with
> 
> Registry:
>  - Name: OMICtools
>    Entry: OMICS_33677
>  - Name: conda:bioconda
>    Entry: NA
>    Checked: Fri, 05 Mar 2021 20:06:08 +0100
>  - Name: guix
>    Entry: pigx-rnaseq
>  - Name: bio.tools
>    Entry: NA
>    Checked: Fri, 05 Mar 2021 20:07:04 +0100
> 
> But, dunno, this RFC 5322 format is barely parseable by eye, even though
> this is how we typically put dates in Debian (you get this via 'date -R').
> `date --rfc-3339=date` would be much more readable, though.

I would also vote for RFC 3339. RFC 5322 admittedly removes some
ambiguity (such as confusing -MM-DD with -DD-MM), but is not as easy
to read/write. RFC 3339 is also widely used in Debian, for example, for
appending timestamps to source package versions and package diff files [1].
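As an illustration of the two formats (not part of any Debian tooling), here is a minimal Python sketch that converts a `date -R` stamp, like the ones in the example above, into the RFC 3339 forms discussed in this thread. `email.utils.parsedate_to_datetime` is in the standard library; the two function names are hypothetical.

```python
from email.utils import parsedate_to_datetime


def rfc5322_to_rfc3339_date(stamp: str) -> str:
    """Convert an RFC 5322 date (as printed by `date -R`) to YYYY-MM-DD."""
    dt = parsedate_to_datetime(stamp)  # timezone-aware datetime
    # The calendar date is taken in the stamp's own UTC offset.
    return dt.date().isoformat()


def rfc5322_to_rfc3339_full(stamp: str) -> str:
    """Same conversion, but keep the time and the UTC offset."""
    return parsedate_to_datetime(stamp).isoformat(sep=" ")


if __name__ == "__main__":
    stamp = "Fri, 05 Mar 2021 20:06:08 +0100"
    print(rfc5322_to_rfc3339_date(stamp))
    print(rfc5322_to_rfc3339_full(stamp))
```

Because RFC 3339 dates sort lexicographically in chronological order, plain string comparison is enough to find the oldest "Checked" value, which is one practical argument for the format.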

> Registry:
>  - Name: OMICtools
>    Entry: OMICS_33677
>  - Name: conda:bioconda
>    Entry: NA
>    Checked: 2021-03-05
>  - Name: guix
>    Entry: pigx-rnaseq
>  - Name: bio.tools
>    Entry: NA
>    Checked: 2021-03-05
> 
> but do our American friends understand that this is not May? And we do
> not need the time, as in
> 
> 2021-03-05 20:14:12+01:00
> 
> I would start without the time and then add it if needed - but as I
> said, the art is to eliminate the NAs in the respective
> registry/repository and for that, the time of the day does not really
> matter, I tend to think.

Dates without time have a total of 48 hours of uncertainty due to time
zones (if my calculations are correct). Most likely this uncertainty
can be ignored for this particular application.
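The size of that window can be checked with a short calculation. The Python sketch below computes the span between the earliest and latest instants a bare calendar date can denote, given assumed minimum and maximum UTC offsets. Under a symmetric ±12:00 assumption it gives 48 hours; with the actual extremes in use, UTC−12:00 and UTC+14:00, it gives 50 hours. Either way the conclusion that the uncertainty can be ignored still holds. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def date_window_hours(min_offset_h: int, max_offset_h: int) -> float:
    """Hours between the earliest and latest instant a bare date can denote."""
    day = datetime(2021, 3, 5)  # any calendar date gives the same result
    # Earliest instant: local midnight at the most eastern offset.
    start = day.replace(tzinfo=timezone(timedelta(hours=max_offset_h)))
    # Latest instant: the following local midnight at the most western offset.
    end = (day + timedelta(days=1)).replace(
        tzinfo=timezone(timedelta(hours=min_offset_h)))
    return (end - start).total_seconds() / 3600


if __name__ == "__main__":
    print(date_window_hours(-12, 12))
    print(date_window_hours(-12, 14))
```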

> A pending question is whether we need a "" as in "This entry is not
> going to be added to that repository". I personally do think so and
> consider this information more important than the NA, since a repeated
> request likely annoys someone on the other end.

A few messages ago [2] I suggested introducing a "Status" field for
indicating special states of entries, such as not found, rejected,
pending and the like. Such a field would completely remove the need to
place non-ID information in the "Entry" field. What do you think about it?
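A minimal Python sketch of how tooling might use such a field rests on several assumptions. First, the metadata has already been loaded (for instance by a YAML parser) into dicts with string values. Second, the Status values are the hypothetical ones listed above. Third, the three-year re-check threshold is borrowed from the earlier message in this thread. The sketch treats a missing Entry and "NA" the same way, so it works both before and after such a change. If a YAML loader turns `2021-03-05` into a date object, the `fromisoformat` call would need adjusting.

```python
from datetime import date
from typing import Dict, List

# Hypothetical values for the proposed "Status" field; none are standardised.
KNOWN_STATUSES = {"not found", "rejected", "pending"}


def needs_recheck(entry: Dict[str, str], today: date,
                  max_age_days: int = 3 * 365) -> bool:
    """Return True if a registry entry without an ID is worth looking up again."""
    if entry.get("Entry") not in (None, "NA"):
        return False  # a real ID is recorded, nothing to check
    if entry.get("Status") == "rejected":
        return False  # the registry declined; asking again would annoy them
    checked = entry.get("Checked")
    if checked is None:
        return True  # never confirmed as missing
    return (today - date.fromisoformat(checked)).days > max_age_days


def unknown_statuses(registry: List[Dict[str, str]]) -> List[str]:
    """Names of entries whose Status is not one of the known values."""
    return [e["Name"] for e in registry
            if "Status" in e and e["Status"] not in KNOWN_STATUSES]


if __name__ == "__main__":
    # Entries adapted from the example above, plus two hypothetical ones.
    registry = [
        {"Name": "OMICtools", "Entry": "OMICS_33677"},
        {"Name": "conda:bioconda", "Status": "not found", "Checked": "2021-03-05"},
        {"Name": "example-registry", "Status": "rejected"},
        {"Name": "other-registry", "Status": "typo"},
    ]
    for e in registry:
        print(e["Name"], needs_recheck(e, date(2024, 6, 1)))
    print(unknown_statuses(registry))
```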

[1] http://ftp.us.debian.org/debian/dists/unstable/main/source/Sources.diff/
[2] https://lists.debian.org/debian-med/2021/03/msg00035.html

Best,
Andrius