Re: Google Summer of Code 2023 Inquiry

Kyle Thu, 30 Mar 2023 17:53:33 -0700

As a statistician who always wants to get the most information for the least 
effort, I am particularly interested in being able to reprioritize workflow 
jobs interactively among jobs that occupy equivalent positions in the 
topological sort, i.e. jobs whose relative order the dependency graph leaves 
open. I thought perhaps this would be possible with GWL if it could talk to 
Slurm through DRMAA version 2 (https://en.wikipedia.org/wiki/DRMAA). This would 
also be more readily useful to researchers if Guix had a conveniently available 
Slurm service which worked out of the box even on a single machine.
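
To make the reprioritization idea concrete, here is a minimal sketch in plain 
Python. It is not GWL, Slurm, or DRMAA code; the job names, the `scores` table 
and the `run_with_priorities` helper are all made up for illustration. It only 
shows that the dependency graph fixes which jobs are ready, while the order 
among ready jobs can be re-ranked at every step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_with_priorities(deps, priority):
    """Return an execution order that respects deps.

    deps maps each job to the set of jobs it depends on.
    priority is a callable job -> number (lower runs first). It is
    re-evaluated every round, so it may change while the workflow runs.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    ready = []   # jobs whose dependencies are all finished
    order = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        ready.sort(key=priority)   # re-rank among equivalent jobs
        job = ready.pop(0)
        order.append(job)          # a real runner would submit the job here
        sorter.done(job)
    return order


# Hypothetical workflow: three model fits that are interchangeable in the
# topological order, between a preparation step and a report.
deps = {
    "fit_a": {"prep"},
    "fit_b": {"prep"},
    "fit_c": {"prep"},
    "report": {"fit_a", "fit_b", "fit_c"},
}
scores = {"fit_a": 2, "fit_b": 0, "fit_c": 1}
order = run_with_priorities(deps, lambda job: scores.get(job, 0))
print(order)
```

With these scores the order is prep, fit_b, fit_c, fit_a, report. Changing 
`scores` while the loop runs would only reshuffle the three fits, never move 
one of them ahead of prep or behind report.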

Stepping back, there might be a more ambitious question hidden in there in 
terms of how to handle nondeterminism in a deterministic workflow manager. 
Without external input during the run, the problem just involves choosing your 
random seeds up front. However, I would prefer to write a procedure which is 
constantly reprioritizing labeled sub-jobs within their associated containers 
until either I hit a resource limit or I have achieved certain target 
statistical diagnostics. Perhaps I would want GWL to tell me how to replay my 
build after the fact, so that the result is reproducible even though I didn't 
know up front what I needed to focus my computations on and let the computer 
decide that. Making that sort of thing possible might be a longer-term effort, 
but working out what's needed for initial steps might be a fun project.
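
As a rough illustration of the replay idea, here is a sketch in plain Python. 
Nothing in it is GWL functionality: `simulate`, `adaptive_run`, `replay` and 
the labels "narrow" and "wide" are hypothetical, and the standard error of the 
mean stands in for whatever diagnostic one actually cares about. The adaptive 
run decides at each step which labeled sub-job to extend, and logs every 
(label, seed) pair so the same result can be rebuilt later without redoing the 
decisions.

```python
import random
import statistics


def simulate(label, seed, n=100):
    """Hypothetical sub-job: draw n samples; the label sets the spread."""
    rng = random.Random(seed)
    scale = {"narrow": 1.0, "wide": 5.0}[label]
    return [rng.gauss(0.0, scale) for _ in range(n)]


def std_error(samples):
    """Standard error of the mean, used here as the target diagnostic."""
    return statistics.stdev(samples) / len(samples) ** 0.5


def adaptive_run(labels, target_se, max_jobs, master_seed):
    """Run sub-jobs until every label meets target_se or max_jobs is used."""
    seeds = random.Random(master_seed)
    samples = {label: [] for label in labels}
    log = []  # (label, seed) pairs, in execution order
    for _ in range(max_jobs):
        unseen = [label for label in labels if not samples[label]]
        if unseen:
            label = unseen[0]  # every label gets one initial batch
        else:
            # Focus on the label with the worst diagnostic so far.
            label = max(labels, key=lambda l: std_error(samples[l]))
            if std_error(samples[label]) <= target_se:
                break  # all diagnostics are on target
        seed = seeds.randrange(2**32)
        samples[label].extend(simulate(label, seed))
        log.append((label, seed))
    return samples, log


def replay(labels, log):
    """Rebuild the same samples from the log alone, with no decisions."""
    samples = {label: [] for label in labels}
    for label, seed in log:
        samples[label].extend(simulate(label, seed))
    return samples


labels = ["narrow", "wide"]
samples, log = adaptive_run(labels, target_se=0.2, max_jobs=50,
                            master_seed=2023)
print(replay(labels, log) == samples)
```

The log plays the role of the "how to replay my build" record: the adaptive 
run is the only part that needs the diagnostics, and the replay is an ordinary 
deterministic sequence of jobs with fixed seeds.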

On March 30, 2023 7:27:37 PM EDT, Spencer Skylar Chan 
<scha...@terpmail.umd.edu> wrote:
>Hi Ricardo,
>
>On 3/23/23 03:58, Ricardo Wurmus wrote:
>> Hi,
>> 
>> Spencer Skylar Chan <scha...@terpmail.umd.edu> writes:
>> 
>>> One approach could be to add CWL import/export capabilities to
>>> GWL. Then Snakemake/GWL conversion would be a 2 step process, using
>>> CWL as an intermediate step:
>>> 
>>> 1. Snakemake -> CWL
>>> 2. CWL -> GWL
>> 
>> This seems doable.
>
>Great! I've been reading the chapter in Evolutionary Genomics on different 
>scalable workflows to understand this process better.
>
>>> However, CWL is not as expressive as Snakemake. There may be some
>>> details that are lost from Snakemake workflows.
>>> 
>>> So a 1-step Snakemake/GWL transpiler could be interesting, as both
>>> Snakemake/GWL use a domain-specific language inside a general purpose
>>> language (Python/Guile respectively). There may be a possibility to
>>> achieve more "accurate" translations between workflows.
>> 
>> Compared to the previous approach this seems vastly more complex.  It’s
>> one thing to *execute* Snakemake code without running it through Python,
>> but quite a bit more challenging to transpile Python to Scheme.
>> 
>> Personally, I wouldn’t know where to start.  Do you have an idea
>> already?
>> 
>
>Actually I was hoping you might have some ideas :)
>I do think that if the execution of the pipeline is more important than its 
>representation (Snakemake or otherwise), then it would make more sense to 
>focus efforts on increasing GWL's capabilities.
>
>Thanks,
>Skylar
