guix and mirroring dataset

Cook, Malcolm Mon, 17 May 2021 12:35:10 -0700

Hi,

Do the Guix project and its members recommend best Guix-ish practices for managing
on-premises mirrors of large file-based datasets, such as those found in genomics
HPC environments?


Perhaps a guix-ish response to [Go Get Data \(GGD\) is a framework that 
facilitates reproducible access to genomic 
data](https://www.nature.com/articles/s41467-021-22381-z)

Perhaps one that would build on the Guix Workflow Language (GWL)?

Use cases would include, e.g., downloading/syncing selected (versions of) genomes
from Ensembl, NCBI, etc., and indexing them for BLAST, BLAT, Bowtie/Bowtie 2, BWA,
STAR, GMAP, HISAT, IGV, Bioconductor, etc.

I see much work addressing analysis workflows, such as:
 - [Reproducible genomics analysis pipelines with GNU Guix](https://www.biorxiv.org/content/10.1101/298653v2.full)
 - [Scalable Workflows and Reproducible Data Analysis for Genomics](https://pubmed.ncbi.nlm.nih.gov/31278683/)
 - [PiGx: reproducible genomics analysis pipelines with GNU Guix](https://academic.oup.com/gigascience/article/7/12/giy123/5114263)

Am I missing similar efforts toward maintaining an up-to-date catalog of the 
genomic resources that such workflows require?

Thanks!

Malcolm Cook
Database Applications Manager
Stowers Institute for Medical Research
Kansas City, MO  USA
