Package: wnpp
Severity: wishlist
Owner: Andreas Tille <ti...@debian.org>

* Package name    : metaphlan2
  Version         : 2.5.0
  Upstream Author : Nicola Segata <nicola.seg...@unitn.it>
* URL             : https://bitbucket.org/nsegata/metaphlan2/wiki/Home
* License         : MIT
  Programming Lang: Python
  Description     : Metagenomic Phylogenetic Analysis
 MetaPhlAn is a computational tool for profiling the composition of
 microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from
 metagenomic shotgun sequencing data with species level resolution. From
 version 2.0, MetaPhlAn is also able to identify specific strains (in the
 not-so-frequent cases in which the sample contains a previously
 sequenced strain) and to track strains across samples for all species.
 .
 MetaPhlAn 2.0 relies on ~1M unique clade-specific marker genes (the
 marker information file can be found at
 src/utils/markers_info.txt.bz2) identified from ~17,000 reference
 genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110
 eukaryotic), allowing:
 .
  * unambiguous taxonomic assignments;
  * accurate estimation of organismal relative abundance;
  * species-level resolution for bacteria, archaea, eukaryotes and
    viruses;
  * strain identification and tracking;
  * orders of magnitude speedups compared to existing methods;
  * metagenomic strain-level population genomics.

Remark: The package is a target for Debian Med in itself and will be used
by metaBIT. It will be maintained by the Debian Med team and the
packaging is currently available at
   svn://anonscm.debian.org/debian-med/trunk/packages/metaphlan2/trunk/

******* I'd like to discuss the following issue on debian-devel list *******

While Debian Med is injecting several packages with low popularity-contest
scores, this one has an extraordinarily large data set, and thus I want to
discuss the following options:

  1) The original orig.tar.gz is 1GB and contains 1.2GB of uncompressed
     binary data. License-wise this should not be a problem, since
     upstream gives a recipe for converting the data to text form and
     back[1].
     We would have:
        source package 1GB + binary package 1GB

  2) When unpacking the orig.tar.gz, converting the binary data to text
     format and recompressing with xz, the tarball is "only" 265MB. The
     conversion takes about 30min on my laptop, which is no longer than a
     larger project needs to build, but the resulting binary package
     would again be close to 1GB. This enables two options:

     2a) Source tarball 256MB + binary package 1GB

     2b) Do the format conversion in postinst at the expense of the
         user's time. This seems acceptable since the package is usually
         installed on high-performance machines and there will not be
         many installations, so bandwidth and disk space should be saved
         on the Debian mirrors rather than on users' machines.
         Source tarball 256MB + binary package ~250MB (estimated)

  3) Strip all data from the source package and download the data in
     postinst from the upstream Git repository. This makes the package
     size uncritical from a Debian point of view, but it might be
     problematic in user setups that have trouble with larger data
     downloads (possibly upstream can be convinced to provide a *.bz2
     tarball for maximum compression).

     3a) Do the download in postinst.

     3b) Tell the user to run a download script manually, so that apt is
         not blocked for a long time while dealing with potential
         download problems.

Which strategy do you think should be chosen to be kind to Debian (and
mirror) resources?

Kind regards

      Andreas.

[1] https://bitbucket.org/biobakery/metaphlan2#markdown-header-customizing-the-database
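
For comparison, the per-version footprint of options 1), 2a) and 2b) can be tallied with a short sketch. It only adds up the figures quoted above. Taking 1GB as 1024MB and counting a single binary package (i.e. assuming the data ends up in one architecture-independent package) are my assumptions, and the function names are made up for illustration. Option 3) is left out because no size was given for it.

```python
# Approximate sizes in MB as quoted in the mail.
# Assumption: "1GB" is taken as 1024MB; the 2b binary size is an estimate.
OPTIONS = {
    "1": {"source": 1024, "binary": 1024},
    "2a": {"source": 256, "binary": 1024},
    "2b": {"source": 256, "binary": 250},
}


def mirror_footprint(sizes):
    """Return source + binary size in MB for one uploaded version.

    Assumption: exactly one binary package is built from the source.
    """
    return sizes["source"] + sizes["binary"]


def summarize(options):
    """Map each option name to its combined footprint in MB."""
    return {name: mirror_footprint(sizes) for name, sizes in options.items()}


if __name__ == "__main__":
    # Print the options from smallest to largest footprint.
    ranked = sorted(summarize(OPTIONS).items(), key=lambda item: item[1])
    for name, total in ranked:
        print(f"option {name}: ~{total} MB per version")
```

Under these assumptions 2b) needs roughly a quarter of the space of 1), at the price of the conversion time on the user's machine.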