Package: wnpp
Severity: wishlist
Owner: Debian-med team <debian-...@lists.debian.org>
X-Debbugs-Cc: debian-de...@lists.debian.org, debian-...@lists.debian.org

* Package name    : genomicsdb
  Version         : 1.4.3
  Upstream Author : Intel Health and Lifesciences
* URL             : https://www.genomicsdb.org/
* License         : Expat
  Programming Lang: C++, Java
  Description     : sparse array storage library for genomics

GenomicsDB is built on top of an htslib fork and an internal array storage
system for importing, querying and transforming variant data. Variant data
is sparse by nature (sparse relative to the whole genome), so sparse array
data stores are a good fit for storing it.

GenomicsDB stores variant data in a 2D array where:
 - Each column corresponds to a genomic position (chromosome + position).
 - Each row corresponds to a sample in a VCF (or CallSet in the GA4GH
   terminology).
 - Each cell contains data for a given sample/CallSet at a given position;
   data is stored in the form of cell attributes.
 - Cells are stored in column-major order. This makes accessing cells with
   the same column index (i.e. data for a given genomic position over all
   samples) fast.
 - Variant interval/gVCF interval data is stored in a cell at the start of
   the interval. The END is stored as a cell attribute. For variant
   intervals (such as deletions and gVCF REF blocks), an additional cell is
   stored at the END value of the variant interval. When queried for a
   given genomic position, the query library performs an efficient sweep to
   determine all intervals that intersect with the queried position.

There is a C++ library and a Java library; we plan to ship both.

This library is needed as a dependency of gatk, which is a packaging target
of the Debian-med team.
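
For readers unfamiliar with the storage layout described above, here is a
minimal Python sketch of the idea. It is a toy model, not the GenomicsDB
API: the class ToySparseArray, its methods and the sample names are all
hypothetical, positions are plain integers on one chromosome, END is
treated as inclusive, and intervals within one row are assumed not to
overlap. The forward sweep shown is one simple way to use the extra END
cell; upstream's actual sweep may work differently.

```python
from bisect import bisect_left


class ToySparseArray:
    """Toy column-major sparse 2D array (hypothetical, not GenomicsDB's API)."""

    def __init__(self):
        # Cells are (column, row, is_end_marker, attrs), kept sorted by
        # (column, row), i.e. in column-major order.
        self._cells = []
        self._columns = []  # parallel list of columns, used for bisecting

    def add_interval(self, row, start, end, **attrs):
        """Store a cell at `start`; for real intervals also one at `end`."""
        cell_attrs = dict(attrs, START=start, END=end)
        self._cells.append((start, row, False, cell_attrs))
        if end > start:
            # Extra cell at the END position, as the description mentions.
            self._cells.append((end, row, True, cell_attrs))
        self._cells.sort(key=lambda cell: (cell[0], cell[1]))
        self._columns = [cell[0] for cell in self._cells]

    def query_position(self, pos, rows):
        """Return {row: attrs} for every interval that covers `pos`."""
        pending = set(rows)
        hits = {}
        first = bisect_left(self._columns, pos)
        # Sweep forward from `pos`. The first cell seen for a row decides:
        # an END cell means its interval began before `pos` and covers it;
        # a start cell covers `pos` only if it sits exactly at `pos`.
        for column, row, is_end, attrs in self._cells[first:]:
            if not pending:
                break
            if row not in pending:
                continue
            pending.discard(row)
            if is_end or column == pos:
                hits[row] = attrs
        return hits


samples = ["sample0", "sample1", "sample2"]
arr = ToySparseArray()
arr.add_interval("sample0", 100, 200, kind="gVCF REF block")
arr.add_interval("sample1", 150, 152, kind="deletion")
arr.add_interval("sample2", 150, 150, kind="SNV")
arr.add_interval("sample2", 300, 300, kind="SNV")

for position in (150, 151, 201):
    print(position, sorted(arr.query_position(position, samples)))
```

Without the extra END cell, a query at position 151 would have to scan
backwards an unknown distance to discover that sample0's block starting at
100 still covers it; with the END cell, a bounded forward sweep is enough
in this toy model.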