Changeset: 8e51c8fdebca for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=8e51c8fdebca
Added Files:
        sql/backends/monet5/bam/Tests/benchmark1.sql
        sql/backends/monet5/bam/Tests/benchmark2.sql
Removed Files:
        sql/backends/monet5/bam/Tests/benchmark.sql
Modified Files:
        sql/backends/monet5/bam/85_bam.sql
        sql/backends/monet5/bam/bam.mal
        sql/backends/monet5/bam/bam_lib.c
        sql/backends/monet5/bam/bam_lib.h
Branch: DVframework_bam
Log Message:

Finished benchmark queries. Result: Tests/benchmark1.sql and 
Tests/benchmark2.sql. The first one contains some simple queries, the second 
one more complicated ones. Adjusted the bam_lib a little to work nicely 
together with the benchmark queries.


diffs (truncated from 954 to 300 lines):

diff --git a/sql/backends/monet5/bam/85_bam.sql 
b/sql/backends/monet5/bam/85_bam.sql
--- a/sql/backends/monet5/bam/85_bam.sql
+++ b/sql/backends/monet5/bam/85_bam.sql
@@ -8,5 +8,8 @@ RETURNS BOOLEAN EXTERNAL NAME bam.bam_fl
 CREATE FUNCTION reverse_seq(seq STRING)
 RETURNS STRING EXTERNAL NAME bam.reverse_seq;
 
+CREATE FUNCTION reverse_qual(qual STRING)
+RETURNS STRING EXTERNAL NAME bam.reverse_qual;
+
 CREATE FUNCTION seq_length(cigar STRING)
-RETURNS INT EXTERNAL NAME bam.seq_length;
\ No newline at end of file
+RETURNS INT EXTERNAL NAME bam.seq_length;
diff --git a/sql/backends/monet5/bam/Tests/benchmark.sql 
b/sql/backends/monet5/bam/Tests/benchmark.sql
deleted file mode 100644
--- a/sql/backends/monet5/bam/Tests/benchmark.sql
+++ /dev/null
@@ -1,306 +0,0 @@
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 1 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT qname,
-    CASE WHEN bam_flag(flag, 'segm_reve') THEN reverse_seq(seq)   ELSE seq  
END,
-    CASE WHEN bam_flag(flag, 'segm_reve') THEN reverse(qual)      ELSE qual END
-FROM bam.alignments
-WHERE   file_id = 1
-    AND bam_flag(flag, 'seco_alig') = FALSE
-    AND bam_flag(flag, 'segm_unma') = TRUE
-    AND qname IN (
-        SELECT qname
-        FROM bam.alignments
-        WHERE bam_flag(flag, 'seco_alig') = FALSE
-        GROUP BY qname
-        HAVING COUNT(*) = 2
-           AND SUM(bam_flag(flag, 'firs_segm')) = 1
-           AND SUM(bam_flag(flag, 'last_segm')) = 1
-    )
-ORDER BY qname, bam_flag(flag, 'last_segm');
-
--- Description:
--- This query selects fields required by the FASTQ file format (qname, 
seq/seq-reverse, qual/qual-reverse).
--- The result of this query has the following properties:
--- * It is calculated over a single BAM file
--- * It only considers primary alignments
--- * It only considers unmapped alignments
--- * The subquery selects only those qnames that have exactly two primary 
alignments: one left and one right alignment
--- * It is ordered by qname and then by the last_segm flag. The subquery 
imposes the invariant on the outer query that 
---   only qnames are selected that have exactly one left primary read and one 
right primary read. The result of this
---   order is then that alignments of the same read are stored beneath 
eachother; the left segment first, and then the
---   right segment.
-
-
-
-
-
-
-
-/*
-
-
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 2 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT a2.pos - (a1.pos + seq_length(a1.cigar)) AS distance, COUNT(*) AS 
nr_alignments
-FROM (
-    SELECT *
-    FROM bam.alignments
-    WHERE file_id = 1
-      AND bam_flag(flag, 'seco_alig') = False
-      AND bam_flag(flag, 'firs_segm') = True
-) AS a1 JOIN (
-    SELECT *
-    FROM bam.alignments
-    WHERE file_id = 1
-      AND bam_flag(flag, 'seco_alig') = False
-      AND bam_flag(flag, 'last_segm') = True
-) AS a2 ON a1.qname = a2.qname
-GROUP BY distance;
-
--- Description:
--- This query calculates a histogram that displays the number of read pairs 
with a certain distance.
--- The outer query joins two subresults together. These subresults contain all 
primary alignments for
--- the first segment and the last segment respectively. If the query is 
consistent according to the 
--- consistency check in query 4, there is a unique primary alignment for both 
sides (firs_segm and last_segm).
--- In the case there is a primary alignment in just one of the two subresults, 
the join will drop this
--- since the join predicate won't be fulfilled. 
--- The distance of each record in this joined table can now be calculated. 
Furthermore, grouping can be 
--- done on this distance to create the histogram.
-
-
-
-
-
-
-
-
-
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 3 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT *
-FROM bam.alignments 
-WHERE qname IN (
-    SELECT qname
-    FROM bam.alignments
-    WHERE file_id = 1
-    GROUP BY qname
-    HAVING SUM(bam_flag(flag, 'firs_segm')) = 0 
-        OR SUM(bam_flag(flag, 'last_segm')) = 0
-)
-ORDER BY qname;
-
--- Description:
--- This query checks a consistency aspect of the BAM file with file_id=1. It 
returns all alignments
--- for qnames that have either no alignment flagged as first segment or no 
alignment flagged as last segment.
--- The inner query groups by qname and selects the inconsistent ones. The 
outer query then selects all attributes
--- from all alignments for these qnames.
--- To provide the user with a convenient overview of the inconsistencies, the 
result is ordered by qname.
-
-
-
-
-
-
-
-
-
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 4 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT *
-FROM bam.alignments
-WHERE qname IN (
-    SELECT qname, 
-        SUM(bam_flag(flag, 'firs_segm')) AS sum_firs_segm, 
-        SUM(bam_flag(flag, 'last_segm')) AS sum_last_segm, 
-        COUNT(*) - SUM(bam_flag(flag, 'seco_alig')) AS sum_prim_alig
-    FROM bam.alignments
-    WHERE file_id=1
-      AND bam_flag(flag, 'segm_unma') = False
-    GROUP BY qname
-    HAVING (sum_firs_segm > 0 AND sum_last_segm > 0 AND sum_prim_alig <> 2)
-        OR (sum_firs_segm > 0 AND sum_prim_alig <> 1)
-        OR (sum_last_segm > 0 AND sum_prim_alig <> 1)
-)
-ORDER BY qname;
-
--- Description:
--- This query also checks a consistency aspect of the BAM file with file_id=1. 
Every qname consists of two reads.
--- For both reads it must hold that all its alignments must either be unmapped 
or there must be exactly one primary
--- alignment.
--- In the inner query, the alignments with the unmapped flag set are thrown 
away directly. The alignments that 
--- then remain are grouped by qname. A qname is then considered to be 
inconsistent if it fulfills one of these cases:
--- * There exists a left alignment and a right alignment in the group and the 
number of primary alignments <> 2
--- * There exists a left alignment and no right alignment in the group and the 
number of primary alignments <> 1
--- * There exists no left alignment and a right alignment in the group and the 
number of primary alignments <> 1
--- Exactly these three cases can be seen in the HAVING clause of the inner 
query.
--- The query assumes that there won't exist alignments with flags firs_segm = 
last_segm = 0.
--- The outer query again selects all attributes of all alignments in the 
inconsistent qnames. 
--- The result is again ordered by qname for convenient displaying.
-
-
-
-
-
-
-
-
-
-
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 5 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT *
-FROM alignments
-WHERE qname IN (
-    SELECT qname
-    FROM bam.alignments
-    WHERE file_id = 1
-    INTERSECT
-    SELECT qname 
-    FROM bam.alignments
-    WHERE file_id = 2
-);
-
--- Description:
--- Does a set intersection on BAM files with file_id=1 and file_id=2, based on 
qname.
-
-
-
-
-
-
-
-
-
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 6 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT *
-FROM alignments
-WHERE qname IN (
-    SELECT qname
-    FROM bam.alignments
-    WHERE file_id = 1
-    EXCEPT
-    SELECT qname 
-    FROM bam.alignments
-    WHERE file_id = 2
-);
-
--- Description:
--- Does a set minus on BAM files with file_id=1 and file_id=2, based on qname.
-
-
-
-
-
-
-
-
-
-
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 7 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT *
-FROM (
-    SELECT *
-    FROM
-    bam.alignments
-    WHERE file_id = 1
-      AND bam_flag(flag, 'seco_alig') = FALSE
-) AS a1 JOIN (
-    SELECT *
-    FROM
-    bam.alignments
-    WHERE file_id = 2
-      AND bam_flag(flag, 'seco_alig') = FALSE
-) AS a2 
-    ON  a1.qname = a2.qname
-    AND bam_flag(a1.flag, 'firs_segm') = bam_flag(a2.flag, 'firs_segm')
-    AND bam_flag(a1.flag, 'last_segm') = bam_flag(a2.flag, 'last_segm')
-    AND a1.pos <> a2.pos
-    
--- Description:
--- Joins primary alignments from two files (file_id=1 and file_id=2) together 
if these alignments have the same qname but
--- are mapped to different positions. 
--- The first subquery selects all primary alignments from file with file_id=1
--- The second subquery selects all primary alignments from file with file_id=2
--- The outer query then joins two alignments from these two subresults 
together under the following conditions:
--- * The qnames of the alignments are equal
--- * The alignments are both either left or right alignments
--- * The positions of the alignments aren't equal
-
-
-
-
-
-
-
-
-
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 8 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT *
-FROM alignments
-WHERE   file_id = 1
-    AND rname = 'chr22'
-    AND 10 >= pos
-    AND 10 < pos + seq_length(cigar);
-
--- Description:
--- Selects all alignments with file_id=1 that overlap position 10 in 
chromosome "chr22"
-
-
-
-
-
-
-
-
-
---------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------- Query 9 
----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------
-
-SELECT 
-FROM (
-    SELECT *
_______________________________________________
checkin-list mailing list
checkin-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/checkin-list

Reply via email to