Dear Martin Morgan,

Thanks for all those links!  Yes, my 
question can be characterized like that 
I think, traditional way writing 
a temporary table into the database and 
left JOINing the others vs. 
parameterized query. 

A relevant example would be to first 
create the database from the compressed 
tsv files:

        for i in gene_info gene2refseq; do
          wget -c https://ftp.ncbi.nlm.nih.gov/gene/DATA/$i.gz
          gzip -d $i.gz
          sqlite3 gene_info.sqlite ".mode tabs" ".import $i $i"
          rm $i; done

then run this R code:

        H <- data.frame(Group = c(1, 1, 2, 2),
          NM = c("NM_001267695", "NM_001007636",
                 "NM_001003706", "NM_001353612"))
        conn <- DBI::dbConnect(RSQLite::SQLite(), "gene_info.sqlite")
        DBI::dbWriteTable(conn, "H", H, overwrite=T)
        
        statement.1 <-
         "SELECT * FROM gene2refseq R
          LEFT JOIN gene_info I ON I.GeneID = R.GeneID
          WHERE R.`RNA_nucleotide_accession.version`
            LIKE '%' || ? || '%'"
        time.1 <- proc.time()
        x1 <- DBI::dbGetQuery(
          conn=conn,
          statement=statement.1,
          param=list(H$NM))
        time.1 <- proc.time() - time.1
        
        statement.2 <-
         "SELECT * FROM H
          LEFT JOIN gene2refseq R ON
            R.`RNA_nucleotide_accession.version`
            LIKE '%' || H.NM || '%'
          LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
        time.2 <- proc.time()
        x2 <- DBI::dbGetQuery(
          conn=conn,
          statement=statement.2)
        time.2 <- proc.time() - time.2
        
        DBI::dbDisconnect(conn)
        
        saveRDS(object=x1, file="ex1_x1.rds", compress="xz")
        saveRDS(object=x2, file="ex1_x2.rds", compress="xz")
        saveRDS(object=list("Time x1"=list(time.1),
                            "Time x2"=list(time.2)),
                file="ex1_t.rds", compress="xz")

I got these timings in the ex1_t.rds 
file:

        $`Time x1`
           user  system elapsed
        571.731 182.006 772.199
        
        $`Time x2`
           user  system elapsed
        200.068  90.529 295.086

As you can see, statement.1 takes a lot 
longer to process compared to 
statement.2 ...  When I add the rest of 
the 31 search terms, the difference gets 
a lot bigger like I pointed out 
initially, beyond the full hour vs. only 
a few minutes. 

Best,
Rasmus

Attachment: signature.asc
Description: PGP signature

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to