Dear Martin Morgan, Thanks for all those links! Yes, my question can be characterized like that I think, traditional way writing a temporary table into the database and left JOINing the others vs. parameterized query.
A relevant example would be to first create the database from the compressed tsv files: for i in gene_info gene2refseq; do wget -c https://ftp.ncbi.nlm.nih.gov/gene/DATA/$i.gz gzip -d $i.gz sqlite3 gene_info.sqlite ".mode tabs" ".import $i $i" rm $i; done then run this R code: H <- data.frame(Group = c(1, 1, 2, 2), NM = c("NM_001267695", "NM_001007636", "NM_001003706", "NM_001353612")) conn <- DBI::dbConnect(RSQLite::SQLite(), "gene_info.sqlite") DBI::dbWriteTable(conn, "H", H, overwrite=T) statement.1 <- "SELECT * FROM gene2refseq R LEFT JOIN gene_info I ON I.GeneID = R.GeneID WHERE R.`RNA_nucleotide_accession.version` LIKE '%' || ? || '%'" time.1 <- proc.time() x1 <- DBI::dbGetQuery( conn=conn, statement=statement.1, param=list(H$NM)) time.1 <- proc.time() - time.1 statement.2 <- "SELECT * FROM H LEFT JOIN gene2refseq R ON R.`RNA_nucleotide_accession.version` LIKE '%' || H.NM || '%' LEFT JOIN gene_info I ON I.GeneID = R.GeneID" time.2 <- proc.time() x2 <- DBI::dbGetQuery( conn=conn, statement=statement.2) time.2 <- proc.time() - time.2 DBI::dbDisconnect(conn) saveRDS(object=x1, file="ex1_x1.rds", compress="xz") saveRDS(object=x2, file="ex1_x2.rds", compress="xz") saveRDS(object=list("Time x1"=list(time.1), "Time x2"=list(time.2)), file="ex1_t.rds", compress="xz") I got these timings in the ex1_t.rds file: $`Time x1` user system elapsed 571.731 182.006 772.199 $`Time x2` user system elapsed 200.068 90.529 295.086 As you can see, statement.1 takes a lot longer to process compared to statement.2 ... When I add the rest of the 31 search terms, the difference gets a lot bigger like I pointed out initially, beyond the full hour vs. only a few minutes. Best, Rasmus
signature.asc
Description: PGP signature
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.