Dear r-help readers, why is it so much slower to query an sqlite database using RSQlite «from the outside» using param like
statement <- "SELECT * FROM gene2refseq LEFT JOIN gene_info ON gene_info.GeneID = gene2refseq.GeneID WHERE gene2refseq.`RNA_nucleotide_accession.version` LIKE ?" db <- "gene_info.sqlite" conn <- DBI::dbConnect(RSQLite::SQLite(), db) x1 <- DBI::dbGetQuery(conn=conn, statement=statement, param=list(Håkan20210914$RNANucleotideAccession)) compared to querying «from the inside» of sqlite, by writing your search terms as a table first, and then calling it statement <- "SELECT * FROM H LEFT JOIN gene2refseq R ON R.`RNA_nucleotide_accession.version` LIKE '%' || H.RNANucleotideAccession || '%' LEFT JOIN gene_info I ON I.GeneID = R.GeneID" DBI::dbWriteTable(conn, "H", Håkan20210914) x2 <- DBI::dbGetQuery(conn=conn, statement=statement) DBI::dbDisconnect(conn) On my system (E5-2603 v4), the first query took more than an hour, while the second took only a few minutes ... Do you guys know of any faster (but also nice) way to dig around in very large tsv files like https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz and https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz ? Best, Rasmus
signature.asc
Description: PGP signature
______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.