Dear r-help readers,

why is it so much slower to query an 
sqlite database using RSQlite «from the 
outside» using param like 

        statement <-
         "SELECT * FROM gene2refseq
          LEFT JOIN gene_info ON
            gene_info.GeneID = gene2refseq.GeneID
          WHERE gene2refseq.`RNA_nucleotide_accession.version` 
            LIKE ?"
        db <- "gene_info.sqlite"
        conn <- DBI::dbConnect(RSQLite::SQLite(), db)
        x1 <- DBI::dbGetQuery(conn=conn, 
          statement=statement, 
          param=list(Håkan20210914$RNANucleotideAccession))

compared to querying «from the inside» 
of sqlite, by writing your search terms 
as a table first, and then calling it 

        statement <- 
         "SELECT * FROM H
          LEFT JOIN gene2refseq R ON
            R.`RNA_nucleotide_accession.version` 
            LIKE '%' || H.RNANucleotideAccession || '%'
          LEFT JOIN gene_info I ON I.GeneID = R.GeneID"
        DBI::dbWriteTable(conn, "H", Håkan20210914)
        x2 <- DBI::dbGetQuery(conn=conn, statement=statement)
        DBI::dbDisconnect(conn)

On my system (E5-2603 v4), the first 
query took more than an hour, while the 
second took only a few minutes ...  

Do you guys know of any faster (but also 
nice) way to dig around in very large 
tsv files like 
https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz 
and 
https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz 
?

Best,
Rasmus

Attachment: signature.asc
Description: PGP signature

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to