On Tue, Jul 1, 2014 at 9:26 PM, Adam Funk <a24...@ducksburg.com> wrote: > cursor.execute('SELECT filename FROM files WHERE filename IS ?', > (filename,))
Shouldn't this be an equality check rather than IS, which normally I'd expect to be "IS NULL" or "IS NOT NULL"? As to your actual question: Your two database lookups are doing distinctly different things, so there's no surprise that they perform very differently. B asks the database "Do you have this? Do you have this?" for every file you have, and C asks the database "What do you have?", and then comparing that against the list of files. By the way - the A+C technique could be done quite tidily as a set difference: # assume you have listing1 and cursor set up # as per your above code listing = {os.path.join(directory, x) for x in listing1} cursor.execute(...) # as per above known_files = {row[0] for row in cursor} # cursors are iterable needed_files = listing - known_files cursor.executemany('INSERT INTO files VALUES (?, ?)', ((filename, 0) for filename in needed_files)) Anyway. The significant thing is the performance of the database on two different workloads: either "give me everything that matches this pattern" (where the pattern ends with a percent sign), or "do you have this? do you have this? do you have this?". Generally, database indexing is fairly efficient at handling prefix searches, so the first query will basically amount to an index search, which is a lot faster than the repeated separate searching; it takes advantage of the fact that all the strings you're looking at will have the same prefix. There is one critical consideration, though. What happens if the directory name contains an underscore or percent sign? Or can you absolutely guarantee that they won't? You may need to escape them, and I'm not sure how SQLite handles that. (Possibly \_ will match literal _, and \\ will match literal \, or something like that.) This is not bypassing the database's optimization; in fact, it's working tidily within it. If you want to express your logic in a way that lets the database truly be in command of optimization, it would look something like this: INSERT INTO files SELECT filename,0 FROM (VALUES('foo'),('bar'),('quux'),('spam')) AS needed(filename) EXCEPT SELECT filename,0 FROM files And you'd need to check the SQLite docs for how to define a table that consists of literal values. (The above syntax works in PostgreSQL; I'm pretty sure it's not all standard SQL.) But doing the set difference in Python is just as good a way of doing the job. ChrisA -- https://mail.python.org/mailman/listinfo/python-list