On Sun, Jan 31, 2016 at 8:52 AM, Michael Torrie <torr...@gmail.com> wrote:
> On 01/30/2016 02:19 PM, Chris Angelico wrote:
>> Efficiency. That's a fine way of counting actual rows in an actual
>> table. However, it's massive overkill to perform an additional
>> pre-query for something that's fundamentally an assertion (this is a
>> single-row-fetch API like "select into", and it's an error to fetch
>> anything other than a single row - but normal usage will never hit
>> that error), and also, there's no guarantee that the query is looking
>> at a single table. Plus, SQL's count function ignores NULLs, so you
>> could get a false result. Using count(*) might be better, but the only
>> way I can think of to be certain would be something like:
>>
>> select count(*) from (...)
>
> True. The id field is usually the best, or some other indexed field.
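[For anyone following along, here is a minimal sketch of the count(*)
subquery approach being discussed, using sqlite3. The table, data, and
query are made up purely for illustration:]

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("Alice",), ("Bob",)])

query = "SELECT * FROM people WHERE name = ?"
params = ("Alice",)

# Wrap the full original query in a subquery and count its rows.
# count(*) counts rows regardless of NULLs, unlike count(column),
# which skips NULLs and could give a false result.
(n,) = conn.execute(
    "SELECT count(*) FROM (%s)" % query, params).fetchone()
assert n == 1  # exactly one row, so safe to run the real query

# Note the whole query effectively runs twice: once to count,
# once to fetch.
row = conn.execute(query, params).fetchone()
```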
Yeah, a primary key is always non-nullable. But to do that, you have
to know that you're selecting from exactly one table - even what looks
like a primary key can have duplicates and/or NULLs if it's coming
from an outer join. It's not something you can do in a general way in
a library.

>> where the ... is the full original query. In other words, the whole
>> query has to be run twice - once to assert that there's exactly one
>> result, and then a second time to get that result. The existing
>> algorithm ("try to fetch a row - if it fails error; then try to fetch
>> another - if it succeeds, error") doesn't need to fetch more than two
>> results, no matter how big the query result is.
>
> That is true, but that's what a database engine is designed for. Granted
> he's just using SQLite here so many optimizations don't exist. Just
> seems a bit odd to me to implement something in Python that the DB
> engine is already good at. Guess ever since ORM was invented the debate
> has raged over what the DB's job actually is. Personally I trust a DB
> engine to be fast and efficient much more than my Python code will be
> playing with the results.

Again, the simple case is fine - I would be fairly confident that
running the same query twice would be cheaper than twice the cost of
running it once (although not all queries are perfectly stable, and
some forms of query will defeat optimizations in surprising ways). But
even if the query itself is fully optimized, just the action of
running a query has a cost - you have to send something off to the
server and wait for a response. A "fetchone" call is likely to be used
for queries that are already coming from the cache, so the main cost
is just "hello server, I need this" - which is going to be doubled.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
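[The two-fetch algorithm Chris describes - fetch one row, error if
there is none; fetch a second, error if there is one - could be
sketched like this. fetch_exactly_one is a hypothetical helper name,
and the table and data are made up:]

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("Alice",), ("Bob",)])

def fetch_exactly_one(cur):
    """Return the single row from cur, fetching at most two rows.

    Raises LookupError if the query yields zero rows or more than
    one - no pre-query and no full scan of the result set needed.
    """
    row = cur.fetchone()
    if row is None:
        raise LookupError("query returned no rows")
    if cur.fetchone() is not None:
        raise LookupError("query returned more than one row")
    return row

cur = conn.execute("SELECT id, name FROM people WHERE name = ?", ("Bob",))
row = fetch_exactly_one(cur)
```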