[EMAIL PROTECTED] wrote: > On Jun 2, 5:06 pm, Peter Otten <[EMAIL PROTECTED]> wrote: > >> You are taking the wrong approach here. >> >> Don't build SQL statements as strings; you are enabling the next SQL >> injection attack. Pass parameters using the DB API instead. >> >> Don't use regular expressions to parse a CSV file. Python's csv module is >> more likely to deal correctly with the quirks of that standard. >> > > I'd like to second both these statements. Regardless of whether these > CSV files are from a trusted source or not, it's a virtual truism of > programming that eventually, any application will be used in ways it > was not intended. Since using a parameterized query is a simple way > to avoid a common security hole, even if such a thing could never be > exploited by the app in its current configuration, you should do > things the Right Way. That way, even if your code is twisted to some > other use in the future, it's less likely to cause problems.
I don't have a problem with a response saying "it's a good idea to use parameterized queries and explaining why", but I have seen way too many responses like this which are basically FUD. I'm not sure what a "virtual" truism is, but if it is like a truism, it's not true. There are many cases where one can accurately predict that the code will not be used in the future in some different app. I don't know what the OP was doing, but I have done many data conversion jobs where I have done things similar to the OP. The jobs were one-time, move data from system A to system B (with some munging in between) and I could and did predict the conversion code would not get reused. My accuracy rate is 100%. And if you do reuse code where you feed it untrusted input in a security sensitive context, and you don't bother to verify the security of said, code, you already have so many problems, one more probably won't make much difference. To the OP: The advice to use parameterized queries is good but overstated. There are cases when it is quite safe to use non-parameterized statements: * When you control the data going into the query )e.g., you've generated it yourself). * When the data come from trusted sources (including something like sys.input if the only people with access to the program are trusted). * When you can reliably check the data yourself, for example in: sql = "SELECT * FROM foo WHERE id=%d" % int(some_string) cursor.execute (sql) it doesn't really matter what "some_string" contains (if you are prepared for a Python exception). But note that checking and escaping strings in more general or complicated cases can be quite tricky.) In most cases a good reason to use a parameterized query is that it is no harder than to not use one, so why not and get the additional safety for free? A parameterized query can often run faster than a non-parameterized one since the database can reuse the cached compiled query. (But sometimes the opposite is true, see below). A parameterized form of the above is: sql = "SELECT * FROM foo WHERE id=?" % int(some_string) cursor.execute (sql, int(some_string)) so it is just as easy. There are times though when it is slightly harder. If idnums is an arbitrary list of ints: sql = "SELECT * FROM foo WHERE id IN(%s) % ','.join(idnums) cursor.execute (sql) Using a parameterized query might look like: sql = "SELECT * FROM foo WHERE id IN(%s) % ','.join(['?']*len(idnums)) cursor.execute (sql, idnums) When you have written such code a few times it becomes a natural idiom, but if you only do so occasionally, you are in a hurry, and the conditions above apply, then there is no reason not to go with the first form. And if you you already have a text string of comma-separated digits, the ease of using the direct sql form becomes even greater: sql = "SELECT * FROM foo WHERE id IN(%s) % idnums_string cursor.execute (sql) But of course, if "idnums_strings" came from an untrusted source, then you need to validate it first, e.g.: if idnums_string.strip("0123456789 ,"): then raise Error There are also times when using a parameterized query can dramatically (and I mean two or three *orders of magnitude*) slow down your query when using prepared queries. For example: sql = "SELECT * FROM foo WHERE txt LIKE "%s%%" % some_string cursor.execute (sql) can be expected to run quite quickly in most database systems, since the database knows that the searched for text starts with a constant string and can thus use an index. The parameterized form: sql = "SELECT * FROM foo WHERE txt LIKE ?" cursor.execute (sql, [some_string + "%"]) will often run very very slowly, because when the query is prepared the database has no idea if the argument will start with a constant string of not, and thus can only assume the worst, and prepare the query so that it doesn't use an index. The bottom line is, as in all things, understanding the issues will lead to much better decisions than blindly following some dumbed down advice like, "always use parameterized queries". (And to the OP. if you already know all this, my apologies if I sound like I'm talking down to you, but perhaps other people may benefit. I get tired of reading simplistic "do X, period." responses sometimes.) -- http://mail.python.org/mailman/listinfo/python-list