Adam Tomjack added the comment:

The proposed patches don't fix the problem.  They may well allow DDL in 
transactions, but that's not the real problem.

A database library must *NEVER* implicitly commit or rollback.  That's 
completely insane.

    >>> import this
    ...
    Explicit is better than implicit.

If a statement can't be executed in a transaction, then trying to execute such 
a statement in a transaction is an error.  Period.  An exception *MUST* be 
raised.  It is wrong to guess that the user really wanted to commit first.

    >>> import this
    ...
    Errors should never pass silently.
    ...
    In the face of ambiguity, refuse the temptation to guess.

If such an exception is raised, you'll run into the problem exactly once before 
figuring out that you need to commit or rollback first.  You can then change 
your code to safely and explicitly account for that limitation.

This issue is not an enhancement, it's a bug.  One might argue that fixing it 
will break existing code, but it won't.  Any affected code is already broken.  
It's depending on a promised level of safety, and this bug is breaking that 
promise.  It's better to fail noisily and consistently than to usually work, 
but sometimes silently corrupt data.

As is, it's pretty much guaranteed that there is existing code that does DDL 
expecting to be in a transaction when it isn't.  Even a well-tested schema 
upgrade can fail in production if the data is slightly different that in a 
testing database.  Unique constraints can fail, etc.  I frequently use the 
psycopg2 module and psql command, which get this issue right.  They've saved me 
several times.

The current behavior is buggy and should be changed to be correct.  There 
should not merely be an option to enable the correct behavior.  There should 
also be no option to enable the old buggy behavior.  It's too dangerous.

In general, it's a code-smell to inspect the SQL that's being executed before 
sending it to the database.  The only reason I can think of to inspect SQL is 
to work around brokenness in the underlying database.  Suppose, for example, 
that SQLite implicitly committed when it got DDL.  (I don't know what it 
actually does.  Just suppose.)  In that case, it would be necessary to check 
for DDL and raise an exception before passing the statement to the database.  
If the database correctly errors in such a situation, then the python wrapper 
should just pass the DDL to the database and deal with the resulting error.  
That's way easier that trying to correctly detect what the statement type is.  
Parsing SQL correctly is hard.

As evidence of that, note that the existing statement detection code is broken: 
it doesn't strip comments first!  A simple SELECT statement preceeded by a 
comment will implicitly commit!  But commenting is good.

    >>> import this
    ...
    Readability counts.

So it's not even just an issue of DDL or PRAGMAs!

Anything that causes the underlying database to implicitly commit must be 
avoided (yet the python module goes out of it's way to add *MORE* such buggy 
behavior!)  Fixing brokenness in the underlying database is the only reason to 
inspect SQL as it passes through this module.  If there are other ways to avoid 
such problems, they will likely be better than inspecting SQL.

Anything which causes implicit rollbacks in the underlying database is a 
similar issue, but easier to handle without parsing SQL.  It's not safe to 
implicitly rollback, even if a new transaction is begun.  For example, you 
might be doing foreign key validation outside the database.  If one INSERT were 
implicitly rolled back and a new transaction begun, a second INSERT may then 
succeed and be committed.  Now you have a constraint violation, even though you 
correctly checked it before.

If there is anything that triggers an implicit rollback in the underlying 
database connection, those rollbacks must be dectected all subsequent 
statements or actions on the python wrapper *should* raise exceptions until an 
explicit rollback is performed, except for commit() which *must* raise.

For example:

    con.isolation_level = 'DEFERRED'
    try:
        # Nothing in here can implicitly commit or rollback safely.  
        # Doing so risks data corruption.

        cur.execute("INSERT INTO foo ...")

        try:
            con.somehow_trigger_implicit_rollback() # must raise
        except:
            # Throw away the exception.
            pass

        try:
            # This *should* raise too, but isn't strictly necessary,
            # as long as commit() is implemented correctly.
            cur.execute("INSERT INTO bar ...")
        except:
            pass
        
        # The underlying database rolled back our transaction.
        # A successful commit here would not match reality.
        # This commit() must raise.
        con.commit()
    except:
        # This rollback() can succeed, because it puts the 
        # "con" object back into correspondance with the underlying
        # database connection state.
        con.rollback()
        logger.error('something bad happened')
        raise

If nothing in the database does implicit COMMITs or ROLLBACKs, then it's 
probably sufficient to leave all the error handling to the database and only 
raise exceptions when it indicates problems.

Here is an example demonstrating that a SELECT can trigger an implicit commit.

    Python 2.7.3 (default, Sep 26 2013, 20:03:06)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
    >>> # Create a connection.
    ... import sqlite3
    >>> con = sqlite3.connect(":memory:", isolation_level='DEFERRED')
    >>> cur = con.cursor()
    >>>
    >>> # Create some state.
    ... cur.execute("CREATE TABLE foo (i int);")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> cur.execute("INSERT INTO foo VALUES (0);")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.commit()
    >>>
    >>> # Change the state, do a SELECT, then rollback.
    ... cur.execute("UPDATE foo SET i=1")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> cur.execute("SELECT 1;")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.rollback()
    >>>
    >>> # Verify that the rollback worked.
    ... cur.execute("SELECT i FROM foo")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> i = cur.fetchone()[0]
    >>> print 'i is', i
    i is 0
    >>> assert i == 0
    >>>
    >>> # Change some state, do a different SELECT, then attempt to rollback.
    ... cur.execute("UPDATE foo SET i=2")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> # This will incorrectly commit:
    ... cur.execute("""
    ...     /*
    ...      * Comments are good for readability, but because of this bug, they 
can
    ...      * cause incorrect implicit commits.
    ...      */
    ...     SELECT 1;
    ... """)
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> con.rollback()
    >>>
    >>>
    >>> # The rollback succeeded and rolling back a different transaction than
    >>> # the one we expected.
    ... cur.execute("SELECT i FROM foo")
    <sqlite3.Cursor object at 0x7fc1b7d51500>
    >>> i = cur.fetchone()[0]
    >>> print 'i is', i
    i is 2
    >>> assert i == 0
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AssertionError

----------
nosy: +adamtj
title: sqlite3 module should allow DDL statements in transactions -> sqlite3 
module breaks transactions and potentially corrupts data
type: enhancement -> behavior
versions: +Python 2.7, Python 3.1, Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10740>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to