On Nov 10, 2011, at 5:33 PM, pallav wrote:

> Thank you. I also appreciate you putting your app.yaml on the code
> repository - it helped me get started today (before seeing what you
> had done, my own had not been customized for wsgi, just Python 2.7).
>
> I'm definitely interested in learning the relationship between the
> DAL, 2.7, and transactions. I would like to help improve GAE support
> in the DAL, so if there is anything I can learn from your experiences,
> maybe we can use it to improve web2py code. I believe this is the
> killer feature of web2py - being able to deploy to GAE easily while
> still being quick to start, and portable.
>
> Any interest?

Yes.

The underlying SQL transaction mechanisms (caveat: I'm no expert on this subject) are SELECT ... FOR UPDATE and BEGIN TRANSACTION. The Google Datastore has neither of these as such. What it does have is run_in_transaction(), to which you pass a function, and it runs that function as a transaction.

The Datastore doesn't lock, at least not at a level that's visible to us. The way run_in_transaction() works is that if two such transactions collide, one of them succeeds and the other is aborted and automatically retried. So whatever you put in run_in_transaction() must be idempotent and, moreover, must not have external side effects.

(Side note: it seems to me that this makes the relationship between GAE transactions and memcache problematic. I assume that memcache puts inside a transaction are not visible to the transaction logic, so they constitute an undesirable side effect and can leave the cache and the Datastore inconsistent.)

Python 2.7 complicates this because Google has decreed that the use of 2.7 requires the use of the High Replication Datastore. The HR Datastore says that any query in a transaction must be an ancestor query, and the DAL doesn't know anything about ancestors and entity groups.

So what do you do if you (like me) want to use GAE with 2.7 and need transaction support? I see three possibilities.

1. Bypass the DAL and use the Datastore API directly. I have a very simple model that maps into the Datastore entity-group model quite naturally, so that's what I did.

2. Use MySQL (Google Cloud SQL) instead of the Datastore, assuming that this works from 2.7: http://googleappengine.blogspot.com/2011/10/google-cloud-sql-your-database-in-cloud.html

3. Teach the DAL enough about entity groups that it can support transactions.

More about option (3) follows. What I have in mind is a decorator for a function in your controller that you want run as a transaction.
In the GAE case, it'd use run_in_transaction() to wrap the function; in the standard SQL case it'd execute BEGIN TRANSACTION (in whatever flavor the db requires) before calling the function. This won't work today, partly because we don't have portable BEGIN TRANSACTION support, and partly because our GAE support doesn't have entity groups (I think).

I propose to address *that* somewhat crudely, by creating an entity that represents the database and is the parent of a second-level entity for each table. A table entity would be the parent of all its rows. Then any query within a single table would use the table entity as the ancestor, and a query that used more than one table would use the db entity as the ancestor.

Three caveats. First, I don't actually know how the DAL maps SQL to the Datastore. GQL, maybe? So I don't know how well the structure I've described could be supported by the DAL. Second, I'm enough of a novice using the Datastore that I might be missing something about its mechanisms that could make this more or less difficult than I'm implying. Third, I'm implying, effectively, something like table-level or database-level locking; that could have performance issues. (Maybe each row should have its own parent entity, so we can "lock" a row as well.)

Last: I've addressed my own situation by bypassing the DAL, and while I'm more than happy to participate in the discussion, I don't have the time or expertise to contribute much to the implementation of (3).
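
To make the decorator idea in (3) a bit more concrete, here is a rough sketch of what I mean. It is not web2py code and not a DAL API: the name transactional and its conn argument are made up for illustration. The SQL branch uses Python's sqlite3 module as a stand-in for "whatever flavor the db requires", and the GAE branch assumes db.run_in_transaction() from google.appengine.ext.db. It ignores the entity-group problem entirely.

```python
import functools
import sqlite3

try:
    # Only importable on the App Engine Python runtime / SDK.
    from google.appengine.ext import db as gae_db
except ImportError:
    gae_db = None


def transactional(conn=None):
    """Hypothetical decorator: run the wrapped function as one transaction.

    With a SQL connection, issue BEGIN / COMMIT / ROLLBACK around the call.
    Without one, hand the function to the Datastore's run_in_transaction(),
    which may call it more than once, so it must be idempotent.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if conn is None:
                if gae_db is None:
                    raise RuntimeError("no SQL connection and no Datastore")
                return gae_db.run_in_transaction(func, *args, **kwargs)
            conn.execute("BEGIN")
            try:
                result = func(*args, **kwargs)
            except Exception:
                conn.execute("ROLLBACK")
                raise
            conn.execute("COMMIT")
            return result
        return wrapper
    return decorator


# Usage with an in-memory SQLite database. isolation_level=None turns off
# sqlite3's implicit transactions so the decorator controls them itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE item (name TEXT)")


@transactional(conn)
def add_item(name, fail=False):
    conn.execute("INSERT INTO item (name) VALUES (?)", (name,))
    if fail:
        raise ValueError("simulated failure after the insert")


def count_items():
    return conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

Calling add_item("b", fail=True) raises, and the insert is rolled back, which is the behavior a controller author would expect from either backend. In the Datastore branch the function body would additionally be limited to ancestor queries, which is the part the DAL can't express today.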