I've been trying to solve this problem in Django, and after over a year going down this path I've finally decided that there's no way to solve it in Django in a database-agnostic fashion. So I'm thinking that I need to explore a Postgres-specific solution. FYI, I'm not a database expert; I just know enough to get myself into trouble like this problem.

Let me try to distill all the backstory down to an oversimplified explanation:

I created a scientific data submission validation interface that helps researchers compile their data submissions to our database. To do this, I added a `--validate` option to the load scripts that essentially raises a dry-run exception at the end of a load run, so that all the data is rolled back before leaving the atomic transaction block.
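
In rough terms, the dry-run pattern looks like this (a minimal sketch; `DryRun` and `load_data` are simplified stand-ins for the real exception and loading logic):

```python
# Minimal sketch of the --validate dry-run pattern (simplified names).
from django.core.management.base import BaseCommand
from django.db import transaction


class DryRun(Exception):
    """Raised at the end of a --validate run to force a rollback."""


class Command(BaseCommand):
    def add_arguments(self, parser):
        parser.add_argument("--validate", action="store_true")

    def handle(self, *args, **options):
        try:
            with transaction.atomic():
                self.load_data(options)
                if options["validate"]:
                    # Exiting atomic() via an exception rolls everything back.
                    raise DryRun()
        except DryRun:
            pass  # nothing was committed; report buffered errors to the user

    def load_data(self, options):
        """Placeholder for the actual loading logic (inserts/updates)."""
```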

This validation interface skips the raw data load step, which is the heftiest and longest-running part, and generally finishes in well under a minute.

The interface works fantastically well. It rolls back problematic data in smaller transactions and buffers those errors for a final report; the user can then fix the problems in their data files and retry validation until everything passes and the data is ready to load.
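
The rollback-and-buffer behavior is just nested atomic blocks (savepoints under the hood); a simplified sketch, with `rows` and `load_row` standing in for the real loaders:

```python
from django.db import transaction

errors = []
with transaction.atomic():  # outer transaction for the whole validation run
    for row in rows:  # `rows`: stand-in for the parsed submission records
        try:
            with transaction.atomic():  # nested block = savepoint
                load_row(row)  # stand-in for the per-record load
        except Exception as exc:
            errors.append((row, exc))  # record rolled back; keep going
# `errors` becomes the final report the user works through
```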

The problem arises when a long-running load (which can take hours) is going on in the background. If, during that load, a user tries to validate a new submission that contains some overlapping common data (e.g., a new tissue type or protocol that is currently being loaded), or tries to validate data added to an existing submission that is partially loaded, the validation process gets blocked and the validation interface hits a gateway timeout.

I tried changing the isolation level to both REPEATABLE READ and SERIALIZABLE, but the hang can still occur (though SERIALIZABLE seems to avoid hangs in some cases that REPEATABLE READ does not).
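
For reference, I switched isolation levels via the standard psycopg2 option in the database settings (abbreviated sketch):

```python
# settings.py (abbreviated): how I switched isolation levels when testing
import psycopg2.extensions

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # ... NAME, USER, PASSWORD, HOST, etc. ...
        "OPTIONS": {
            "isolation_level": psycopg2.extensions.ISOLATION_LEVEL_REPEATABLE_READ,
            # or: psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE
        },
    }
}
```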

My initial interpretation of the isolation-level documentation was that transactions would not be aware of what is happening in other transactions, and that if there was a conflict, you would only ever get a serialization error at the end, when the data is committed. But after encountering the hang yesterday, I realized that the snapshot the second transaction gets does not shield it from row locks already taken by the first transaction; writes still block on those locks regardless of the isolation level.

I had hoped that since my validation interface would never commit, it would 
never have a serialization error or hang.

What options are available that might allow this all to work as I intended? Is there some way to tell the validation transaction to assume that any pending changes from another transaction will succeed?

I have lots of ideas for how to mitigate these occurrences, but I don't want to bias any potentially helpful responses. Am I screwed by my early decision to use the loading code to validate data submissions?
