On Thursday, December 13, 2012 10:23:52 AM UTC-8, Carsten Fuchs wrote:
> Our database is Oracle; with
> <https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-for-update>
> saying both "Returns a queryset that will lock rows ..." and "All
> matched entries will be locked ..." I nourished the silent hope that
> maybe it kept and matched the query rather than flagging individual
> rows. ;-)
No nourishment, but fortunately no calories either :)

> > However, not all is lost here.
> >
> > When you create new rows, those rows are _invisible_ to any other
> > transaction until you do a commit. So, as long as you arrange things so
> > that there's a final commit after you've done all the manipulations you
> > want, you won't have those rows stomped on by any other transaction.
>
> Ok, thanks, I was in fact not quite aware that the newly created rows
> are invisible until the end of the transaction.
>
> However, that still doesn't stop a second process that enters the same
> code while the first is still running from doing the same computations
> too, eventually committing a transaction that (attempts to) create
> another row with the same data again, does it?
>
> Is it a sound idea to work around this problem by running a
> select_for_update() query on *other* rows (a few that are known to
> exist, or even all) of the same model, just to obtain a dummy lock until
> the first process is done?
>
> Or even better, lock the single row in another model that tells us that
> meets_requirements() == False, i.e. our cache needs updating? It would
> still only be a "dummy" lock, of course, so I'm unsure if this is
> reasonable at all.
>
> Or does some kind of best practice exist that suggests how such a
> problem is normally solved?

This will depend on the problem, but let me cite a simple example of how
this is often solved. Let's say your process is to read in rows matching
certain criteria and write a new row somewhere else. Since you say you
can't guarantee that another process won't kick off during this and try
to do the same thing, here are the avoidance strategies:

1. As part of the process, the "input rows" are changed so that they no
longer match the criteria. This means you put a "select for update" on
those rows, which stops a second process from taking the same row-level
locks. The second process blocks until the first completes/commits, and
once unblocked, those rows no longer match the criteria and are no longer
returned by the query. This sounds most like what you're doing now.

2. The output rows have a unique constraint put on them, so if the same
input rows are processed, the second INSERT will fail. In this case
you'll need to a) make sure you trap the exception and handle it
gracefully, AND b) ensure Django isn't going to undo your work by using
an UPDATE instead. This is generally only feasible if you can ensure that
there's a consistent mapping between output and input rows.

3. Make the whole process "idempotent", meaning that you can re-run it
any number of times and it will have no further impact on your data
model. E.g. if your input is "all data in time range :00-:05" and your
output is a summary, then all you're losing if you re-run it is a touch
of performance. This won't work, of course, if your process is "summarise
and delete source data", in which case you'll need to lock the input rows
AND possibly the output row, and also adjust your code so that if it
thought it was creating an output row but the row suddenly exists, it can
switch over to updating it instead.

I hope that gives you sufficient options. Rough, untested sketches of
each approach follow below.
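For (1), a minimal sketch, assuming a hypothetical SourceRow model whose
"processed" flag is the selection criteria, and a hypothetical
create_summary() helper. On Django 1.4 select_for_update() needs to run
inside a transaction, e.g. under commit_on_success; on 1.6+ you'd use
transaction.atomic instead:

    from django.db import transaction

    @transaction.commit_on_success  # on Django 1.6+: "with transaction.atomic():"
    def process_batch():
        # Take row-level locks: a second process issuing the same query
        # blocks here until this transaction commits.
        rows = list(SourceRow.objects.select_for_update().filter(processed=False))
        if not rows:
            # Another process already handled these rows.
            return
        create_summary(rows)  # hypothetical: "write a new row somewhere else"
        # Change the input rows so they no longer match the criteria, so
        # the unblocked second process gets an empty result.
        SourceRow.objects.filter(pk__in=[r.pk for r in rows]).update(processed=True)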
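For (2), a sketch of the unique-constraint approach. Summary is a
hypothetical output model with a unique constraint on period_start, which
provides the consistent input-to-output mapping. force_insert covers
point b): without it, save() can quietly issue an UPDATE when a primary
key is already set.

    from django.db import IntegrityError

    def write_summary(period_start, total):
        summary = Summary(period_start=period_start, total=total)
        try:
            # force_insert makes save() issue an INSERT no matter what, so
            # the unique constraint can reject a duplicate rather than
            # Django turning the write into an UPDATE.
            summary.save(force_insert=True)
        except IntegrityError:
            # A concurrent process got there first; handle it gracefully.
            pass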
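For (3), an idempotent variant that recomputes the summary from the
(unmodified) source rows and overwrites the output row, keyed on the time
range. Model and field names are again hypothetical, and the fallback
from creating to updating is the adjustment described above; a unique
constraint on period_start keeps get_or_create safe under concurrency:

    from django.db.models import Sum

    def summarise(range_start, range_end):
        # Recompute from source: re-running any number of times always
        # yields the same final state.
        total = SourceRow.objects.filter(
            created__gte=range_start, created__lt=range_end,
        ).aggregate(total=Sum('amount'))['total'] or 0
        summary, created = Summary.objects.get_or_create(
            period_start=range_start, defaults={'total': total})
        if not created:
            # The row already exists (perhaps from a concurrent run):
            # switch over to updating it instead of creating it.
            summary.total = total
            summary.save()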