On Thursday, December 13, 2012 10:23:52 AM UTC-8, Carsten Fuchs wrote:
> Our database is Oracle; with
> <https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-for-update>
> saying both "Returns a queryset that will lock rows ..." and "All
> matched entries will be locked ..." I nourished the silent hope that
> maybe it kept and matched the query rather than flagging individual
> rows. ;-)
No nourishment, but fortunately no calories either :)

> > However, not all is lost here.
> >
> > When you create new rows, those rows are _invisible_ to any other
> > transaction until you do a commit. So, as long as you arrange things so
> > that there's a final commit after you've done all the manipulations you
> > want, you won't have those rows stomped on by any other transaction.
>
> Ok, thanks, I was in fact not quite aware that the newly created rows
> are invisible until the end of the transaction.
>
> However, that still doesn't stop a second process that enters the same
> code while the first is still running from doing the same computations
> too, eventually committing a transaction that (attempts to) create
> another row with the same data again, does it?
>
> Is it a sound idea to work around this problem by running a
> select_for_update() query on *other* rows (a few that are known to
> exist, or even all) of the same model, just to obtain a dummy lock until
> the first process is done?
>
> Or even better, lock the single row in another model that tells us that
> meets_requirements() == False, i.e. our cache needs updating? It would
> still only be a "dummy" lock, of course, so I'm unsure if this is
> reasonable at all.
>
> Or does some kind of best practice exist that suggests how such a
> problem is normally solved?

This will depend on the problem, but let me cite a simple example of how
this is often solved. Let's say your process is to read in rows matching
certain criteria and write a new row somewhere else. Since you say you
can't guarantee that another process won't kick off during this and try
to do the same thing, here are the avoidance strategies:

1. As part of the process, the "input rows" are changed so that they no
longer match the criteria. This means you put a "select for update" on
those rows, which stops a second process from taking the same row-level
locks. The second process blocks until the first completes/commits, and
once unblocked, those rows no longer match the criteria and are no longer
returned by the query. This sounds most like what you're doing now.

2. The output rows have a unique constraint put on them, so if the same
input rows are processed, the second INSERT will fail. In this case
you'll need to a) make sure you trap the exception and handle it
gracefully, AND b) ensure Django isn't going to undo your work by using
an UPDATE instead. This is generally only feasible if you can ensure that
there's a consistent mapping between output and input rows.

3. Make the whole process "idempotent", meaning that you can re-run it
any number of times and it will have no further impact on your data
model. E.g. if your input is "all data in time range :00-:05" and your
output is a summary, then all you're losing if you re-run it is a touch
of performance. This won't work, of course, if your process is "summarise
and delete source data", in which case you'll need to lock the input rows
AND possibly the output row, and also adjust your code so that if it
thought it was creating an output row but the row suddenly exists, it can
switch over to updating it instead.

I hope that gives you sufficient options. Rough, untested sketches of
each approach follow below.
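For (1), a minimal sketch, assuming a hypothetical SourceRow model whose
"processed" flag is the selection criteria, and a hypothetical
create_summary() helper. On Django 1.4 select_for_update() needs to run
inside a transaction, e.g. under commit_on_success; on 1.6+ you'd use
transaction.atomic instead:

    from django.db import transaction

    @transaction.commit_on_success  # on Django 1.6+: "with transaction.atomic():"
    def process_batch():
        # Take row-level locks: a second process issuing the same query
        # blocks here until this transaction commits.
        rows = list(SourceRow.objects.select_for_update().filter(processed=False))
        if not rows:
            # Another process already handled these rows.
            return
        create_summary(rows)  # hypothetical: "write a new row somewhere else"
        # Change the input rows so they no longer match the criteria, so
        # the unblocked second process gets an empty result.
        SourceRow.objects.filter(pk__in=[r.pk for r in rows]).update(processed=True)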
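For (2), a sketch of the unique-constraint approach. Summary is a
hypothetical output model with a unique constraint on period_start, which
provides the consistent input-to-output mapping. force_insert covers
point b): without it, save() can quietly issue an UPDATE when a primary
key is already set.

    from django.db import IntegrityError

    def write_summary(period_start, total):
        summary = Summary(period_start=period_start, total=total)
        try:
            # force_insert makes save() issue an INSERT no matter what, so
            # the unique constraint can reject a duplicate rather than
            # Django turning the write into an UPDATE.
            summary.save(force_insert=True)
        except IntegrityError:
            # A concurrent process got there first; handle it gracefully.
            pass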
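For (3), an idempotent variant that recomputes the summary from the
(unmodified) source rows and overwrites the output row, keyed on the time
range. Model and field names are again hypothetical, and the fallback
from creating to updating is the adjustment described above; a unique
constraint on period_start keeps get_or_create safe under concurrency:

    from django.db.models import Sum

    def summarise(range_start, range_end):
        # Recompute from source: re-running any number of times always
        # yields the same final state.
        total = SourceRow.objects.filter(
            created__gte=range_start, created__lt=range_end,
        ).aggregate(total=Sum('amount'))['total'] or 0
        summary, created = Summary.objects.get_or_create(
            period_start=range_start, defaults={'total': total})
        if not created:
            # The row already exists (perhaps from a concurrent run):
            # switch over to updating it instead of creating it.
            summary.total = total
            summary.save()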