On May 12, 12:47 pm, Phil Mocek <pmocek-list-django-us...@mocek.org>
wrote:
> On Tue, May 12, 2009 at 02:25:41AM -0700, Daniel Roseman wrote:
> > No, [get_or_create is] not atomic. You can see the code in
> > django.db.models.query - it tries a db lookup, and then creates
> > a new object if one is not found.
>
> It seems that this creates a potentially-troublesome race
> condition.  Wouldn't the object creation fail if the object is
> created by some other process between the time of the query and
> the time of the attempt at creation?  Shouldn't any operation that
> relies upon the results of a database query require that a read
> lock (i.e., a shared or non-exclusive lock) be placed before that
> initial query and held until the operation has successfully
> completed?
>
> This is rather disturbing.  Are there other instances of Django's
> ORM doing things that are not safe for concurrent database access?
> It seems that this would be a serious hindrance to scalability.
>
> I see some prior discussion [1] of this issue, but the only
> rationale for the present behavior I see in that discussion is
> James Bennett's assertion that in order to avoid deadlock, a
> framework should not automatically lock a table.  There's no
> discussion of DBMS's that perform row-level locking, or of relying
> upon the DBMS's ability to detect deadlock and proceed in an
> appropriate manner.  It seems risky to try to handle at the
> application level these things that a DBMS is specifically
> designed to do.
>
> Earlier in the aforementioned thread, Thomas Steinacher quoted
> Django documentation [2] as stating:
>
> > It works like this: When a request starts, Django starts a
> > transaction. If the response is produced without problems,
> > Django commits any pending transactions. If the view function
> > produces an exception, Django rolls back any pending
> > transactions.
>
> Does this not result in get_or_create operating atomically?
>
> References:
>
> [1]: <http://groups.google.com/group/django-developers/browse_thread/thread...>
> [2]: <http://docs.djangoproject.com/en/1.0/topics/db/transactions/>
>
> --
> Phil Mocek

If you're only expecting one object matching a given set of criteria,
then you should have some sort of unique or unique_together constraint
set up. If the field(s) you're querying on are unique (unique_together
for multiple fields), then get_or_create will give exactly the results
you're looking for.

The steps get_or_create follows are:
1. Try to get the object.
2. If getting fails, try to create it.
3. If creating raises an IntegrityError (another process created the
   object first), get that object and return it.
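The three steps can be sketched outside Django. This is a minimal illustration, not Django's actual implementation: it uses the standard-library sqlite3 module in place of the ORM, and the `users` table and `get_or_create` helper are hypothetical names chosen for the example.

```python
import sqlite3

LOOKUP = "SELECT id FROM users WHERE email = ?"

def get_or_create(conn, email):
    """Hypothetical helper returning (row_id, created) for a user email."""
    # Step 1: try to get the object.
    row = conn.execute(LOOKUP, (email,)).fetchone()
    if row is not None:
        return row[0], False
    # Step 2: not found, so try to create it.
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Step 3: someone else inserted the same email first; fetch their row.
        return conn.execute(LOOKUP, (email,)).fetchone()[0], False

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is what makes step 3 reachable at all.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

print(get_or_create(conn, "a@example.com"))  # (1, True)
print(get_or_create(conn, "a@example.com"))  # (1, False)
```

Note that correctness in step 3 comes entirely from the database constraint, not from anything the application code locks.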

An IntegrityError is raised when you attempt to create an object that
violates one of the table's unique constraints. So if the fields
you're querying on aren't unique, the IntegrityError will never be
raised and the duplicate row will be created successfully.
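To make the difference concrete, here is a sketch that replays the worst-case interleaving of two get_or_create calls on a single sqlite3 connection: both "requests" do their lookup before either inserts. It is a deterministic simulation rather than real concurrency, and the `tags` table and helper names are hypothetical. Without a unique constraint both inserts succeed; with one, the second insert raises IntegrityError and falls back to fetching the existing row.

```python
import sqlite3

LOOKUP = "SELECT id FROM tags WHERE name = ?"

def make_conn(unique):
    """In-memory table, with or without a UNIQUE constraint on name."""
    conn = sqlite3.connect(":memory:")
    column = "name TEXT UNIQUE" if unique else "name TEXT"
    conn.execute(f"CREATE TABLE tags (id INTEGER PRIMARY KEY, {column})")
    return conn

def interleaved_get_or_create(conn, name):
    """Both lookups (step 1) run before either insert (step 2)."""
    first_lookups = [conn.execute(LOOKUP, (name,)).fetchone() for _ in range(2)]
    assert first_lookups == [None, None]  # neither request sees a row yet
    outcomes = []
    for _ in first_lookups:
        try:
            with conn:  # commit on success, roll back on error
                cur = conn.execute("INSERT INTO tags (name) VALUES (?)", (name,))
            outcomes.append(("created", cur.lastrowid))
        except sqlite3.IntegrityError:
            # Step 3: only reachable when a unique constraint exists.
            outcomes.append(("fetched", conn.execute(LOOKUP, (name,)).fetchone()[0]))
    return outcomes

print(interleaved_get_or_create(make_conn(unique=False), "django"))
# [('created', 1), ('created', 2)]  -> duplicate rows
print(interleaved_get_or_create(make_conn(unique=True), "django"))
# [('created', 1), ('fetched', 1)]  -> one row, both callers agree
```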

I can't think of many situations where you'd be calling get() with
criteria that aren't unique (get() raises MultipleObjectsReturned as
soon as more than one row matches), and now that I'm typing it out it
doesn't even make sense.

The current approach scales well, too. Locking tables would cause
problems if 10 threads were running the code at the same time, and
since most use cases involve querying on unique fields, sites with 10
threads calling get_or_create at the same time already get correct
results without any locking. Locking tables would cause more problems
for the 90% of sites than it would fix for the 10%.

If you really, really need to do it the way you're doing now, then
having your own mutex is always an option, and doesn't require
screwing over the most common use case.
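A rough sketch of the mutex option follows, assuming all the competing callers are threads in one Python process: a threading.Lock only coordinates threads within a single process, so several server processes or machines would need a lock provided by the database instead. The `LockedStore` class is hypothetical, and a plain dict stands in for the table; in a Django app the locked section would wrap the ORM call instead.

```python
import threading

class LockedStore:
    """Hypothetical store whose get_or_create is serialized by a mutex."""

    def __init__(self):
        self._rows = {}                # key -> object; stands in for a table
        self._lock = threading.Lock()  # in-process only, not cross-process

    def get_or_create(self, key, factory):
        # Holding the lock turns "look up, then create" into one step,
        # so no other thread can slip in between the two.
        with self._lock:
            if key in self._rows:
                return self._rows[key], False
            obj = factory()
            self._rows[key] = obj
            return obj, True

store = LockedStore()
results = []

def worker():
    results.append(store.get_or_create("same-key", object))

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([created for _, created in results].count(True))  # 1
```

Every caller waits on the same lock, which is exactly the serialization cost described above; it is only worth paying where the unique-constraint approach isn't available.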
You received this message because you are subscribed to the Google Groups 
"Django users" group.