Hi folks,

My team has built a service comprising 3 main parts: a web application and 2 long-running worker processes that consume events from a message exchange. All of these components interact with the same database and use the same underlying Django "app" for ORM models (i.e. the 2 worker processes call django.setup() on initialization).
We've had some issues with the worker processes failing to recover from DB connectivity problems. For example, at one point Amazon restarted our DB (it's an RDS instance) and the workers started flailing, repeatedly raising the same exceptions even after the DB came back online. Later on we discovered that we could fix this particular issue by calling django.db.connection.close() when the exception occurred (it happened to be InterfaceError): on the next attempt to interact with the DB, Django would establish a new connection and everything would continue to work. More recently a new error caused a similar problem, which led us to speculate that we should do the same thing for this new exception type (I think this time it was OperationalError, because the DB went into "recovery mode" or something).

We are now planning to refactor this service a bit so that, instead of attempting to recover from exceptions, the workers simply terminate and an external agent automatically restarts them after any unexpected error. This feels like a safer design than trying to figure out every exception type we should be handling.

However, I wanted to reach out to the Django group as a sanity check to see if we're missing something more basic. From browsing various tickets in Django's issue tracker, I've gotten the impression that we may be swimming upstream a little: Django is designed as a web framework and relies on DB connections being closed (or returned to a pool, or something) automatically at the end of the request cycle, not held open by a single loop in a long-running process. Is there something special we should be doing in these worker processes? A special Django setting, perhaps? Should we just be calling connection.close() after processing each event? Should we not be using Django at all in this case?
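For reference, Django's own request cycle handles this by connecting django.db.close_old_connections() to the request_started and request_finished signals. That function closes any connection that has hit an error and is no longer usable, or that is older than CONN_MAX_AGE (with the default CONN_MAX_AGE of 0, that effectively means "close after every request"). A worker loop can imitate this by calling the same function before and after each event. Below is a minimal sketch of that idea combined with the fail-fast, supervisor-restart plan described above. run_worker, handle_event and the events iterable are hypothetical names, and the connection hook is passed in as a parameter so the sketch runs without Django installed.

```python
import logging
from typing import Callable, Iterable, TypeVar

Event = TypeVar("Event")

log = logging.getLogger("worker")


def run_worker(
    events: Iterable[Event],
    handle_event: Callable[[Event], None],
    close_old_connections: Callable[[], None],
) -> int:
    """Process events one at a time, tidying DB connections around each one.

    Returns a process exit status: 0 if the event stream ends normally,
    1 as soon as any event raises, so an external supervisor can restart
    the process with fresh connections.
    """
    processed = 0
    for event in events:
        # Like request_started: drop connections that are broken or past
        # CONN_MAX_AGE before this event touches the database.
        close_old_connections()
        try:
            handle_event(event)
        except Exception:
            # Fail fast rather than guessing which exception types
            # (InterfaceError, OperationalError, ...) are recoverable.
            log.exception("event %r failed after %d successes; exiting",
                          event, processed)
            return 1
        finally:
            # Like request_finished: runs on success and on failure.
            close_old_connections()
        processed += 1
    return 0
```

In a real worker the wiring would presumably look like `sys.exit(run_worker(consume(), handle, django.db.close_old_connections))`, called after django.setup(), where consume and handle are hypothetical stand-ins for the message-exchange client and the event handler. Because the hook runs around every event, a connection that broke while the process was idle gets discarded before the next event uses it. The non-zero exit covers whatever it does not.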
I think the pessimistic kill-and-restart strategy we've decided upon for now will work, but any guidance here to ensure we aren't fighting against our own framework would be much appreciated.

Dan Tao

You received this message because you are subscribed to the Google Groups "Django users" group.