Hi. I have a web application that requires files on disk to be analyzed and imported into the database continually. I spent the last couple days parallelizing the import routine. At first, I'd periodically lose my connection to MySQL:
--------
Traceback (most recent call last):
  File "/Users/sobelk/Code/motor/dispatch/dispatcher.py", line 327, in <module>
    dispatcher.run()
  File "/Users/sobelk/Code/motor/dispatch/dispatcher.py", line 117, in run
    num_tasks = self.run_multiprocess()
  File "/Users/sobelk/Code/motor/dispatch/dispatcher.py", line 160, in run_multiprocess
    results = pool.map(dispatch, tasks)
  File "/Library/Python/2.5/site-packages/multiprocessing-2.6.1.1-py2.5-macosx-10.5-i386.egg/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Library/Python/2.5/site-packages/multiprocessing-2.6.1.1-py2.5-macosx-10.5-i386.egg/multiprocessing/pool.py", line 422, in get
    raise self._value
_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
--------

I know it's not the same error as you're experiencing, but I presume the cause is the same: my DB connection was being shared among multiple processes. So I looked at django/db/__init__.py to see how the connection object is created. There is a 'close_connection' function, but no 'reset' or 'start' equivalent. There are hooks that run connection-related code when a request starts or finishes, but nothing for opening a fresh connection, so I just copied the lines that build the connection into my own code. Here is the gist of it (tasks are Django models):

--------
def dispatch(task):
    try:
        task.run()
    except Exception, e:
        # handle ...
        pass
    return ()  # some result


class Dispatcher(object):
    PROCESSES = 2

    def initialize_process(self):
        # Runs once in each worker process when the pool starts it
        from os import getpid
        print "Initializing process #%d" % getpid()
        self.reset_connection()

    def reset_connection(self):
        import django.db
        from django.conf import settings
        # Drop the connection inherited from the parent process...
        django.db.close_connection()
        # ...and build a new one, the same way django/db/__init__.py does
        connection = django.db.backend.DatabaseWrapper({
            'DATABASE_HOST': settings.DATABASE_HOST,
            'DATABASE_NAME': settings.DATABASE_NAME,
            'DATABASE_OPTIONS': settings.DATABASE_OPTIONS,
            'DATABASE_PASSWORD': settings.DATABASE_PASSWORD,
            'DATABASE_PORT': settings.DATABASE_PORT,
            'DATABASE_USER': settings.DATABASE_USER,
            'TIME_ZONE': settings.TIME_ZONE,
        })
        django.db.connection = connection

    def run_multiprocess(self):
        import multiprocessing
        pool = multiprocessing.Pool(self.PROCESSES, self.initialize_process)
        tasks = self.get_process_tasks()
        while tasks:
            num_tasks = len(tasks)
            print "Starting batch of %d tasks" % num_tasks
            # distribute tasks among processes
            results = pool.map(dispatch, tasks)
            # analyze the results ...
            # load a new batch of tasks
            tasks = self.get_process_tasks()
--------

So: recreate the connection in each process before any data is sent through it. I haven't had any problems since -- not to say I won't :)

I'd love to hear any feedback on whether this is a viable solution. I agree with you, mjt, that something like a 'reset_connection' function would be lovely, just to cut down on code redundancy.

You said earlier that you had traced the home of the connection to the base model. I'm not sure what you mean by that -- could you explain?

Best,
Kieran

On May 23, 3:46 pm, "m...@nysv.org" <markus.tornqv...@gmail.com> wrote:
> On May 23, 11:57 am, "m...@nysv.org" <markus.tornqv...@gmail.com>
> wrote:
>
> > Maybe I'll still try Process objects instead of processing.Pool.imap()
> > to be safe..
>
> I never did, as it seems the connection is very global.
>
> I also made a finding that the connection live in the base model, so although
>
> from django.db import connection
> new_connection = connection.__class__()
>
> works as anticipated, I can't assign it for only some objects.
>
> Running out of time I just wrote all the SQL manually and opened new
> connections for them manually.
>
> If anyone still has a better solution, I'm all ears, but it would be nice
> if Django had better support for this :)
>
> Thanks!
>
> --
> mjt
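For anyone reading this thread later, here is a minimal, self-contained sketch of the "one connection per worker process" idea, with sqlite3 standing in for MySQL so it runs without a server. The Pool initializer opens a private connection in each worker, so no connection is ever shared between processes. The names init_worker, dispatch and run, and the use of sqlite3, are illustrative assumptions, not Django API; in the Django setup above, the initializer would do the connection reset instead.

```python
import multiprocessing
import os
import sqlite3

# Hypothetical module-level slot holding this worker's private connection.
_connection = None


def init_worker(db_path):
    """Pool initializer: runs once in each worker process when it starts."""
    global _connection
    # Each worker opens its own connection instead of reusing one
    # inherited from (and shared with) the parent process.
    _connection = sqlite3.connect(db_path)


def dispatch(task_id):
    """Hypothetical task: queries through this process's own connection."""
    cur = _connection.execute("SELECT ?", (task_id,))
    (value,) = cur.fetchone()
    return os.getpid(), value


def run(db_path, tasks, processes=2):
    """Spread tasks over a pool whose workers each own a connection."""
    pool = multiprocessing.Pool(processes, init_worker, (db_path,))
    try:
        return pool.map(dispatch, tasks)
    finally:
        pool.close()
        pool.join()
```

For example, run(":memory:", range(4)) should return one (pid, value) pair per task, with at most two distinct pids. On platforms where multiprocessing spawns rather than forks workers (Windows, and macOS on Python 3.8+), call it from under an `if __name__ == "__main__":` guard.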