Hi.  I have a web application that continually analyzes files on disk
and imports them into the database.  I spent the last couple of days
parallelizing the import routine.  At first, I'd periodically lose my
connection to MySQL:

--------
Traceback (most recent call last):
  File "/Users/sobelk/Code/motor/dispatch/dispatcher.py", line 327, in <module>
    dispatcher.run()
  File "/Users/sobelk/Code/motor/dispatch/dispatcher.py", line 117, in run
    num_tasks = self.run_multiprocess()
  File "/Users/sobelk/Code/motor/dispatch/dispatcher.py", line 160, in run_multiprocess
    results = pool.map(dispatch, tasks)
  File "/Library/Python/2.5/site-packages/multiprocessing-2.6.1.1-py2.5-macosx-10.5-i386.egg/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/Library/Python/2.5/site-packages/multiprocessing-2.6.1.1-py2.5-macosx-10.5-i386.egg/multiprocessing/pool.py", line 422, in get
    raise self._value
_mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
--------


I know it's not the same error as you're experiencing, but I presume
the cause is also that my DB was shared among multiple processes.

So, I looked at django/db/__init__.py to see how the connection object
is generated.  There is a 'close_connection' function, but no 'reset'
or 'start' equivalent.  There are hooks in place to execute certain
code pertaining to the connection when processes are stopped.  So, I
just copied the lines that take care of that stuff into my own code.

Here is the gist of it (tasks are Django models):

--------

def dispatch(task):
    # Runs inside a worker process; kept at module level so that
    # multiprocessing can pickle it.
    try:
        task.run()
    except Exception, e:
        pass  # handle ...
    return ()  # some result


class Dispatcher(object):
    PROCESSES = 2

    def initialize_process(self):
        from os import getpid
        print "Initializing process #%d" % getpid()
        self.reset_connection()

    def reset_connection(self):
        import django.db
        from django.conf import settings
        django.db.close_connection()
        connection = django.db.backend.DatabaseWrapper({
                'DATABASE_HOST': settings.DATABASE_HOST,
                'DATABASE_NAME': settings.DATABASE_NAME,
                'DATABASE_OPTIONS': settings.DATABASE_OPTIONS,
                'DATABASE_PASSWORD': settings.DATABASE_PASSWORD,
                'DATABASE_PORT': settings.DATABASE_PORT,
                'DATABASE_USER': settings.DATABASE_USER,
                'TIME_ZONE': settings.TIME_ZONE,
                })
        django.db.connection = connection

    def run_multiprocess(self):
        import multiprocessing

        pool = multiprocessing.Pool(self.PROCESSES,
                                    self.initialize_process)

        tasks = self.get_process_tasks()
        while tasks:
            print "Starting batch of %d tasks" % len(tasks)

            # distribute tasks among processes
            results = pool.map(dispatch, tasks)

            # analyze the results ...
            # load a new batch of tasks
            tasks = self.get_process_tasks()
--------

So, recreate the connection for each process before any data is sent
through it.  I haven't had any problems since -- not to say I
won't :)  I'd love to hear any feedback on whether this is a viable
solution.

I agree with you, mjt, that something like a 'reset_connection'
function would be lovely, just to cut down on code redundancy.  You
said earlier that you had traced the home of the connection to the
base model.  I'm not sure what you mean by that -- could you explain
the comment?


Best,

Kieran



On May 23, 3:46 pm, "m...@nysv.org" <markus.tornqv...@gmail.com>
wrote:
> On May 23, 11:57 am, "m...@nysv.org" <markus.tornqv...@gmail.com>
> wrote:
>
> > Maybe I'll still try Process objects instead of processing.Pool.imap()
> > to be safe..
>
> I never did, as it seems the connection is very global.
>
> I also made a finding that the connection lives in the base model, so
> although
>
> from django.db import connection
> new_connection = connection.__class__()
>
> works as anticipated, I can't assign it for only some objects.
>
> Running out of time I just wrote all the SQL manually and opened new
> connections for them manually.
>
> If anyone still has a better solution, I'm all ears, but it would be
> nice
> if Django had better support for this :)
>
> Thanks!
>
> --
> mjt

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
