**tl;dr** I can run the full test suite in 85 seconds on SQLite, a 4.8x speedup.


Hello,

Since I last wrote about this project, I improved parallelization by:

- reworking the IPC to avoid exchanging tracebacks
- implementing database duplication for SQLite, PostgreSQL and MySQL
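
To illustrate why exchanging tracebacks between processes is fragile, here is a minimal sketch (not the code from the pull request; `capture_failure` is a made-up helper). It shows that a live traceback object can't be pickled, while the same traceback formatted as a string can:

```python
import pickle
import sys
import traceback


def capture_failure():
    """Raise and catch an error, returning the live exc_info triple."""
    try:
        1 / 0
    except ZeroDivisionError:
        return sys.exc_info()


exc_type, exc_value, tb = capture_failure()

# A live traceback object references stack frames and cannot be pickled.
try:
    pickle.dumps(tb)
    traceback_is_picklable = True
except (TypeError, pickle.PicklingError):
    traceback_is_picklable = False

# Formatting the traceback inside the worker yields a plain string,
# which can be sent to another process without trouble.
formatted = "".join(traceback.format_exception(exc_type, exc_value, tb))
payload = pickle.dumps(formatted)
```

Sending text instead of exception objects loses structure but avoids the pickling problem entirely.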

The code is still rough. Several options of runtests.py don't work with
parallelization.
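
For reference, the duplication step can be sketched as follows, based on the approach proposed in the earlier message quoted below. This is only an illustration, not the code from the pull request: the function names and the `_N` naming scheme are assumptions, and the PostgreSQL helper just builds the statement without quoting identifiers.

```python
import os
import shutil


def clone_sqlite_database(source_path, worker_index):
    """Copy an on-disk SQLite database for one worker; return the new path."""
    # Assumed naming scheme: django_test.sqlite3 -> django_test_1.sqlite3
    root, ext = os.path.splitext(source_path)
    target_path = f"{root}_{worker_index}{ext}"
    shutil.copy(source_path, target_path)
    return target_path


def postgres_clone_sql(worker_index, template="django_test", owner="django_test"):
    """Build the statement that clones a PostgreSQL database from a template."""
    # Illustration only: identifiers are interpolated without quoting.
    return (
        f"CREATE DATABASE {template}_{worker_index} "
        f"WITH TEMPLATE {template} OWNER {owner};"
    )
```

Each worker then connects to its own copy, so the processes never touch the same database.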

Initially performance was disappointing. I couldn’t max out my cores during
the whole run. With some basic monitoring I noticed that the CPU load
plummeted when there were spikes of disk writes. This led me to believe I was
disk I/O bound even with an in-memory database and an SSD.

So I started optimizing disk I/O, which means doing less I/O and doing it only
in a RAM-mounted temporary directory. Writing to RAM instead of writing to
disk helps a lot. It's as simple as creating a RAM disk (how to do that depends
on your OS) and pointing the TMPDIR environment variable at it.
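
A small sketch of how Python picks up TMPDIR, for anyone who wants to check their setup. The helper name `use_temp_root` is made up for this example, and the RAM disk path in the comment is hypothetical. Note that `tempfile` caches the directory it chose, so the cache has to be cleared when TMPDIR changes inside a running process:

```python
import os
import tempfile


def use_temp_root(path):
    """Point this process (and children started later) at `path` for temp files."""
    os.environ["TMPDIR"] = path
    # tempfile caches its choice; clearing the cache forces a fresh lookup.
    tempfile.tempdir = None
    # If `path` doesn't exist or isn't writable, tempfile silently falls back
    # to another directory, so return what it actually selected.
    return tempfile.gettempdir()


# Hypothetical usage, assuming a RAM disk is mounted at /Volumes/ramdisk:
# use_temp_root("/Volumes/ramdisk")
```

Setting TMPDIR in the shell before launching the tests has the same effect without any code change.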

Unfortunately, i18n and migrations management commands write in the
application directories. I haven't found (yet) a way to point them to a
temporary directory instead.

My 2012 MacBook Pro with a 2.3 GHz Intel Core i7 (4 cores, 8 threads) takes:

- 30 seconds for creating the two databases

- 240 seconds to run the actual tests in a single process
- 72 seconds in 4 processes (3.3x faster)
- 60 seconds in 6 processes (4x faster)
- 55 seconds in 8 processes (4.4x faster)

That looks quite close to what my hardware can do. Hyperthreading doesn't help
as much as multiple cores when it comes to running multiple processes, and
synchronization costs increase with the number of processes.
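
The speedups above, together with the per-process efficiency they imply, can be recomputed from the timings of the test phase alone (database creation excluded). The function name `scaling` is just for this example:

```python
def scaling(single_seconds, parallel_runs):
    """Return {processes: (speedup, efficiency)} for each parallel run."""
    results = {}
    for processes, seconds in parallel_runs.items():
        speedup = single_seconds / seconds
        # Efficiency: fraction of the ideal linear speedup actually achieved.
        results[processes] = (speedup, speedup / processes)
    return results


# Timings from the list above: processes -> seconds for the test phase.
results = scaling(240, {4: 72, 6: 60, 8: 55})
for processes, (speedup, efficiency) in results.items():
    print(f"{processes} processes: {speedup:.1f}x, {efficiency:.0%} efficiency")
# 4 processes: 3.3x, 83% efficiency
# 6 processes: 4.0x, 67% efficiency
# 8 processes: 4.4x, 55% efficiency
```

Efficiency drops noticeably past 4 processes, which matches the machine having 4 physical cores.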

Creating the databases accounts for more than a third of the total runtime. I'm
not sure how much time is spent in the migrations framework and how much in
creating the tables. --keepdb helps but it requires an on-disk database. An
on-RAMdisk database would most likely be the best option. I haven't tried it
yet. It should work with at least SQLite and PostgreSQL (using tablespaces).
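
For SQLite, an on-RAM-disk test database might be configured along these lines. This is an untested sketch: the mount point `/Volumes/ramdisk` and the file names are hypothetical. It relies on the documented behavior that SQLite tests use an in-memory database unless the `TEST` `NAME` setting gives a file path:

```python
import os

# Hypothetical RAM disk mount point; adjust for your OS.
RAMDISK = "/Volumes/ramdisk"

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": os.path.join(RAMDISK, "django.sqlite3"),
        "TEST": {
            # A file path here makes the test database file-based
            # instead of in-memory, so --keepdb has something to keep.
            "NAME": os.path.join(RAMDISK, "django_test.sqlite3"),
        },
    },
}
```

The database disappears when the RAM disk is unmounted, so the first run after a reboot still pays the creation cost.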

If you want to help, I'd be interested in:

- reports of whether parallelization works for test suites other than Django's
  own -- apply my pull request and run `django-admin test --parallel` or
  `django-admin test --parallel-num=N`
- a patch implementing database duplication on Oracle

Let me know if you have questions or concerns.

-- 
Aymeric.



> On 7 Feb 2015, at 10:42, Aymeric Augustin 
> <[email protected]> wrote:
> 
> On 7 Feb 2015, at 01:02, Russell Keith-Magee <[email protected]> wrote:
> 
>> I've thought about (but never done anything about) this problem in the past 
>> - my thought for this problem was to use multiple test databases, so you 
>> have isolation. Yes this means you need to do more manual setup (createdb 
>> test_database_1; createdb test_database_2; etc), but it means you don't have 
>> any collision problems multiprocessing an on-disk database.
> 
> The fastest way to do this is probably to create a database then clone it. 
> Each backend would have to implement a duplication method:
> 
> - SQLite: shutil.copy('django_test.sqlite3', 'django_test_N.sqlite3')
> - PostgreSQL: CREATE DATABASE django_test_N WITH TEMPLATE django_test OWNER 
> django_test;
> - MySQL: mysqldump … | mysql …
> - Oracle: apparently there’s a DUPLICATE command — I have a bad feeling about 
> this one.
> 
> For optimal speed, this feature should support --keepdb.
> 
>> My only "concern" relates to end-of-test reporting - how are you reporting 
>> test success/failure? Do you get a single coherent test report at the end? 
>> Do you get progress reporting, or just "subprocess 1 has completed; 5 
>> failures, 200 passes" at the end of a subprocess? My interest here isn't 
>> strictly about Django - it's about tooling, and integration of a 
>> parallelized test suite with IDEs, or tools like Cricket.
> 
> Yes, I have a clean report. If you look at the pull request you’ll see two 
> successive implementations:
> 
> 1) Run tests in workers, pass "events" back to the master process, feed them 
> to the master test runner. Unfortunately this technique is never going to be 
> sufficiently robust because it involves passing tracebacks between processes 
> and tracebacks aren’t pickleable in general.
> 
> 2) Run tests in workers, buffer the output in workers, have the master test 
> runner reconstruct proper output. This is less elegant. It only works with 
> the TextTestRunner because it depends heavily on its internals. But it’s more 
> robust and it suffices for running Django’s test suite.
> 
> What this really means is — you have to choose between being a proper 
> unittest2 runner or being robust. That’s what I meant when I said the 
> unittest2 APIs made the implementation painful.
> 
> -- 
> Aymeric.
> 
> 
> 
> 
