On Thu, Mar 6, 2014 at 4:52 AM, Andrew Farrell <armorsmit...@gmail.com>wrote:

> Привет Djanonaughts!
>
> tl;dr: django with buildout spends a lot of time looking for files that
> aren't there when trying to do imports. How can I reduce this?
>
>
>
> I've long been frustrated with the time it takes my django installation to
> run tests. Now, I'm aware that some ways to think about to how to write
> fast and efficient tests in 
> django<http://www.slideshare.net/cordiskinsey/djangocon-2013-how-to-write-fast-and-efficient-unit-tests-in-django>
>  include:
> - putting things in setUpClass rather than setUp
> - treating the database as hot lava (at least, when you don't have
> ManyToMany relations)
> - using sqlite3 in-memory
> - using mock to emulate instances
> - using nose to only run the tests you need
> But I want to talk about another component of test performance: Disk I/O
> syscalls
>
> My company (the startup accelerator MassChallenge) uses 
> buildout<http://www.buildout.org/en/latest/> to
> manage dependencies and the several apps modules within our appication. So
> I run tests from the root of the buildout by calling `bin/django test`, a
> script that looks like the following: (note the datetime printing)
> ```
> #!/home/afarrell/projects/masschallenge/venv/bin/python
> from datetime import datetime
> print datetime.now().isoformat()
> print "before test"
>
> import sys
> sys.path[0:0] = [
>
>   
> '/home/afarrell/projects/masschallenge/mc2013/eggs/MySQL_python-1.2.3-py2.7-linux-i686.egg',
>   ... 38 other paths to various eggs...
>   '/home/afarrell/projects/masschallenge/venv/lib/python2.7/site-packages',
>   '/home/afarrell/projects/masschallenge/mc2013', #the root of our app
>   ]
>
> import djangorecipe.manage
>
> if __name__ == '__main__':
>     sys.exit(djangorecipe.manage.main('mcproject.develop_settings'))
> ```
>
> I am in the habit of writing a single test as I am writing a method. To do
> this, I use nose's attrib 
> tag<http://nose.readthedocs.org/en/latest/plugins/attrib.html> in
> a test file that looks like this:
> ```
> from nose.plugins.attrib import attr
> import nose.tools as nt
> from django.test import TestCase
> from datetime import datetime
>
> class TestExample(Testcase):
>     @attr('now')
>     def test_trivial_case(self):
>         print datetime.now().isoformat()
>         print "test has begun"
>         nt.assert_equal(1 + 1, 2)
> ```
>
> I then run tests with `bin/django test mentor_match -a'now' -s` and this
> prints out the following:
> ```
> 2014-03-05T13:46:02.774142
> before test
> nosetests --verbosity 1 mentor_match -anow -s
> To reuse old database "test_02_06_18_29" for speed, set env var REUSE_DB=1.
> Creating test database for alias 'default'...
> 2014-03-05T13:46:28.612895
> test has begun
> ```
> As you can see, there are about 26 seconds from the test command is issued
> and when the first test actually runs. This delay seems small but it is
> noticeable and kinda annoying when that is the only test. You could load,
> present, and fire a musket in that time. There are two possible causes that
> I can think of:
> 1) The test database takes a long time to set up
> 2) The test command takes a long time to import all required modules
>
> However the if I `export REUSE_DB=1` and then run `bin/django test
> mentor_match -a'now' -s`, I see
> 2014-03-05T15:22:51.743338
> before test
> nosetests --verbosity 1 mc_allocate -s -anow
> 2014-03-05T15:23:15.813821
> test has begun
> which is still a difference of 24 seconds. Working on a linode, I often
> get notifications about the high disk i/o rate, so it makes sense that the
> cause is the latter. So, I turn to the sysadmin's microsope: strace.
>
> With REUSE_DB not set, I run `strace -f bin/django test mentor_match -s
> -a'now' 2> tracelog`. I can now see all of the system calls made by the
> command, 67626. `cat tracelog | grep "open(" | wc -l` shows that 37635 of
> those are open commands.
>
> total:   67626
> open:    37635  56.7%
> stat64:  15401  22.8%
> read:     5362   7.9%
> fstat64:  2619   3.9%
> write:    2316   3.4%
> poll:     2268   3.4%
> close:    1754   2.6%
> mmap2:     949   1.4%
> munmap:    847   1.3%
>
> Since there are far more open calls than there are read calls, and
> there are 45847 lines containing "ENOENT (No such file or directory)"
> (86.4% of the open or stat64 calls), I think that django spends a lot of
> time looking for files in the wrong location.
>
>
> I plan to write a blog post about this, but before I dig in, does anyone
> have any comments on the following approaches I'm considering:
> - reordering the paths assigned to sys.path
> - removing items from sys.path when I am running tests
> - caching the locations where modules were previously found and looking
> there first
>
> Has anyone looked at doing anything like this?
>

Not to my knowledge, but based on your data, it certainly looks like it
would be worthwhile doing so.

I have to say that the data you've presented here surprises me. I can't
think of any obvious reasons why Django would be spending that much time
trying to open non-existent files - at least, not to the extent that you
would observe 24 seconds of startup time - unless your disk performance was
*really* bad. However, I know from experience isn't generally a problem on
Linode.

The only "speculative opening" behaviour I can think of would be the
attempted imports that are performed as part of identifying INSTALLED_APPS
and management commands - that is, looking for each models.py in your
INSTALLED_APPS, and looking for the management/commands folder in each
installed app. However, this should be a fairly constrained activity;
startup will only look in each app once, for a single folder that (in most
cases) won't exist. We're talking O(100) operations in the worst case for a
Django project - not the sort of thing I would have expected to cause 24
seconds of slowdown.

So - I'm very much interested in what you discover. Good hunting! (I
understand your weapon of choice is a musket, so you'll need the luck :-)

Also - you might want to bump this thread over to django-developers; you're
more likely to find a receptive audience for this sort of analysis.

Yours,
Russ Magee %-)

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/CAJxq84-bk5c6b01nK8%2BmCfNPLw2ZYVX0y%2BSo6svPHcwrWfxKEQ%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to