Or rather - how to keep the server from blowing up. I've searched the web, but nothing I found solves this problem.
Some background info - I'm running worldoflogs.com, a site that gets around 100 concurrent requests during rush hour and is still growing rather rapidly. Django powers the frontend and custom Java code does the number crunching on the backend. Django runs via mod_wsgi, multiprocess, under apache-prefork.

We ran into trouble this week when all WSGI workers were busy serving requests, Apache started queueing requests, and page load times bounced up and down from 0 to 20s. Easy fix: increase the number of processes, until you run out of RAM. ps showed that each Python process took at least 100M and half of them 150M+, so getting past 60 processes was a no-go.

I used Dozer to see if anything leaked between requests: nope. Okay... there are no objects alive, yet the memory usage rises after serving requests. Running JMeter to fire off infinite requests at Apache raised memory used from ~30M after 1 request to 200M+ after 250; I stopped the benchmark after that. Maybe Python doesn't free up memory from its heap? I know that Java with the default GC options does that. Bingo. The following 4 lines solved it for us:

    class GCMiddleware(object):
        def process_request(self, request):
            import gc
            gc.collect()

Yeah. It was that simple. Memory usage went from insane to 50M and stable, even after thousands of requests to a single worker. If you're running out of RAM but have CPU cycles to spare, do a full gc before every request. We had 85% idle time on the CPU, but RAM utilization was at 80%, and I don't dare raise the limits further; running into swap kills the server instantly.

It's silly how much attention GC gets in Java and none at all in Python, especially since on a server memory tends to be the problem under load - if you go the multiprocess way instead of using threads, that is. That's the main reason we use Java on the backend: threads.
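For anyone wanting to try this, the middleware still has to be wired into Django's settings. A minimal sketch, assuming old-style middleware and a hypothetical module path (put GCMiddleware wherever you like and adjust the dotted path):

    # settings.py - 'myproject.middleware.GCMiddleware' is a hypothetical
    # path; point it at wherever the class actually lives.
    MIDDLEWARE_CLASSES = (
        'myproject.middleware.GCMiddleware',  # full gc.collect() before each request
        'django.middleware.common.CommonMiddleware',
        # ... the rest of your stack ...
    )

Putting it first in the tuple means the collection runs before any other middleware touches the request.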
This is not Django's fault; it's just Python trying to minimize GC time. What's good for one app is poison for another, and Python's default GC behavior is quite evil in this case.

This "solution" is quite crude, but tuning the garbage collector with set_threshold is a pain in the backside. What I would like to see is a simple collector, like Java's new generation: if full, collect; if free memory after collection is <= min_free or >= max_free, resize the heap to follow it.

With the thresholds at the default 700/10/10, we run into minor collections all the time, promoting objects quickly from gen 0 to 1 to 2 and requiring a full collection to get them out of there. Trying to get the sizes right is impossible without an equivalent of -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps; things either get promoted too quickly or never, making any collection as expensive as a full GC.

If someone has an idea how to keep memory usage at about the same level with a lower CPU cost than a full GC every request, please tell.

You received this message because you are subscribed to the Google Groups "Django users" group.
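For reference, here is a rough sketch of the knobs CPython does expose for this kind of tuning. The 10000 below is illustrative, not a tuned value; and gc.set_debug(gc.DEBUG_STATS) is the closest built-in analogue to -XX:+PrintGC, dumping collection statistics to stderr whenever the collector runs:

    import gc

    # Defaults are (700, 10, 10): a gen-0 collection roughly every 700 net
    # allocations, gen-1 after 10 gen-0 runs, gen-2 after 10 gen-1 runs.
    print(gc.get_threshold())

    # Raising the gen-0 threshold delays minor collections, and with them
    # the promotion cascade into gen 2. 10000 is just an example value.
    gc.set_threshold(10000, 10, 10)

    # Print per-collection stats to stderr: generation collected, objects
    # in each generation, elapsed time. Crude, but it's what there is.
    gc.set_debug(gc.DEBUG_STATS)

Watching the DEBUG_STATS output under a JMeter run would at least show whether objects are being promoted too quickly or never, which is the information the PrintGC flags give you on the JVM side.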