I think I can test the cron hypothesis. I'm running web2py under winpdb. Normally, I set the fork mode to "parent, auto" so that it doesn't break when a child process starts. If I change it to manual, it should break next time a child process is spawned. What should I look for when that happens?
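For reference, here's the kind of check I plan to run from the winpdb console when it breaks. A minimal sketch, not web2py code; dump_open_files is just a throwaway helper, and it shells out to lsof because OS X has no /proc (subprocess.check_output doesn't exist in 2.6, hence the Popen):

    import os
    import subprocess

    def dump_open_files():
        # List this process's open descriptors via lsof.
        p = subprocess.Popen(['lsof', '-p', str(os.getpid())],
                             stdout=subprocess.PIPE)
        out, _ = p.communicate()
        lines = out.splitlines()
        print '%d descriptors open' % (len(lines) - 1)  # minus the header row
        # Un-reaped cron children should show up as accumulating PIPE rows.
        for line in lines:
            if 'PIPE' in line:
                print line

If the cron hypothesis is right, the count should climb each time the break fires, and the new entries should be pipes belonging to children that never exited.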
On Tue, Jul 27, 2010 at 8:49 AM, Michael Ellis <michael.f.el...@gmail.com> wrote:
> Not using cron.
>
> On Tue, Jul 27, 2010 at 8:43 AM, mdipierro <mdipie...@cs.depaul.edu> wrote:
>> Ignore my previous email.... I see you are using 2.6 and the problem
>> is with cron. I think the problem is a cron process that does not end
>> and keeps restarting. Are you using cron?
>>
>> On Jul 27, 7:33 am, Michael Ellis <michael.f.el...@gmail.com> wrote:
>>> Not sure if this is related to Rocket or whether a new topic is needed.
>>>
>>> This morning I found several OSError reports about "Too many open files"
>>> in a web2py development server that has been running locally for several
>>> days. The app code isn't doing any explicit file I/O, so I don't know
>>> what's going on, but here are the tracebacks in case someone else is
>>> seeing anything similar.
>>>
>>> Mike
>>>
>>> Exception in thread Thread-89:
>>> Traceback (most recent call last):
>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
>>>     self.run()
>>>   File "/Users/mellis/w2ptip/gluon/newcron.py", line 206, in run
>>>     shell=self.shell)
>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 595, in __init__
>>>     errread, errwrite)
>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 1018, in _execute_child
>>>     errpipe_read, errpipe_write = os.pipe()
>>> OSError: [Errno 24] Too many open files
>>>
>>> ERROR:Rocket.Errors.ThreadPool:Traceback (most recent call last):
>>>   File "/Users/mellis/w2ptip/gluon/rocket.py", line 302, in start
>>>     sock = l.accept()
>>>   File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/socket.py", line 195, in accept
>>>     sock, addr = self._sock.accept()
>>> error: [Errno 24] Too many open files
>>>
>>> ERROR:root:corrupted file: /Users/mellis/w2ptip/applications/init/cache/cache.shelve
>>> ERROR:root:Traceback (most recent call last):
>>>   File "/Users/mellis/w2ptip/gluon/restricted.py", line 178, in restricted
>>>     exec ccode in environment
>>>   File "/Users/mellis/w2ptip/applications/init/models/db.py", line 35, in <module>
>>>     auth.define_tables()  # creates all needed tables
>>>   File "/Users/mellis/w2ptip/gluon/tools.py", line 1106, in define_tables
>>>     format='%(first_name)s %(last_name)s (%(id)s)')
>>>   File "/Users/mellis/w2ptip/gluon/sql.py", line 1309, in define_table
>>>   File "/Users/mellis/w2ptip/gluon/sql.py", line 1715, in _create
>>> IOError: [Errno 24] Too many open files: '/Users/mellis/w2ptip/applications/init/databases/sql.log'
>>>
>>> On Tue, Jul 27, 2010 at 2:54 AM, Rahul <rahul.dhak...@gmail.com> wrote:
>>>> Thanks everyone, for getting this issue resolved...
>>>> Web2py rocks!
>>>> Cheers, Rahul
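An aside on the tracebacks above: [Errno 24] inside os.pipe() is exactly what descriptor exhaustion from un-reaped child processes looks like, which is what the cron hypothesis predicts. A toy illustration (not web2py code, just a standalone script that fails the same way):

    import subprocess

    children = []
    try:
        while True:
            # Each Popen with piped stdout/stderr costs several descriptors;
            # without wait() or communicate() they are never released.
            p = subprocess.Popen(['sleep', '3600'],
                                 stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
            children.append(p)
    except OSError, e:
        # Fails with EMFILE, or with EAGAIN if the process limit is hit first.
        # Under the default OS X limit of 256 descriptors, this dies after
        # a few dozen children.
        print 'failed after %d children: %s' % (len(children), e)

Once the limit is hit, every later pipe(), accept(), and open() in the same process fails with the same errno, which would explain the Rocket and sql.log errors appearing alongside the cron one.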
>>>> On Jul 24, 6:25 pm, Phyo Arkar <phyo.arkarl...@gmail.com> wrote:
>>>>> Yes. Since the start of web2py, Massimo and the rest of us have never
>>>>> recommended the built-in web2py server for production; it is mainly for
>>>>> development. From the very start of the project, back in the CherryPy
>>>>> days, Massimo has always suggested Apache/Cherokee/Lighttpd over
>>>>> fcgi/wsgi/uwsgi or mod_python for a serious production server.
>>>>>
>>>>> Rocket, though, tries quite hard to achieve production-level
>>>>> performance, with all the cool comet/threading stuff. It is still quite
>>>>> young; let's give it a chance.
>>>>>
>>>>> On Sat, Jul 24, 2010 at 7:39 PM, Scott <blueseas...@gmail.com> wrote:
>>>>>> Please allow me to preface my comments: I have nothing against Rocket;
>>>>>> my opinions come from years of experience with Java EE deployments.
>>>>>>
>>>>>> I think raising max_threads to 1024 is a good idea. However, my
>>>>>> opinion is that Rocket alone should not be used for a production
>>>>>> deployment, much as I would not use the built-in web server in JBoss,
>>>>>> WebLogic, Geronimo, etc. as the front door. My suggestion for
>>>>>> production would be to use an Apache front end into Rocket. Apache is
>>>>>> more battle-hardened in this area, and it's a lot easier to handle DoS
>>>>>> attacks through modules such as mod_evasive. There are numerous other
>>>>>> benefits too, such as easily enabling gzip compression and allowing
>>>>>> you a better security model through Defense in Depth... but I digress.
>>>>>>
>>>>>> On Jul 23, 5:41 pm, mdipierro <mdipie...@cs.depaul.edu> wrote:
>>>>>>> On second thought, this opens the door to more severe denial-of-service
>>>>>>> attacks than those caused by the original problem. How about, until
>>>>>>> there is a better understanding and a solution, we just increase
>>>>>>> max_threads from the original 128 to 1024?
>>>>>>>
>>>>>>> On Jul 22, 11:27 am, Timbo <tfarr...@owassobible.org> wrote:
>>>>>>>> Try one quick change for me, please... Rocket is constructed around
>>>>>>>> line 655 in main.py.
>>>>>>>>
>>>>>>>> Add a parameter to the constructor call(s): max_threads=0
>>>>>>>>
>>>>>>>> Please let me know if that affects the problem.
>>>>>>>>
>>>>>>>> -tim
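For anyone wanting to try Timbo's change without hunting through main.py: the sketch below shows roughly what the edited constructor call might look like. Everything except the max_threads keyword is an assumption from memory, so check the actual call around line 655 before editing:

    from gluon.rocket import Rocket

    # Assumed shape of the call; verify against your copy of main.py.
    # wsgi_app stands in for web2py's WSGI callable (gluon.main.wsgibase).
    server = Rocket(
        ('127.0.0.1', 8000),       # interface and port to bind
        'wsgi',                    # serve a WSGI application
        {'wsgi_app': wsgi_app},
        max_threads=0)             # Timbo's test: 0 means no thread cap;
                                   # Massimo suggests 1024 vs. the old 128
    server.start()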
>>>>>>>> On Jul 22, 10:34 am, mdipierro <mdipie...@cs.depaul.edu> wrote:
>>>>>>>>> I can reproduce the problem. I did it on localhost with two different
>>>>>>>>> browsers. Using Firebug, I can see it takes 25 seconds to download
>>>>>>>>> base.css (the problem is not always with the same file). While I ran
>>>>>>>>> the test I also monitored httpserver.log, and I found that it NEVER
>>>>>>>>> takes more than 1.2 ms to serve base.css. This is what the log shows:
>>>>>>>>>
>>>>>>>>> ....
>>>>>>>>> 127.0.0.1, 2010-07-22 10:16:38, GET, /michealellistest/static/images/header.png, HTTP/1.1, 304, 0.000563
>>>>>>>>> 127.0.0.1, 2010-07-22 10:16:38, GET, /favicon.ico, HTTP/1.1, 400, 0.000631
>>>>>>>>> 127.0.0.1, 2010-07-22 10:16:55, GET, /michealellistest/static/base.css, HTTP/1.1, 304, 0.000791  #### locks firefox for 25 secs
>>>>>>>>> ....
>>>>>>>>> 127.0.0.1, 2010-07-22 10:22:42, GET, /michealellistest/static/jquery.timers-1.2.js, HTTP/1.1, 304, 0.000552
>>>>>>>>> 127.0.0.1, 2010-07-22 10:22:42, GET, /favicon.ico, HTTP/1.1, 400, 0.000497
>>>>>>>>> 127.0.0.1, 2010-07-22 10:23:02, GET, /michealellistest/static/superfish.js, HTTP/1.1, 304, 0.000914  #### locks chrome for 25 secs
>>>>>>>>>
>>>>>>>>> Do you see the time gaps? There is a clear pattern: under heavy load,
>>>>>>>>> a request that results in an HTTP 400 error locks Rocket.
>>>>>>>>>
>>>>>>>>> Notice that the logging is done by a WSGI application that calls
>>>>>>>>> web2py's wsgibase, i.e. it times how long web2py takes to receive the
>>>>>>>>> request and send the response. The extra time must be spent inside
>>>>>>>>> the web server.
>>>>>>>>>
>>>>>>>>> It is also important that the times shown in the logs are the actual
>>>>>>>>> times at which the data is written. You can see Firefox waiting for
>>>>>>>>> base.css, the server waiting to log base.css, and nothing else being
>>>>>>>>> printed during the wait, signifying that web2py is not running any
>>>>>>>>> request.
>>>>>>>>>
>>>>>>>>> We need Tim! This is a problem.
>>>>>>>>>
>>>>>>>>> Massimo
>>>>>>>>>
>>>>>>>>> On Jul 22, 9:22 am, Michael Ellis <michael.f.el...@gmail.com> wrote:
>>>>>>>>>> I've isolated the problem but absolutely do not understand it. I
>>>>>>>>>> can reproduce it with a two-line change to web2py_ajax.html. Will
>>>>>>>>>> someone with the time and equipment please attempt to replicate
>>>>>>>>>> this as a sanity check?
>>>>>>>>>>
>>>>>>>>>> Here's how. In the welcome app's web2py_ajax.html, insert the
>>>>>>>>>> following after line 3:
>>>>>>>>>>
>>>>>>>>>> response.files.insert(3,URL(r=request,c='static',f='jquery.sparkline.js'))
>>>>>>>>>> response.files.insert(4,URL(r=request,c='static',f='jquery.timers-1.2.js'))
>>>>>>>>>>
>>>>>>>>>> Copy the attached js files into welcome/static. They should be the
>>>>>>>>>> same as the versions available online.
>>>>>>>>>>
>>>>>>>>>> To reproduce the problem, serve web2py on your LAN and open the
>>>>>>>>>> welcome home page on two different machines (one of them can be the
>>>>>>>>>> server). Briskly reload the page 10 or more times on either
>>>>>>>>>> machine, then try to reload on the other. In my setup, the delay is
>>>>>>>>>> reliably 25 seconds from the time I make the last click on the
>>>>>>>>>> first machine.
>>>>>>>>>>
>>>>>>>>>> I'm able to reproduce this in FF, Chrome, and Safari using the
>>>>>>>>>> latest web2py from trunk. Haven't tried any other browsers yet. As
>>>>>>>>>> noted previously, both machines are MacBooks running Snow Leopard.
>>>>>>>>>>
>>>>>>>>>> Mike
>>>>>>>>>>
>>>>>>>>>> Attachments: jquery.timers-1.2.js (4K), jquery.sparkline.js (62K)
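For anyone trying Mike's recipe above: a small timing loop, run from each of the two LAN machines, makes the stall easy to spot, since the 25-second request stands out from the sub-millisecond ones. HOST is an assumption here; point it at whatever address your web2py server is listening on:

    import time
    import urllib2

    HOST = 'http://192.168.1.10:8000'   # assumed LAN address of the server
    url = HOST + '/welcome/static/base.css'

    for i in range(30):
        t0 = time.time()
        try:
            urllib2.urlopen(url).read()
            print '%2d: %.3fs' % (i, time.time() - t0)
        except urllib2.URLError, e:
            print '%2d: error after %.3fs: %s' % (i, time.time() - t0, e)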