First of all: I decided to use web2py for my purposes because it is awesome ;) I believe this is not a web2py bug or anything like that. It is more likely an OS- or systemd-related issue.
Let me explain what I do and what the environment is. I work in a lab where we try to automate many tests on physical devices (like STBs and phones). I have a single source tree for the master (Ubuntu server) and the slave servers (Ubuntu server/desktop). The master is configured with uwsgi+nginx+mysql+web2py services. The slaves use the same source, but can spawn tests within scheduler processes. I need to connect many physical devices to the slaves (climate chambers, Arduino for IR control, v4l2 capture cards, Ethernet-controlled power sources, power supply instruments, measurement instruments... bla bla bla).

I decided to make a GUI using qooxdoo where the user can write Python code that allocates physical devices and runs specific test scenarios to examine the condition of the DUT (Device Under Test). These tests sometimes need to run for tens of hours.

So the workflow can be described as:

- the user writes a script
- the test is enqueued as a task in the db (JobGraph works perfectly for me because I need to control the execution sequence, mainly because of physical devices like climate chambers etc.; an allocated lab instrument cannot be used by two tests at the same time, and JobGraph can take care of that)
- every slave has its unique group-name
- DUTs and lab instruments are bound to a specific slave, i.e. to its scheduler group-name
- the slave executes the test scenario programmed by the user
- a test is nothing more than an overridden TestUnit
- every lab instrument has a child process which logs parameters (temperature, humidity, voltage bla bla bla)
- for each DUT an instance of a class is also created that spawns child processes (video freeze detection based on gstreamer, udp/tcp/telnet interface to interact with the STB)
- in a test scenario I have plenty of sleeps
- a test scenario demands, for example, that the STB stays in a climate chamber for 20h at a specific temperature and humidity

My systemd service file looks like this:

[Unit]
Description=ATMS workers
After=network-online.target
Wants=network-online.target
[Service]
User=<USER>
Restart=on-failure
RestartSec=120
# <DISPLAY_NB> is usually 0
Environment=DISPLAY=:<DISPLAY_NB>
Environment=XAUTHORITY=/home/<USER>/.Xauthority
EnvironmentFile={{INSTALL}}/web2py_venv/web2py/applications/atms/private/atms.env
ExecStartPre=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_start.py -P"
ExecStart=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -K atms:%H,atms:%H"
ExecStop=/bin/sh -c "${WEB2PYPY} ${WEB2PY} -S atms -M -R ${WEB2PYDIR}/applications/atms/systemd/on_stop.py -P"

[Install]
# graphical because I had to make some kind of preview with ximagesink for a fast check whether video is OK on the STB
WantedBy=graphical.target
Alias=atms.service

I realised that for a very long test (the last one was planned to run longer than 100h) I got something like this in the logs:

gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/adapters/base.py", line 1435, in
gru 11 12:01:52 slaveX sh[2184]: return str(long(obj))
gru 11 12:01:52 slaveX sh[2184]: File "/atms/web2py_venv/web2py/gluon/packages/dal/pydal/objects.py", line 82, in <lambda
gru 11 12:01:52 slaveX sh[2184]: __long__ = lambda self: long(self.get('id'))
gru 11 12:01:52 slaveX sh[2184]: TypeError: long() argument must be a string or a number, not 'NoneType'

The test was stopped 20h before it was supposed to finish :/

After some digging I found that before these errors I got this one:

gru 11 12:01:34 slaveX sh[2184]: ERROR:web2py.app.atms:[(</tmp/taskId10672_caseId852_duts32/test_script.py.TestCase testMethod=test_example>, 'Traceback (most recent call last):\n File "/tmp/taskId10672_caseId852_duts32/test_script.py", line 90, in test_example\n sleep(M10)\n File "/atms/web2py_venv/web2py/gluon/scheduler.py", line 702, in <lambda>\n signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))\nSystemExit: 1\n')]
gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms: new task report: FAILED
gru 11 12:01:34 slaveX sh[2184]: DEBUG:web2py.app.atms:
traceback: Traceback (most recent call last):
..

and many, many more tracebacks with errors after that.

Line 702 in scheduler.py is:

signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))

...in the scheduler's loop function.

What does it mean? Was the process stopped because the kernel, systemd, or something else decided to do so? Could the long sleep calls have something to do with it?

Has anyone encountered similar problems? Do you have any idea how to prevent such behavior?

Thank you in advance for any response :)
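P.S. To check my reading of that traceback, here is a minimal sketch (Python 3, POSIX only). It is not web2py code: `CHILD` and `demo` are names I made up for illustration, and the 600-second sleep only stands in for `sleep(M10)` from my test script. It installs the same handler expression as line 702 in a child process and sends SIGTERM to that child from outside while it sleeps.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical stand-in for a long-running test; not web2py code.
CHILD = textwrap.dedent("""
    import signal, sys, time
    # Same handler expression as scheduler.py line 702.
    signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
    try:
        print("ready", flush=True)
        time.sleep(600)  # plays the role of sleep(M10) in test_script.py
    except SystemExit as exc:
        print("SystemExit raised by the SIGTERM handler, code", exc.code, flush=True)
        raise
""")


def demo():
    """Start the sleeping child, send it SIGTERM, return (exit code, output)."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # Wait until the child has installed its handler and is about to sleep.
    if child.stdout.readline().strip() != "ready":
        child.kill()
        raise RuntimeError("child did not start")
    child.send_signal(signal.SIGTERM)  # what any outside sender would do
    output, _ = child.communicate()
    return child.returncode, output.strip()


if __name__ == "__main__":
    print(demo())
```

As far as I can tell, this only shows that the `SystemExit: 1` in my log is the handler reacting to a SIGTERM that arrived while the test was inside `sleep(M10)`. It does not tell me who sent the signal.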