I'm naturally curious to know the culprit in your env, but unfortunately I can't reproduce it on any of my systems (*buntu 14.04 + postgresql/sqlite or Win2012r2 + postgresql/sqlite/mssql). It needs to be said that I mostly hate mysql (and would never trust it nor recommend it to the worst of my enemies), but that's totally a personal bias. I know several people using mysql who have never reported any issues whatsoever.
Dunno if I'm the "most powered user" of the scheduler around, but I range from 4 to 20 workers (even 50, but "cheating" with the redis-backed version) and these kinds of issues have never happened, except with a reaaaally long task producing a reeeeaally long stdout output (which, once known, is easy to suppress, and btw has been less of a problem since a few releases ago). "My" worker processes are usually alive for more than a week (i.e. same pid), so it's not a problem of phantom processes or leaks. Although pretty heavy (resource-wise) compared to pyres, celery, huey, rq (just to name the "famous ones"), web2py's scheduler, which HAS to work on every OS supported by web2py, is rather simple: each worker spawns a single process with a queue one element long to communicate over ... that process handles a single task, and then dies. Everything is trashed and recreated at the next task pick-up.
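Stripped of all the bookkeeping, the per-task pattern is roughly this (just an illustrative sketch, NOT the actual scheduler.py code; run_one_task and the bare task() call are made up for the example):

    import multiprocessing

    def executor(queue, task):
        # runs in the child process: execute exactly one task,
        # push the result back over the one-slot queue, then exit
        queue.put(task())

    def run_one_task(task, timeout):
        # what a worker does on every pick-up (simplified): a fresh
        # process and a fresh queue each time, nothing is reused
        queue = multiprocessing.Queue(maxsize=1)
        p = multiprocessing.Process(target=executor, args=(queue, task))
        p.start()
        p.join(timeout)
        if p.is_alive():
            # the task overran its timeout: kill the child and reap it
            p.terminate()
            p.join()
            return None
        return queue.get()

The whole point of the throw-away process is isolation: whatever the task leaks or breaks dies with it at the next pick-up.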
On Wednesday, November 2, 2016 at 5:47:42 PM UTC+1, Erwn Ltmann wrote:
>
> Hi Niphlod,
>
> your replies are always a pleasure to me. :)
>
> On Wednesday, November 2, 2016 at 12:00:48 PM UTC+1, Niphlod wrote:
>>
>> I'd say there are a LOT of strange things going on on your system, since
>> you're reporting several different issues that nobody has ever faced, all
>> in the last week.
>>
>
> Concerning deadlocks and zombies - right? Both issues show up while using
> the scheduler, not web2py in general, and only when I start more than one
> worker.
>
>> zombie processes shouldn't be there unless you improperly killed a worker
>> process.
>> Python can't really do anything about it, and that's why there's a
>> specific API to kill (or terminate) a worker.
>>
>
> You're right, the killer is the scheduler itself. Why? The scheduler
> terminates a task once it passes its timeout. The timeout happens because
> the task never does what it is defined to do. In the zombie situation the
> sub-process is stuck in the sem_wait() function (according to pstack). I
> don't know why. But it happens before the 'executor' function is even
> entered, because no debug line is printed at the entry point of that
> function.
>
> OK, that's all I know. I have different RHEL systems (RH6, RH5) with
> Python 2.7.12 and MariaDB. Not really exotic.
>
> Thank you for your endurance
> Erwn
>
>> On Wednesday, November 2, 2016 at 10:53:58 AM UTC+1, Erwn Ltmann wrote:
>>>
>>> Dear all,
>>>
>>> I'm astonished that a lot of sub-processes of the scheduler workers are
>>> never finished.
>>>
>>> pstree -p 16731
>>>>
>>>> bash(16731)---python2.7(24545)-+-python2.7(24564)---{python2.7}(24565)
>>>>                                |-python2.7(24572)-+-python2.7(1110)
>>>>                                |                  |-python2.7(8647)
>>>>                                |                  |-python2.7(11747)
>>>>                                |                  |-python2.7(14117)
>>>>                                |                  |-python2.7(14302)
>>>
>>> 16731 is the shell from which I started the scheduler with four workers:
>>>
>>>> w2p -K arm:ticker,arm,arm,arm
>>>
>>> Pid 24564 is the ticker worker (it only holds the ticker) and 24572 is
>>> one of the three standard workers which have to process my task's
>>> function.
>>>
>>> My first focus was on the function itself, but even if I clip the
>>> function ('return True' right at the start) the zombies are still there.
>>> My next step was to print the pid at the start of the 'executor'
>>> function in scheduler.py. In the case of zombie processes this debug
>>> point is never reached. Next I printed out the list of zombie processes
>>> (multiprocessing.active_children()) at the exit point of tasks which
>>> exceeded the timeout (see the async function). That is the point in the
>>> scheduler code where 'task timeout' is printed. The timeout itself is
>>> clear, because the process never returns a result. But how is that
>>> possible?
>>>
>>> Here's the output of my extra debug line in the timeout branch of async:
>>>
>>>> 09:09:47.752 [24576] Process-4:488,
>>>> 09:14:28.907 [24576] Process-4:488, Process-4:1125,
>>>> 09:15:59.526 [24576] Process-4:488, Process-4:1125, Process-4:1301,
>>>> 09:20:35.924 [24576] Process-4:488, Process-4:1880, Process-4:1125,
>>>> Process-4:1301,
>>>
>>> Why does the 'executor' function never get to run its code?
>>>
>>> def async(self, task):
>>>     ...
>>>     out = multiprocessing.Queue()
>>>     queue = multiprocessing.Queue(maxsize=1)
>>>     p = multiprocessing.Process(target=executor, args=(queue, task, out))
>>>     ...
>>>     if p.is_alive():
>>>         p.terminate()
>>>         logger.debug(' +- Zombie (%s)' %
>>>                      multiprocessing.active_children())
>>>
>>> And here the extra line in executor:
>>>
>>> def executor(queue, task, out):
>>>     """The function used to execute tasks in the background process."""
>>>     logger.debug(' task started PID:%s -> %s' %
>>>                  (os.getppid(), os.getpid()))
>>>     ...
>>>
>>> Of course, I have to stress the scheduler to get zombies at all. The
>>> rate is about 1 in 1000 - in my case 25 times each hour!
>>>
>>> Can anybody clarify this? Maybe it's a pure Python issue.
>>>
>>> Thx,
>>> Erwn
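PS, since the word keeps coming up in this thread: in ps/pstree terms a "zombie" is a child that has already exited but hasn't been wait()ed on by its parent yet; the children stuck in sem_wait above are a different thing, they're still alive, just hung. A tiny standalone demo of the former, plain multiprocessing, nothing scheduler-specific (the hang() helper is made up for the example):

    import multiprocessing
    import os
    import time

    def hang(queue):
        # stand-in for a task that never returns a result
        time.sleep(3600)

    if __name__ == '__main__':
        queue = multiprocessing.Queue(maxsize=1)
        p = multiprocessing.Process(target=hang, args=(queue,))
        p.start()
        time.sleep(1)      # pretend the timeout elapsed
        p.terminate()      # SIGTERM the child, like the timeout branch does
        time.sleep(1)      # give the child time to actually die
        # the dead child shows up as <defunct> until someone reaps it
        os.system('ps -o pid,stat,cmd --ppid %d' % os.getpid())
        p.join()           # reaping makes the <defunct> entry disappear
        os.system('ps -o pid,stat,cmd --ppid %d' % os.getpid())

If the parent never gets around to join()ing (or to calling multiprocessing.active_children(), which reaps finished children as a side effect), those <defunct> entries simply pile up under the worker pid.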