I'm naturally curious to know the culprit in your env, but unfortunately I 
can't reproduce it on any of my systems (*buntu 14.04 + postgresql/sqlite or 
Win2012r2 + postgresql/sqlite/mssql).
It needs to be said that I mostly hate mysql (and would never trust it nor 
recommend it to the worst of my enemies), but that's totally a personal bias. I 
know several people using mysql who never reported any issues whatsoever.

Dunno if I'm the "most powered user" of the scheduler around, but I range 
from 4 to 20 workers (even 50, but "cheating" with the redis-backed 
version) and these kinds of issues never happened, except with a reaaaally long 
task with a reeeeaally long stdout output (which, once known, is easy to 
suppress, and btw the problem has mattered less since a few releases 
ago). 
"My" worker processes are usually alive for more than a week (i.e. same 
pid), so it's not a problem of phantom processes or leaks.

Although pretty heavy (resource-wise) compared to pyres, celery, huey and 
rq (just to name the "famous ones"), web2py's scheduler, which HAS to work 
on every OS supported by web2py, is rather simple: each worker spawns a single 
process with a queue 1 element long to communicate over ... that process 
handles a single task, and then dies. Everything is trashed and recreated 
at the next task pick-up.
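
To make the "one process per task, then it dies" idea concrete, here is a minimal sketch of the pattern. It is NOT the code in scheduler.py: run_task, the status strings and the start-method choice are made up for illustration, and it prefers the "fork" start method where available only so that it runs as a plain script on Linux.

```python
import multiprocessing
import queue as queue_module

# Assumption for this sketch: prefer "fork" where it exists (POSIX) so the
# example runs as a plain script; elsewhere fall back to the platform default.
_METHOD = "fork" if "fork" in multiprocessing.get_all_start_methods() else None
_CTX = multiprocessing.get_context(_METHOD)


def executor(result_queue, func, args):
    """Runs in the child process: execute one task and report the outcome."""
    try:
        result_queue.put(("COMPLETED", func(*args)))
    except Exception as exc:
        result_queue.put(("FAILED", repr(exc)))


def run_task(func, args=(), timeout=5):
    """Hypothetical helper: spawn a throwaway process for a single task."""
    result_queue = _CTX.Queue(maxsize=1)  # the 1-element queue to talk over
    proc = _CTX.Process(target=executor, args=(result_queue, func, args))
    proc.start()
    try:
        status, payload = result_queue.get(timeout=timeout)
    except queue_module.Empty:
        # Nothing came back in time: treat it as a timeout and kill the child.
        status, payload = "TIMEOUT", None
        if proc.is_alive():
            proc.terminate()
    proc.join()  # reap the child so it does not linger as a zombie
    return status, payload
```

Something like run_task(pow, (2, 5)) hands the result back through the queue, while a task that outlives its timeout gets terminated and joined. On platforms without fork the calls would need to sit under an `if __name__ == "__main__":` guard.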



On Wednesday, November 2, 2016 at 5:47:42 PM UTC+1, Erwn Ltmann wrote:
>
> Hi Niphlod,
>
> your replies are always a pleasure to me. :)
>
> On Wednesday, November 2, 2016 at 12:00:48 PM UTC+1, Niphlod wrote:
>>
>> I'd say there are a LOT of strange things going on on your system, since 
>> you're reporting several different issues that nobody ever faced and all in 
>> the last week.
>>
>
> Concerning deadlocks and zombies - right? Both issues show up only when 
> using the scheduler, not web2py in general. And only in cases where I start more 
> than one worker.
>
> zombie processes shouldn't be there unless you improperly killed a worker 
>> process. 
>> Python can't really do anything about it, and that's why there's a 
>> specific API to kill (or terminate) a worker.
>>
>
> You're right, the killer is the scheduler itself. Why? The scheduler 
> terminates a task once it exceeds the timeout. The timeout happens because 
> the task never does what it is defined to do. In the zombie case the 
> subprocess is stuck in the sem_wait() function (per pstack). I don't know 
> why. But it happens before the function 'executor' is entered, because 
> the debug line at the entry point of that function is never printed.
>
> Ok. That's all I know. I have different RHEL systems (RH6, RH5) with 
> python 2.7.12 and MariaDB. Not really exotic.
>
> Thank you for your endurance
> Erwn
>
>  
>
>>
>> On Wednesday, November 2, 2016 at 10:53:58 AM UTC+1, Erwn Ltmann wrote:
>>>
>>> Dear all,
>>>
>>> I'm astonished that a lot of sub-processes of the scheduler 
>>> worker never finish.
>>>
>>> pstree -p 16731
>>>  
>>>>
>>>> bash(16731)---python2.7(24545)-+-python2.7(24564)---{python2.7}(24565)
>>>>                                |-python2.7(24572)-+-python2.7(1110)
>>>>                                |                  |-python2.7(8647)
>>>>                                |                  |-python2.7(11747)
>>>>                                |                  |-python2.7(14117)
>>>>                                |                  |-python2.7(14302)
>>>>
>>>
>>> 16731 is the shell from which I started the scheduler with four workers:
>>>
>>> w2p -K arm:ticker,arm,arm,arm
>>>>
>>>
>>> The pid 24564 is the ticker worker (it only holds the ticker) and 24572 is one 
>>> of the three standard workers that have to process my task's function.
>>>
>>> My first focus was on the function itself. But even if I stub out the function 
>>> ('return True' at the start) the zombies are still there. My next 
>>> analysis step was to print the pid at the start of the 'executor' function 
>>> in scheduler.py. In the case of zombie processes I never reach this debug 
>>> point. Next I printed the list of zombie processes 
>>> (multiprocessing.active_children()) at the exit point of tasks that exceeded 
>>> the timeout (see function async). That's the point in the scheduler code 
>>> where 'task timeout' is printed. The timeout itself makes sense, because the 
>>> process never returns a result. But how is that possible?
>>>
>>> Here's the output of my extra debug line in the timeout part of function async:
>>>
>>> 09:09:47.752 [24576] Process-4:488,
>>>> 09:14:28.907 [24576] Process-4:488, Process-4:1125,
>>>> 09:15:59.526 [24576] Process-4:488, Process-4:1125, Process-4:1301,
>>>> 09:20:35.924 [24576] Process-4:488, Process-4:1880, Process-4:1125, 
>>>> Process-4:1301,
>>>>
>>>
>>> Why did the 'executor' function never process the code?
>>>
>>> def async(self, task): 
>>>
>>>   ... 
>>>
>>>   out = multiprocessing.Queue()
>>>>   queue = multiprocessing.Queue(maxsize=1)
>>>>   p = multiprocessing.Process(target=executor, args=(queue, task, out))
>>>
>>>   ...
>>>>   if p.is_alive():
>>>>        p.terminate()
>>>>        logger.debug('      +- Zombie (%s)' % 
>>>> multiprocessing.active_children()) 
>>>>
>>>
>>> And here the extra line in executor:
>>>
>>> def executor(queue, task, out):
>>>>   """The function used to execute tasks in the background process."""
>>>>   logger.debug('    task started PID:%s -> %s' % 
>>>> (os.getppid(),os.getpid()))
>>>
>>>   ... 
>>>>
>>>
>>> Of course, I have to stress the scheduler to get zombies. The rate is 
>>> 1 in 1000. In my case 25 times each hour!
>>>
>>> Can anybody clarify this? Maybe it concerns pure Python.
>>>
>>> Thx,
>>> Erwn
>>>
>>
