On Apr 4 18:05, Vincent Dedun wrote:
Grepping the cygserver debug output shows that, with 2 child processes sharing a mutex, wakeup is called first, and then msleep is called twice. So by the time msleep is called, wakeup has already been called, and msleep has to sleep forever.
What you see is intermixed debug output of different threads. The log output is not guaranteed to be in the right order. I've improved the debug output slightly so that it's at least possible to recognize which mtx_lock and mtx_unlock calls belong together and who the current owner of a mtx_lock is.
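For readers unfamiliar with the hazard in the quoted report (a wakeup that arrives before the sleeper starts sleeping), here is a minimal sketch of the usual guard against it. It uses POSIX threads rather than the BSD msleep/wakeup calls, and the names gate, gate_wakeup and gate_sleep are hypothetical; this is not cygserver code, only an illustration of why a remembered predicate makes the call order harmless.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical illustration only; not taken from cygserver. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signalled;  /* remembers a wakeup that came early */
};

void gate_init(struct gate *g)
{
    int rc = pthread_mutex_init(&g->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&g->cond, NULL);
    assert(rc == 0);
    g->signalled = 0;
}

/* Counterpart of "wakeup": record the event, then notify all sleepers. */
void gate_wakeup(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->signalled = 1;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Counterpart of "msleep": returns at once if the wakeup already happened,
   because the predicate is checked under the lock before waiting. */
void gate_sleep(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->signalled)
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
}
```

With this shape, calling gate_wakeup() before gate_sleep() does not block the sleeper. Without the signalled flag, the same order of calls would leave the sleeper waiting forever.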
I have debugged cygserver now for two days and have found various bugs, one showing up as soon as another one was fixed. I've rewritten the whole thread synchronization and I've even found a synchronization bug in the BSD code (which probably is no problem when running in the BSD kernel).
oh great, thank you for working on this.
I've checked in a pretty big patch which works fine for me (but what does that count?)
It works for me too with the testcase I provided last time.
But there are still some issues when you run several semaphore-using programs at the same time.
To reproduce it:
-take the last testcase with this modification:
-- if (n_children < 20) instead of -- if (n_children < 1) at the beginning of the main loop
and change -- usleep(10) to -- sleep(1) so the children do not disappear too quickly.
-compile it, and copy the binary to another place (like /tmp) so you can run two instances of it (the semaphore key is created from the path, so the two instances must have different paths).
-then run both of them; you should see --semaphore_lock: Identifier removed messages, and your semaphores aren't locked.
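As a rough sketch of what the steps above exercise: a System V semaphore key derived from the binary's path with ftok(), and a lock routine that reports failure instead of silently continuing. "Identifier removed" is the usual strerror() text for EIDRM. The helper key_for_binary, the project id 'K', and this body of semaphore_lock are assumptions for illustration; the real testcase may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical: derive the key from the binary's path. Two copies of the
   binary at different paths normally get different keys. 'K' is arbitrary. */
key_t key_for_binary(const char *path)
{
    assert(path != NULL);
    return ftok(path, 'K');   /* (key_t)-1 if the path cannot be stat'ed */
}

/* Hypothetical stand-in for the testcase's semaphore_lock().
   Returns 0 when the lock is held, -1 when it is NOT held. */
int semaphore_lock(int semid)
{
    struct sembuf op;
    op.sem_num = 0;          /* first semaphore in the set */
    op.sem_op  = -1;         /* decrement, i.e. acquire */
    op.sem_flg = SEM_UNDO;   /* release automatically if the process dies */

    while (semop(semid, &op, 1) == -1) {
        int err = errno;
        if (err == EINTR)
            continue;        /* interrupted by a signal: retry */
        /* For EIDRM this prints the "Identifier removed" text. */
        fprintf(stderr, "semaphore_lock: %s\n", strerror(err));
        return -1;
    }
    return 0;
}
```

A caller that ignores the -1 return value proceeds into the critical section without holding the semaphore, which matches the "semaphores aren't locked" symptom.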
You can give the -r 64 argument to cygserver to be sure that it is not an issue with the number of query threads.
The messages may disappear after a while, and your semaphores then work.
When you run the testcase binary again, cygserver may stop responding after that.
All of this depends on how many query and cleaning threads you asked cygwin for.
In the software I'm working on, semaphores not being locked disturbs the execution, so I have to either disable the semaphores (as they are not locked with this error anyway) or run the slave and master on different computers (but most people would like to use the master as a slave too).
I tried to lower the number of listening children as much as possible, but the problem always occurs after some minutes of usage (I think when there are too many active children).
Thank you for your wonderful work.
Kraken