Hello, In short: First signal delivery on call of pthread_kill(tid, SIGUSR1) works fine but second one delivers signal with _delay_ and _twice_, and then my own daemon crash. This is on cygwin1.dll 1.5.10-3 and snapshot of 2004-08-21. But on cygwin1.dll 1.5.7-1 all works _fine_ and _long_ time.
In detail: (Sorry for my bad English.) (And sorry for long letter.) (And much more sorry if this is not appropriate place to ask this question.) I'm writing a daemon. Multi-threaded and TCP/IP-oriented. And it works. On Linux and FreeBSD. And on cygwin 1.5.7-1. But there is a problem which I'm currently trying to solve: one of the servicing client thread hangs up with 100% CPU usage in random time periods (but with probability nearly equal to 1) and do not response to wakeup signals. The first investigation brought me an idea that the problem is lying in select() call in cygwin implementation. (I have only one socket in select() call and nothing more.) But maybe I'm wrong. Anyway, in that moment I made the decision to update cygwin1.dll because my cygwin1.dll 1.5.7-1 is too old and maybe a newer version solves my problem. I got 1.5.10-3 and... my daemon crashed :( on very simple action. I had never seen something like this with 1.5.7-1. Then I got the (latest on that moment) snapshot of cygwin1.dll - 2004-08-21. And my daemon crashed again with same symptoms. gdb 2003-09-20-cvs (cygwin-special) didn't help because it received SIGSEGV in KERNEL32!IsBadWritePtr on my daemon run before the piece of code with problem was executed. But strace (cygwin) 1.21 gave some interesting results. Here is some data and logs. $ strace -o strace7.log -t -m sigp yolopd -l yolopd7.log $ strace -o straceB.log -t -m sigp yolopd -l yolopdB.log yolopd7.log (1.5.7-1) and yolopdB.log (snapshot 20040821) are my daemon logs. Just in case. SIGUSR1 is used by main thread to awake other sleeping threads which perform some actions. The SIGUSR1 handler is very simple: static void handler_sigusr1(int sig) { #ifdef DEBUG log_msg("got SIGUSR1"); #endif // do nothing, only for exit from sleep() } $ grep -i sig yolopd7.log 2004-08-27 13:03:41 yolopd[195]: setting signal handlers done. 2004-08-27 13:03:53 yolopd[195]: ATM001: send SIGUSR1 (refresh). 2004-08-27 13:03:53 yolopd[195]: got SIGUSR1. 2004-08-27 13:04:24 yolopd[195]: ATM001: send SIGUSR1 (refresh). 2004-08-27 13:04:24 yolopd[195]: got SIGUSR1. [...and then normal exit...] $ grep -i sig yolopdB.log 2004-08-27 13:01:35 yolopd[149]: setting signal handlers done. 2004-08-27 13:01:50 yolopd[149]: ATM001: send SIGUSR1 (refresh). 2004-08-27 13:01:50 yolopd[149]: got SIGUSR1. 2004-08-27 13:01:59 yolopd[149]: ATM001: send SIGUSR1 (refresh). 2004-08-27 13:02:20 yolopd[149]: got SIGUSR1. 2004-08-27 13:02:20 yolopd[149]: got SIGUSR1. <-- twice! [...immediate crash...] $ ls -l strace?.log -rw-r--r-- 1 Valery None 6277 Aug 27 13:04 strace7.log -rw-r--r-- 1 Valery None 171388 Aug 27 13:02 straceB.log Compare the size of strace log files! >:-[ ] $ grep -c pending strace?.log strace7.log:5 straceB.log:679 <-- !!! $ grep -c "signal.*not delivered" strace?.log strace7.log:0 straceB.log:6 <-- !!! $ grep -E '(arriv|returning.*signal 30)' strace7.log 13:03:53 [sig] yolopd 195 _threadinfo::interrupt_setup: armed signal_arrived 0x60, sig 30, res 1 13:03:53 [main] yolopd 195 reset_signal_arrived: reset signal_arrived 13:03:53 [unknown (0x6A)] yolopd 195 sig_send: returning 0x0 from sending signal 30 13:04:24 [sig] yolopd 195 _threadinfo::interrupt_setup: armed signal_arrived 0x60, sig 30, res 1 13:04:24 [main] yolopd 195 reset_signal_arrived: reset signal_arrived 13:04:24 [unknown (0x6A)] yolopd 195 sig_send: returning 0x0 from sending signal 30 $ grep -E '(arriv|returning.*signal 30)' straceB.log 13:01:50 [sig] yolopd 149 _cygtls::interrupt_setup: armed signal_arrived 0x9C, sig 30, res 1 13:01:50 [unknown (0xC3)] yolopd 149 sig_send: returning 0x0 from sending signal 30 13:01:51 [unknown (0xC8)] yolopd 149 reset_signal_arrived: reset signal_arrived 13:02:00 [unknown (0xC3)] yolopd 149 sig_send: returning 0x0 from sending signal 30 13:02:20 [unknown (0xC8)] yolopd 149 reset_signal_arrived: reset signal_arrived 13:02:20 [sig] yolopd 149 _cygtls::interrupt_setup: armed signal_arrived 0x9C, sig 30, res 1 13:02:20 [unknown (0xC8)] yolopd 149 reset_signal_arrived: reset signal_arrived Two things which I've found out in strace?.log. First. In strace7.log exactly one "reset_signal_arrived" record after each armed signal. In straceB.log for first signal there is _two_ "reset_signal_arrived" records with ~30 seconds interval. Second. In strace7.log "reset_signal_arrived" record is present _before_ "returning 0x0 from sending signal 30". In straceB.log - _after_. Maybe I'm saying nonsense and problem is placed in my daemon code but not in cygwin but... Whole trace files (strace7.log & straceB.log) are attached to this letter. I can't make small source code to reproduce the problem. The conditions of run of my daemon is very specific. I played myself with this problem about a week. And I no any idea to make something more. Please, any suggestion? (for signals delivering; with hang of threads I'll play myself later) gcc 2.95.3-10 (cygwin special) <-- not 3.x Windows NT 4.0 Workstation SP6a+ AMD Athlon 800 MHz, 512 MB RAM WBR, Valery -- Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple Problem reports: http://cygwin.com/problems.html Documentation: http://cygwin.com/docs.html FAQ: http://cygwin.com/faq/