Hi Sara, thank you very much for this detailed explanation! I am still thinking about the best approach for our application, one that is simple and reliable enough.
Nevertheless, I see that there is general interest in this feature.
May I propose again that it be added to the kernel itself, so that the
scheduler can monitor the various tasks directly? Or is there a good
reason not to?

On Thu, Mar 18, 2021 at 2:38 PM, Sara da Cunha Monteiro de Souza
<saramonteirosouz...@gmail.com> wrote:

> Hi Fotis,
>
> I think I was very brief in my answer.
> So I decided to draw this draft to detail the watcher/watched example,
> and then you can decide whether it is suitable for you or not.
> Note: Unfortunately, I am not an expert on signaling or scheduling, so
> I cannot foresee all possible scenarios here and answer your questions.
>
> In the draft below, you can see that we have one thread on one side,
> the watcher, and 4 threads on the other side, the watched tasks.
> Any task that wants to be watched can get the pid and the signal number
> of the watcher through a file created on a mounted pseudo filesystem.
> After getting them, the task may send a signal with an argument that
> represents a request code for one of the following actions:
> - Subscribe to be watched.
> - Unsubscribe from being watched.
> - Feed the dog.
> From the moment a task is subscribed, it needs to periodically send a
> signal with the "feed the dog" request to let the watcher know it is
> alive.
> Once it is unsubscribed, the watcher no longer takes care of it.
>
> On the watcher side, there is a signal handler that receives this
> signal and performs the requested action, for example adding one more
> task to its internal list.
> Beforehand, the watcher configures a hardware watchdog timer to trigger
> an interrupt handler at timeout.
> This interrupt handler signals *another* signal handler, which goes
> through all the subscribed tasks, checks which of them sent a signal to
> feed the dog in the last "cycle", and prints all the offending tasks.
>
>
> On Wed, Mar 17, 2021 at 20:31, Gregory Nutt <spudan...@gmail.com>
> wrote:
>
>>
>> > I would like to ask, are there cases where the timer may not fire?
>> > Is it guaranteed to fire, for example, if the thread is in a deadlock,
>>
>> The signal will be delivered to some thread in the task group, so in
>> that sense it will "fire". However, it may be the case that the
>> deadlocked thread cannot respond to the signal. There are many
>> possible scenarios.
>>
>> If you set up a signal handler to run and the task is deadlocked waiting
>> on a semaphore that receives the signal, the semaphore wait will return
>> EINTR and the signal handler will run. But I can imagine other cases
>> where the task might not respond to the signal.
>>
>> In a multi-threaded task group, however, it may not be the waiting
>> thread that receives the signal. Multi-threaded signal delivery is
>> complex and non-obvious.
>>
>> > or if a higher priority thread has caused CPU starvation, or similar
>> > cases?
>>
>> Certainly the signal handling will typically pend and be delayed. POSIX
>> only requires that signal events be queued at least one deep; NuttX
>> queues signal action events but not other signal events (as I recall).
>> The behavior will depend on the sigprocmask, and the signal might not
>> pend at all!
>>
>> Signal handling is complex and this was written from my recollection,
>> which may be flawed.
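
For reference, here is a rough sketch of the subscribe / unsubscribe / feed bookkeeping described in Sara's draft, assuming a POSIX-style sigqueue() and a handler installed with SA_SIGINFO. All names (wd_request, wd_table, wd_handle_request, wd_check_cycle, wd_send) are made up for illustration and are not NuttX APIs. The fixed table size and the choice to treat a new subscriber as already fed for its first cycle are also assumptions. The handler updates a shared table directly, so a real watcher would need to block the request signal while the cycle check runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical request codes carried in the signal's value argument. */
enum wd_request { WD_SUBSCRIBE = 1, WD_UNSUBSCRIBE = 2, WD_FEED = 3 };

#define WD_MAX_TASKS 8          /* assumed limit, not from the thread */

struct wd_entry {
    pid_t pid;
    bool  used;                 /* slot holds a subscribed task */
    bool  fed;                  /* task fed the dog during this cycle */
};

static struct wd_entry wd_table[WD_MAX_TASKS];

static struct wd_entry *wd_find(pid_t pid)
{
    for (size_t i = 0; i < WD_MAX_TASKS; i++) {
        if (wd_table[i].used && wd_table[i].pid == pid) {
            return &wd_table[i];
        }
    }
    return NULL;
}

/* Apply one request from a task. Returns 0 on success, -1 otherwise. */
int wd_handle_request(pid_t pid, int request)
{
    struct wd_entry *e = wd_find(pid);

    switch (request) {
    case WD_SUBSCRIBE:
        if (e != NULL) {
            return 0;           /* already subscribed */
        }
        for (size_t i = 0; i < WD_MAX_TASKS; i++) {
            if (!wd_table[i].used) {
                /* Assumption: a new task counts as fed for this cycle. */
                wd_table[i] = (struct wd_entry){ .pid = pid, .used = true,
                                                 .fed = true };
                return 0;
            }
        }
        return -1;              /* table full */
    case WD_UNSUBSCRIBE:
        if (e == NULL) {
            return -1;
        }
        e->used = false;
        return 0;
    case WD_FEED:
        if (e == NULL) {
            return -1;          /* feeding without subscribing */
        }
        e->fed = true;
        return 0;
    default:
        return -1;
    }
}

/* Run once per watchdog timeout: store the pids of subscribed tasks that
 * did not feed during the cycle, then clear the flags for the next cycle.
 * Returns the number of offenders stored (at most max). */
size_t wd_check_cycle(pid_t *offenders, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < WD_MAX_TASKS; i++) {
        if (!wd_table[i].used) {
            continue;
        }
        if (!wd_table[i].fed && n < max) {
            offenders[n++] = wd_table[i].pid;
        }
        wd_table[i].fed = false;
    }
    return n;
}

/* Watcher side: handler for the request signal (needs SA_SIGINFO). */
void wd_signal_handler(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    wd_handle_request(info->si_pid, info->si_value.sival_int);
}

/* Watched side: send one request code to the watcher. */
int wd_send(pid_t watcher, int signo, int request)
{
    union sigval value;

    value.sival_int = request;
    return sigqueue(watcher, signo, value);
}
```

A watched task would call something like wd_send(watcher_pid, signo, WD_SUBSCRIBE) once, and then wd_send(..., WD_FEED) periodically, with watcher_pid and signo read from the pseudo filesystem file mentioned in the draft. The second handler in the draft would call wd_check_cycle() on each timeout and print the returned pids.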