On Sat, Jan 07, 2023 at 12:59:26AM +0100, Peter Boy wrote:
> 
> > On 06.01.2023 at 18:06, Michael Catanzaro <mcatanz...@redhat.com> wrote:
> > 
> > ...
> > 
> > I think most of the feedback on this change can be summarized as:
> > 
> > (a) Specific services want longer timeouts.
> > 
> > This can already be configured via existing configuration mechanisms, so I 
> > think it's safe enough to ignore this problem. E.g. if a quick shutdown 
> > will brick your Pinephone modem or corrupt your database, then whatever 
> > service is involved there should request a larger timeout.
> 
> As several posts have shown, it is specifically not safe to ignore the 
> problem. It is a mystery to me how you can come to this assessment. 
> 
> We don't know if all affected services explicitly request a longer timeout. 
> We don't have a test procedure nor a QA criterion for this that is testable. 
> We don't know how many rely on the current default timeout because it has 
> worked so far. Given these known circumstances, introducing a "quick 
> shutdown" so nonchalantly, without exact data and tests, is simply 
> irresponsible and endangers the good reputation of the distribution, 
> especially of Fedora Server, which is known to run reliably (with or in 
> spite of) a quick release sequence. 
> 
> And it does not take into account in any way the other fact, expressed here 
> in several posts, that it is not a problem of individual, singular processes, 
> but the interaction of several processes in the specific shutdown situation, 
> whereby individual processes cannot terminate themselves as quickly as they 
> do in normal circumstances. And it's obviously a non-deterministic, random 
> process that turns out differently for each shutdown.
> 
> The current timeout may not be perfect, but long experience shows that in the 
> vast majority of cases the value results in a safe, uncorrupted shutdown.  We 
> do not have a wave of complaints about system corruption after shutdown.
> 
> And the current value may be the result of a wild guess. I do not know how it 
> was achieved. But replacing one wild guess with another wild guess that 
> introduces additional, unpredictable risks is not a sound and robust approach 
> (and that is true not only for server, by the way).

The current default is mostly arbitrary. It was just selected as a nice round
value, in the spirit of "let's pick something large enough to be larger than any
realistic process will ever need".

I think you're misinterpreting Michael's words that "it's safe enough to ignore
this problem". IIUC, the idea is to set a longer timeout in those cases at the
service level. I.e. the problem is "ignored" only in the sense of the
system-wide default being smaller, and the specific services setting a higher
timeout as required.
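
As a rough sketch of what "setting a higher timeout at the service level" can
look like: a drop-in for a hypothetical unit mydb.service (the unit name and
the 5min value are made up for illustration, not taken from any real package)
overrides the system-wide DefaultTimeoutStopSec= for that one service only:

```ini
# /etc/systemd/system/mydb.service.d/timeout.conf
# Hypothetical unit name; the value is illustrative only.
[Service]
# Allow this service more time to stop cleanly than the system-wide default.
TimeoutStopSec=5min
```

After "systemctl daemon-reload", running
"systemctl show -p TimeoutStopUSec mydb.service" should report the new value.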

Also, even with the current high defaults, some services still actually time
out. If something bad happens in that case, it is already happening. This is bad
for users in at least two ways. First, because they have to wait and wait, and
second, because the timeout is actually hit, so things *do* get terminated, but
when this happens, we do nothing. The idea would be to lower the default
timeouts, but also to treat any cases where we hit the timeout much more
seriously.

Zbyszek