Hello,

On Fri 10 Oct 2025 at 02:41pm +01, Ian Jackson wrote:

> 1. We already have a calling-convention version in oracled's
> invocation of d-r-s.  If we were to bump that when we make an
> incompatible change, then attempts by the old oracled to invoke d-r-s
> will immediately fail.
>
> The worker will crash and restart after the crashed worker delay, and
> then fail again.  So the service would be down, looping harmlessly,
> until oracled was restarted.

Good point.  If we are going to rely on this, then we should note in a
comment somewhere that the version must be bumped on every incompatible
change, so that an old oracled fails immediately rather than
misbehaving.
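
A minimal sketch of the fail-fast check being relied on here, in Python
purely for illustration.  The names check_convention,
SUPPORTED_CONVENTION and ConventionMismatch, and the idea that the
version arrives as the first argument, are assumptions made for the
example and are not taken from oracled or d-r-s:

```python
# Hypothetical: the one calling-convention version this build of the
# callee understands.  Bump it on every incompatible change.
SUPPORTED_CONVENTION = 2


class ConventionMismatch(Exception):
    """The caller speaks a calling convention we do not support."""


def check_convention(argv):
    """Return the remaining arguments if the caller's version matches.

    Raises ConventionMismatch before any real work is attempted, so a
    stale caller fails immediately instead of half-working.
    """
    if not argv:
        raise ConventionMismatch("missing calling-convention version")
    try:
        offered = int(argv[0])
    except ValueError:
        raise ConventionMismatch(f"unparseable version {argv[0]!r}")
    if offered != SUPPORTED_CONVENTION:
        raise ConventionMismatch(
            f"caller uses version {offered}, "
            f"we support {SUPPORTED_CONVENTION}"
        )
    return argv[1:]
```

The callee would run this on its arguments first and exit non-zero on
ConventionMismatch, which is what leaves the old daemon looping
harmlessly until it is restarted.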

> 2. We don't want to forcibly restart an oracled that has workers in
> the middle of jobs.  My priors say using systemctl to restart the
> daemon will kill all of its children (but maybe we have disabled that
> systemd feature?)

Yes, systemctl will eventually kill all the children, but only after a
timeout.  First it sends SIGTERM.  The Oracle propagates the SIGTERM to
its workers, which then finish their current jobs and exit.  If this
takes longer than TimeoutStopSec in the unit file, which we set to 2000
seconds, then they get a SIGKILL.

To me, 2000 seconds (a little over 33 minutes) seems long enough that
we can consider restarting on package upgrade.
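
The stop sequence described above (SIGTERM, a grace period, then
SIGKILL) can be sketched roughly as follows.  This is a POSIX-only
Python illustration of the behaviour, not systemd's or oracled's
implementation; stop_gracefully is a hypothetical helper and its
timeout parameter stands in for TimeoutStopSec:

```python
import signal
import subprocess


def stop_gracefully(proc, timeout):
    """Ask proc to stop; force it if it is still running after timeout.

    Returns the name of the signal that was needed.
    """
    # Polite request: a well-behaved worker finishes its job and exits.
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # Grace period used up: SIGKILL cannot be caught or ignored.
        proc.kill()
        proc.wait()
        return "SIGKILL"
```

With a 2000-second grace period, a job is only cut short if it needs
longer than that to finish after the SIGTERM arrives.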

> 3. oracled could probably detect this situation somehow.  With C
> programs one can stat /proc/self/exe and compare the inum with stat of
> what one thinks one's own path is.  I experimented and empirically if
> you add a __DATA__ then perl keeps the script file open and you can
> get its inum by statting DATA:
>
> echo >t.pl 'sub p ($) { my ($w) = @_; my @s = stat $w; print "$w @s\n" };
> foreach my $x (</proc/self/fd/*>) { p $x; }; p "."; p "DATA";';
> echo >>t.pl '__DATA__'; perl -M autodie -w t.pl; ll -i t.pl
> /proc/self/fd/0 24 71 8592 1 1000 5 34884 0 1760103640 1760103640 1760102927 1024 0
> /proc/self/fd/1 24 71 8592 1 1000 5 34884 0 1760103640 1760103640 1760102927 1024 0
> /proc/self/fd/2 24 71 8592 1 1000 5 34884 0 1760103640 1760103640 1760102927 1024 0
> /proc/self/fd/3 15 17930 4480 1 1000 1000 0 0 1758008779 1758008779 1758008779 4096 0
> /proc/self/fd/4 15 17930 4480 1 1000 1000 0 0 1758008779 1758008779 1758008779 4096 0
> /proc/self/fd/5 64772 967692 33204 1 1000 1000 0 133 1760103645 1760103645 1760103645 4096 8
> /proc/self/fd/6
> . 64772 950333 16893 14 1000 1000 0 4096 1760103347 1760103223 1760103223 4096 8
> DATA
> 967692 -rw-rw-r-- 1 ian ian 133 Oct 10 14:40 t.pl

Ah, nice.  That would be viable.
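
In the quoted transcript, /proc/self/fd/5 reports inode 967692, the
same number that the final listing shows for t.pl, so the still-open
script handle does identify the file that was loaded.  A minimal sketch
of the comparison, in Python for illustration only; replaced_on_disk is
a hypothetical helper, and it assumes an upgrade installs a new file
over the path rather than rewriting the old one in place:

```python
import os


def replaced_on_disk(open_fd, path):
    """True if path no longer names the file that open_fd has open."""
    held = os.fstat(open_fd)
    try:
        current = os.stat(path)
    except FileNotFoundError:
        # The path is gone entirely, so it is certainly not our file.
        return True
    # Inode numbers are only unique per filesystem, so compare the
    # device number as well.
    return (held.st_dev, held.st_ino) != (current.st_dev, current.st_ino)
```

A daemon could run such a check between jobs and arrange a restart
once it returns true.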

-- 
Sean Whitton
