Control: reassign -1 apt
Control: retitle -1 apt: daily systemd service handles dpkg frontend locking 
incorrectly?
Hi!

[ Please see the entire bug report for more details, otherwise I'm
  sure Vincent can fill in any missing pieces. ]

On Thu, 2024-07-25 at 11:03:50 +0200, Vincent Lefevre wrote:
> Control: reopen -1
> Control: severity -1 important
> 
> On 2024-07-11 00:49:20 +0200, Guillem Jover wrote:
> > libdpkg: Try to print the executable name of the lock contending process
> > 
> > Just printing the PID is not very useful to try to track down the
> > contending process as its presence might be momentary and might no
> > longer be present when the user tries to look for that specific PID.
> > 
> > Try to get the executable name to give a better hint to what might be
> > going wrong.
> > 
> > Closes: #1070027

I added this mostly as a debugging aid, even though I had already kind
of tracked the surrounding area for this, I guess I was expecting a
clone and reassign or a new bug report filed on apt, but probably I
should have done that myself, or not closed the report for the
improved debugging output (just adding a reference).

> Since the problem occurred again, this time on the dpkg upgrade itself:
> 
> Unpacking dpkg (1.22.9) over (1.22.7) ...
> Setting up dpkg (1.22.9) ...
> dpkg: error: dpkg frontend lock was locked by /usr/bin/apt-get process with 
> pid 326336
> 
> But this is rather useless information. Perhaps it should also
> write the full "ps -ef" output (something like that) to a file,
> though this may be too late.

Printing more stuff could certainly be helpful, but would be annoying.
I'd expect something like this bug not being common, so I'm not sure
(at least right now) it's worth it to implement further output, or
relying on ps or pstree or similar (which would seem rather meh).

> According to the journalctl output:
> 
> Jul 25 10:30:36 qaa systemd[1]: Reload requested from client PID 326242 
> ('systemctl') (unit session-2.scope)...
> Jul 25 10:30:36 qaa systemd[1]: Reloading...
> Jul 25 10:30:37 qaa systemd[1]: Reloading finished in 153 ms.
> Jul 25 10:30:37 qaa systemd[1]: Starting apt-daily.service - Daily apt 
> download activities...
> Jul 25 10:30:37 qaa systemd[1]: apt-daily.service: Deactivated successfully.
> Jul 25 10:30:37 qaa systemd[1]: Finished apt-daily.service - Daily apt 
> download activities.
> 
> but again, I did *not* do a systemctl. So either systemd is completely
> broken, or perhaps the systemctl was done by dpkg itself. Note that in
> this latter case (I would not be surprised, because when this happens,
> this is *always* during an upgrade), even if aptitude had some fix of
> frontend locking, there would still be a conflict between aptitude and
> dpkg, both leading to a request a lock.

A reload is the common operation expected when installing any systemd
service file, so if that would cause a dpkg lock issue, then that
would be a general problem. This is specific to the interaction of the
apt-daily.service and how it ends up locking things.

On Thu, 2024-07-25 at 13:27:52 +0200, Vincent Lefevre wrote:
> On 2024-07-25 11:49:42 +0200, Vincent Lefevre wrote:
> > The reload is triggered with just
> > 
> >   dpkg -i /var/cache/apt/archives/dpkg_1.22.9_amd64.deb
> > 
> > so aptitude and even apt aren't even involved in this bug:
> > 
> > Jul 25 11:47:56 qaa systemd[1]: Reload requested from client PID 333083 
> > ('systemctl') (unit session-2.scope)...
> > Jul 25 11:47:56 qaa systemd[1]: Reloading...
> > Jul 25 11:47:56 qaa systemd[1]: Reloading finished in 139 ms.
> 
> And if the timer time has been reached, the apt-daily service is run.
> 
> To try to reproduce the issue, I've copied
> 
>   /usr/lib/systemd/system/apt-daily.service
>   /usr/lib/systemd/system/apt-daily.timer
> 
> into /etc/systemd/system and edited them.
> 
> For /etc/systemd/system/apt-daily.service, I've changed the ExecStart
> script pathname and edited the copy of the apt.systemd.daily script to
> set VERBOSE=1 and ducplicate the following lines
> 
> # check if we can lock the cache and if the cache is clean
> if command -v apt-get >/dev/null && ! eval apt-get check $XAPTOPT $XSTDERR ; 
> then
>     debug_echo "error encountered in cron job with \"apt-get check\"."
>     exit 0
> fi
> 
> 6 times (as "apt-get check" attempts to lock the cache, thus
> giving more chance of failure).
> 
> For /etc/systemd/system/apt-daily.timer, I've changed the following
> lines to get
> 
> OnCalendar=*-*-* *:*:00
> RandomizedDelaySec=1m
> 
> thus reaching the timer time more often.
> 
> Now, whether I use
> 
>   aptitude reinstall dpkg dpkg-dev
> or
>   apt install --reinstall dpkg dpkg-dev
> or
>   dpkg -i /var/cache/apt/archives/dpkg_1.22.9_amd64.deb \
>           /var/cache/apt/archives/dpkg-dev_1.22.9_all.deb
> 
> I can get a failure in the apt.systemd.daily script:
> 
> Jul 25 12:59:17 qaa systemd[1]: Starting apt-daily.service - Daily apt 
> download activities...
> Jul 25 12:59:17 qaa apt.systemd.daily[370789]: error encountered in cron job 
> with "apt-get check".
> Jul 25 12:59:17 qaa systemd[1]: apt-daily.service: Deactivated successfully.
> Jul 25 12:59:17 qaa systemd[1]: Finished apt-daily.service - Daily apt 
> download activities.
> 
> With VERBOSE=2, I could get more details about this failure:
> 
> Jul 25 13:22:16 qaa systemd[1]: Starting apt-daily.service - Daily apt 
> download activities...
> Jul 25 13:22:16 qaa apt.systemd.daily[378644]: E: Could not get lock 
> /var/lib/dpkg/lock-frontend. It is held by process 378454 (apt)
> Jul 25 13:22:16 qaa apt.systemd.daily[378644]: E: Unable to acquire the dpkg 
> frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
> Jul 25 13:22:16 qaa apt.systemd.daily[378637]: error encountered in cron job 
> with "apt-get check".
> Jul 25 13:22:16 qaa systemd[1]: apt-daily.service: Deactivated successfully.
> Jul 25 13:22:16 qaa systemd[1]: Finished apt-daily.service - Daily apt 
> download activities.
> 
> Here, the failure occurs in the apt.systemd.daily, because the
> process used for the upgrade got the lock first. But it could
> be the other way round, as this could be seen with aptitude.

So, as mentioned above, and as we saw earlier in the bug report, it looks
like the lock handling in the apt-daily.service is not working, and is
interfering with the current run. This needs to be fixed there
somehow. Reassigning.

Thanks,
Guillem

Reply via email to