** Changed in: linux (Ubuntu Kinetic)
Status: Fix Committed => Fix Released
** Changed in: linux (Ubuntu Jammy)
Status: Fix Committed => Fix Released
** Changed in: linux (Ubuntu Focal)
Status: Fix Committed => Fix Released
** Changed in: linux (Ubuntu)
Status: In Progress => Fix Released
** Changed in: ubuntu-z-systems
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2002256
Title:
[UBUNTU 22.04] zfcp: fix double free of FSF request when qdio send
fails
Status in Ubuntu on IBM z Systems:
Fix Released
Status in linux package in Ubuntu:
Fix Released
Status in linux source package in Focal:
Fix Released
Status in linux source package in Jammy:
Fix Released
Status in linux source package in Kinetic:
Fix Released
Bug description:
Description: zfcp: fix double free of FSF request when qdio send fails
Symptom: When doing maintenance actions on FCP devices that turn off an
FCP device while I/O is still running on it in Linux - for
example turning off the channel path of the FCP device - the
Linux kernel crashes.
Problem: We used to use the wrong type of integer in
'zfcp_fsf_req_send()' to cache the FSF request ID when sending a
new FSF request. The cached ID is used in case the sending fails and
we need to remove the request from our internal hash table again
(so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int'
(signed and 32 bit wide), but the rest of the zfcp code (and the
firmware specification) handles the ID as 'unsigned long'/'u64'
(unsigned and 64 bit wide [s390x ELF ABI]).
For one this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to
32 bit when storing it in the cache variable and so doesn't
match the original ID anymore.
The second, less obvious problem is that even when the
original ID has not yet grown past 32 bit, as soon as the 32nd
bit is set in the original ID (0x80000000 = 2'147'483'648) we
will have a mismatch when we cast it back to 'unsigned long',
because casting the signed type 'int' into the wider type
'unsigned long' will use a sign-extending instruction, and so
set all upper 32 bits to one instead of zero.
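The two conversion problems described above can be reproduced in user space with a minimal sketch. The helper names 'roundtrip_through_int()' and 'show_roundtrip()' are hypothetical and not part of zfcp, and 'uint64_t' stands in for the 64 bit wide 'unsigned long' on s390x. The narrowing conversion to 'int' is implementation-defined in older C standards; the sketch assumes the wrap-around behaviour of GCC and Clang.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of zfcp: store a 64-bit request ID in an
 * 'int' and widen it again, as the old cache variable effectively did.
 */
uint64_t roundtrip_through_int(uint64_t req_id)
{
	assert(sizeof(int) == 4);	/* the demonstration assumes a 32-bit int */

	/* Narrowing keeps only the low 32 bits (GCC/Clang behaviour). */
	int cached = (int)req_id;

	/* Widening a negative 'int' sets the upper 32 bits to one. */
	return (uint64_t)cached;
}

/* Print the original ID next to the value seen after the round trip. */
void show_roundtrip(uint64_t req_id)
{
	uint64_t seen = roundtrip_through_int(req_id);

	printf("0x%016" PRIx64 " -> 0x%016" PRIx64 " (%s)\n",
	       req_id, seen, seen == req_id ? "match" : "mismatch");
}
```

Under these assumptions, calling 'show_roundtrip()' with 0x7fffffff reports a match, while 0x80000000 comes back as 0xffffffff80000000 (sign extension) and 0x100000001 comes back as 0x1 (truncation).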
If we can't successfully remove the request from the hash table
again after 'zfcp_qdio_send()' fails (this happens regularly
when zfcp cannot notify the adapter about new work because the
adapter is already gone during e.g. a ChpID toggle) we will end
up with a double free.
We unconditionally free the request in the calling function
when 'zfcp_fsf_req_send()' fails, but because the request is
still in the hash table we end up with a stale memory reference,
and once the zfcp adapter is either reset during recovery or
shut down we end up freeing the same memory twice.
Solution: To fix this, simply change the type of the cache variable to
'unsigned long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong
sign extension, so the request can be successfully removed from
the hash table.
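The effect of the type change on the error path can be illustrated with a toy request list. This is only a sketch under stated assumptions: all 'toy_*' and 'send_failed_cleanup_*' names are hypothetical, the data structure is a simplified stand-in for the zfcp request hash table rather than a copy of it, and 'uint64_t' again stands in for 'unsigned long' on s390x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 128u

/* Hypothetical stand-ins for an FSF request and the request hash table. */
struct toy_req {
	uint64_t req_id;
	struct toy_req *next;
};

struct toy_reqlist {
	struct toy_req *bucket[TOY_BUCKETS];
};

/* Insert a request at the head of the bucket selected by its ID. */
void toy_reqlist_add(struct toy_reqlist *rl, struct toy_req *req)
{
	unsigned int i = (unsigned int)(req->req_id % TOY_BUCKETS);

	assert(rl != NULL && req != NULL);
	req->next = rl->bucket[i];
	rl->bucket[i] = req;
}

/* Find the request with the given ID, unlink it, and return it (or NULL). */
struct toy_req *toy_reqlist_find_rm(struct toy_reqlist *rl, uint64_t req_id)
{
	struct toy_req **pp = &rl->bucket[req_id % TOY_BUCKETS];

	for (; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->req_id == req_id) {
			struct toy_req *found = *pp;

			*pp = found->next;
			found->next = NULL;
			return found;
		}
	}
	return NULL;
}

/* Old behaviour: ID cached as 'int'. Returns 1 if the request was removed. */
int send_failed_cleanup_int(struct toy_reqlist *rl, const struct toy_req *req)
{
	int cached = (int)req->req_id;	/* truncated, later sign-extended */

	return toy_reqlist_find_rm(rl, (uint64_t)cached) != NULL;
}

/* Fixed behaviour: ID cached at full width. Returns 1 if removed. */
int send_failed_cleanup_u64(struct toy_reqlist *rl, const struct toy_req *req)
{
	uint64_t cached = req->req_id;

	return toy_reqlist_find_rm(rl, cached) != NULL;
}
```

With a request whose ID is 0x80000000, 'send_failed_cleanup_int()' looks up 0xffffffff80000000, finds nothing, and leaves the entry in the list; if the caller then frees the request, the list holds a stale pointer. 'send_failed_cleanup_u64()' looks up the unchanged ID and unlinks the entry.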
Reproduction: Run I/O on an FCP device long enough to have sent
2'147'483'648 requests. The current request number cannot be
read directly from user space, but can be read indirectly by
using 'zfcp_ping' and 'zfcpdbf' (use the correct
device-bus-ID):
sudo sh -c 'zfcp_ping -a "${0}" 0xFFFFFFFFFFFFFFFF \
2>/dev/null 1>&2; zfcpdbf "${0}" -x all -i SAN 2>/dev/null \
| grep -E -e "^(Timestamp|Request ID)[[:blank:]]+:" | tail \
-n2' 0.0.1700
After having reached 0x80000000 requests, stop all I/O on the
FCP device and start only a single process doing single-threaded,
synchronous, direct I/O on the FCP device (always only one
outstanding I/O operation).
While this I/O process is running, turn off the channel path
(ChpID) that is used for the FCP device/subchannel. This will
not always trigger the bug, but occasionally it will.
Proof that it hit the correct code-path in zfcp can be found
by using 'zfcpdbf' again (use the correct device-bus-ID):
zfcpdbf 0.0.1700 -x all -i REC 2>/dev/null | grep 'fsrs__1'
In case you hit the correct code-path this will print some lines
starting with 'Tag'.
Upstream-ID: 0954256e970ecf371b03a6c9af2cf91b9c4085ff
Preventive: yes
Date: 2022-12-19
Author: Benjamin Block <[email protected]>
Component: kernel
Link:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0954256e970e
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/2002256/+subscriptions