Public bug reported:

[ Impact ]

The Linux iSCSI driver uses `alloc_workqueue()` to create a kernel
workqueue for session transmit work. On kernels older than 6.x, this
call creates a dedicated worker thread for the workqueue. The userspace
open-iscsi tools (versions before 2.1.10) then adjust that thread's
nice value for performance reasons when a new iSCSI session is
initiated. The algorithm is roughly as follows
(https://github.com/open-iscsi/open-iscsi/blob/2.1.9/usr/initiator.c#L1390):

- Check if the driver in use has a write work queue. If not, abort.
- Open the /proc dir, and iterate over all dir entries:
  - Run "stat" over /proc/<n>/stat
  - Read the contents of the "stat" file, which look like the following:
    898582 (kworker/u512:1-iscsi_q_0) I 2 0 0 0 -1 69238880 0 0 0 0 0 8 0 0 20 0 1
    0 52431895 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 1 0 0 17 28 0 0
    0 0 0 0 0 0 0 0 0 0 0
  - Try to locate "(" with strchr, starting from the beginning. Skip the
    process if not found.
  - Try to locate ")" with strchr, starting from the position of "(". Skip the
    process if not found.
  - Check whether the string between "(" and ")" contains the pattern
    "iscsi_q_%d"
  - Check whether the number %d matches the host ID of the session.
  - If it matches, grab the PID of the current proc entry and call
    `setpriority()` on it
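
The name-matching steps above can be sketched in shell for experimentation. This is only an illustration under stated assumptions: the function names `iscsi_q_host_from_stat` and `find_iscsi_q_pids` are hypothetical, the real logic is C code in usr/initiator.c, and the sketch only lists matching PIDs rather than calling `setpriority()`.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names are hypothetical and the real
# implementation is C code in open-iscsi's usr/initiator.c.

# Print N if the comm field of a /proc/<pid>/stat line contains "iscsi_q_N".
# Returns 1 (and prints nothing) when the entry should be skipped.
iscsi_q_host_from_stat() {
    local line=$1 comm
    [[ $line == *"("*")"* ]] || return 1   # need "(" followed by ")"
    comm=${line#*\(}                       # text after the first "("
    comm=${comm%%\)*}                      # ...up to the first ")" after it
    [[ $comm =~ iscsi_q_([0-9]+) ]] || return 1
    printf '%s\n' "${BASH_REMATCH[1]}"
}

# Print the PID of every /proc entry whose name matches the given host number.
find_iscsi_q_pids() {
    local want=$1 f line host
    for f in /proc/[0-9]*/stat; do
        line=$(cat "$f" 2>/dev/null) || continue   # entry may have vanished
        host=$(iscsi_q_host_from_stat "$line") || continue
        if [[ $host == "$want" ]]; then
            printf '%s\n' "${line%% *}"            # first field is the PID
        fi
    done
    return 0
}
```

Running `find_iscsi_q_pids 0` would list the PIDs whose name currently matches host 0, which is the set the algorithm would pass to `setpriority()`.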

So the algorithm assumes the following about the kernel workqueue
thread:

- It would be present in the /proc list
- Its name would match the iscsi_q_%d pattern

Due to the changes in how Linux workqueue threads work in v6.x, the
priority setting approach won't work for the following reasons:

- `alloc_workqueue()` no longer creates a dedicated thread for the
workqueue. The worker thread is shared between different workqueues.
- The worker thread is dynamically renamed to the name of the workqueue
whose work it is actively running.
- The worker thread disappears from the /proc list when it's inactive.

The algorithm as-is does the following right now:

- If the kernel workqueue thread *by luck* happens to be running the
iscsi task, the name matches, and the priority is set. But that's not
what the code intends, since it also raises the priority of all
the other work items that are scheduled onto that worker thread.

- If not, open-iscsi prints the following log message and proceeds
to operate as normal:

```
iscsistart: Could not set session1 priority. READ/WRITE throughout and latency could be affected.
```
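
To tell which of the two cases a given system hit, the nice value of a worker can be read from its stat line: proc(5) documents the 19th field as the nice value. A minimal sketch, where `nice_from_stat` is a hypothetical helper name and not part of open-iscsi:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nice value (field 19 per proc(5)) from the
# contents of a /proc/<pid>/stat file passed as the first argument.
nice_from_stat() {
    local rest=${1##*\) }     # drop "pid (comm) "; comm itself may contain ")"
    local -a f
    read -r -a f <<< "$rest"  # f[0] is field 3 (state), so field 19 is f[16]
    printf '%s\n' "${f[16]}"
}
```

For example, `nice_from_stat "$(cat /proc/<n>/stat)"` prints `0` for the sample stat line quoted earlier in this report; a worker that had been reprioritized would show the configured value (`-20` by default) instead.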

Upstream has fixed this issue with the following patch
(https://github.com/open-iscsi/open-iscsi/pull/445). The patch changes the
default nice value for `node.session.xmit_thread_priority` to `0`, and
skips the priority adjustment algorithm altogether when the
priority is set to zero.

This SRU proposes to backport this patch to the Ubuntu releases that use
Linux kernel 6.8 or later by default and ship an open-iscsi version
older than 2.1.10.


[ Test Plan ]

# Launch a test VM:
$> lxc launch ubuntu:noble --vm iscsi-test-noble

# Obtain a shell from the VM:
$> lxc shell iscsi-test-noble

# Install 'tgt' and 'open-iscsi':
$> sudo apt -y update && sudo apt -y install tgt open-iscsi 

# Configure 'tgt':

## Step 1: Configure a LUN

Add the following to '/etc/tgt/conf.d/iscsi.conf':

<target iqn.2020-07.example.com:lun1>
     backing-store /dev/sda
     initiator-address 127.0.0.1
     incominguser iscsi-user password
     outgoinguser iscsi-target secretpass
</target>

(change /dev/sda to an existing device's name if it's not present)

## Step 2: Restart 'tgt' to make changes effective:
$> systemctl restart tgt

## Step 3: Check if 'tgt' has started serving the LUN:
$> tgtadm --mode target --op show

(output should be similar to below)
Target 1: iqn.2020-07.example.com:lun1
    System information:
        Driver: iscsi
        State: ready
    I_T nexus information:
    LUN information:
        LUN: 0
            Type: controller
            SCSI ID: IET     00010000
            SCSI SN: beaf10
            Size: 0 MB, Block size: 1
            Online: Yes
            Removable media: No
            Prevent removal: No
            Readonly: No
            SWP: No
            Thin-provisioning: No
            Backing store type: null
            Backing store path: None
            Backing store flags: 
    Account information:
        iscsi-user
        iscsi-target (outgoing)
    ACL information:
        127.0.0.1


# Configure 'open-iscsi':

## Step 1: Check whether the LUN being served by 'tgt' is discoverable:

$> iscsiadm -m discovery -t st -p 127.0.0.1
# (should output the text below)
127.0.0.1:3260,1 iqn.2020-07.example.com:lun1

## Step 2: Configure open-iscsi to consume the target LUN:

Add the following line to '/etc/iscsi/initiatorname.iscsi':

InitiatorName=iqn.2020-07.example.com:lun1

## Step 3: Modify the following file 
'/etc/iscsi/nodes/iqn.2020-07.example.com:lun1/127.0.0.1,3260,1/default':
# (the file must already exist, it should've been automatically created after 
the discovery)

Append the following lines to the end of the file, and save:

```
node.session.auth.authmethod = CHAP  
node.session.auth.username = iscsi-user
node.session.auth.password = password          
node.session.auth.username_in = iscsi-target
node.session.auth.password_in = secretpass         
node.startup = automatic
```

## Step 4: Restart open-iscsi to make changes effective:

$> systemctl restart open-iscsi.service iscsid

## Step 5: Check the outcome

# (the service status should indicate that login to 
'iqn.2020-07.example.com:lun1' has been successful)
$> systemctl status open-iscsi.service
● open-iscsi.service - Login to default iSCSI targets
     Loaded: loaded (/usr/lib/systemd/system/open-iscsi.service; enabled; 
preset: enabled)
     Active: active (exited) since Mon 2024-07-22 13:36:15 UTC; 4s ago
       Docs: man:iscsiadm(8)
             man:iscsid(8)
    Process: 3049 ExecStart=/usr/sbin/iscsiadm -m node --loginall=automatic 
(code=exited, status=0/SUCCESS)
    Process: 3065 ExecStart=/usr/lib/open-iscsi/activate-storage.sh 
(code=exited, status=0/SUCCESS)
   Main PID: 3065 (code=exited, status=0/SUCCESS)
        CPU: 4ms

Jul 22 13:36:15 welcomed-bluebird systemd[1]: Starting open-iscsi.service - 
Login to default iSCSI targets...
Jul 22 13:36:15 welcomed-bluebird iscsiadm[3049]: Logging in to [iface: 
default, target: iqn.2020-07.example.com:lun1, portal: 127.0.0.1,3260]
Jul 22 13:36:15 welcomed-bluebird iscsiadm[3049]: Login to [iface: default, 
target: iqn.2020-07.example.com:lun1, portal: 127.0.0.1,3260] successful.
Jul 22 13:36:15 welcomed-bluebird systemd[1]: Finished open-iscsi.service - 
Login to default iSCSI targets

# (the command should list an active connection to the 
'iqn.2020-07.example.com:lun1')
$> iscsiadm -m session -o show
tcp: [1] 127.0.0.1:3260,1 iqn.2020-07.example.com:lun1 (non-flash)


# Observe iscsid complaining about priority:

$> grep "Could not set" /var/log/syslog
2024-07-22T13:36:16.874243+00:00 welcomed-bluebird iscsid: Could not set 
session1 priority. READ/WRITE throughout and latency could be affected.
2024-07-22T13:38:31.002732+00:00 welcomed-bluebird iscsid: Could not set 
session1 priority. READ/WRITE throughout and latency could be affected.
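
For comparing a system before and after a package change, the warnings can be counted rather than eyeballed. A small sketch; `count_priority_warnings` is a hypothetical helper name, and the default log path is simply the one used in the step above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Could not set sessionN priority" lines in a log.
count_priority_warnings() {
    local log=${1:-/var/log/syslog}
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c 'Could not set session[0-9]* priority' "$log" || true
}
```

Taking the count, restarting the services as in Step 4, and taking the count again shows whether a new warning was logged. With the affected package the count is expected to grow; with the patch described in this report the adjustment is skipped, so it should presumably stay the same.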


# TODO: Add fix ppa steps here

[ Where problems could occur ]

The change prevents a priority change that shouldn't happen in the first
place. That might affect some workloads unknowingly depending on it. On
the other hand, the nice setting happens intermittently (i.e. by luck)
so the behavior right now can't be depended on anyway. The patch only
touches the priority setting code so I wouldn't expect any serious
breakage.


[ Other Info ]

Other releases that are running a 6.x kernel installed by other
means (e.g. hw-enablement, availability) may change
`node.session.xmit_thread_priority` from `-20` to `0` in
`/etc/iscsi/iscsid.conf` as a workaround:

node.session.xmit_thread_priority = 0

which is the default priority for the workqueue threads.
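
The workaround can also be applied non-interactively. The following is a sketch only: `set_xmit_thread_priority` is a hypothetical helper, it assumes GNU sed (`-i -E`), and it only handles an uncommented setting line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set node.session.xmit_thread_priority in the given
# iscsid.conf-style file, appending the setting if no such line exists.
set_xmit_thread_priority() {
    local conf=$1 value=$2
    local re='^[[:space:]]*node\.session\.xmit_thread_priority[[:space:]]*=.*'
    if grep -Eq "$re" "$conf"; then
        sed -i -E "s|$re|node.session.xmit_thread_priority = $value|" "$conf"
    else
        printf 'node.session.xmit_thread_priority = %s\n' "$value" >> "$conf"
    fi
}
```

As root, `set_xmit_thread_priority /etc/iscsi/iscsid.conf 0` followed by a restart of iscsid would apply it. Node records that were already created under /etc/iscsi/nodes may carry their own copy of the setting, in which case they would need updating as well (for example with `iscsiadm -m node -o update -n node.session.xmit_thread_priority -v 0`).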

** Affects: open-iscsi (Ubuntu)
     Importance: Undecided
     Assignee: Mustafa Kemal Gilor (mustafakemalgilor)
         Status: In Progress

** Affects: open-iscsi (Ubuntu Noble)
     Importance: Undecided
     Assignee: Mustafa Kemal Gilor (mustafakemalgilor)
         Status: In Progress

** Affects: open-iscsi (Ubuntu Oracular)
     Importance: Undecided
     Assignee: Mustafa Kemal Gilor (mustafakemalgilor)
         Status: In Progress

** Also affects: open-iscsi (Ubuntu Noble)
   Importance: Undecided
       Status: New

** Also affects: open-iscsi (Ubuntu Oracular)
   Importance: Undecided
     Assignee: Mustafa Kemal Gilor (mustafakemalgilor)
       Status: New

** Changed in: open-iscsi (Ubuntu Noble)
     Assignee: (unassigned) => Mustafa Kemal Gilor (mustafakemalgilor)

** Changed in: open-iscsi (Ubuntu Noble)
       Status: New => In Progress

** Changed in: open-iscsi (Ubuntu Oracular)
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2073846

Title:
  [SRU] Fix the session workqueue thread priority setting issue for
  newer Linux kernels (>=6.x)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/2073846/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
