[Kernel-packages] [Bug 2060780] Re: CIFS stopped working/is unstable with kernel update to 5.15.0-102.112

Daniel Dawson Sun, 21 Apr 2024 10:46:10 -0700

Hi, I believe that kernel 5.15.0-105-generic may not have solved
the issue entirely.


I upgraded to 104 from -proposed, and then when 105 was available, I
updated to 105, and then removed 104 and the -proposed repo entirely.
While the issue does not present itself as frequently, I still
encounter similar, if not the same, CIFS errors that were not present in
older kernels.

After updating to 105, I no longer encountered the issue when I first
connect to the SMB share. However, I encounter the issue during long
file copies to an SMB share. After trying to copy 20GB of data to the
SMB server, the copy is interrupted after some time, and then the
mount point is broken and cannot be accessed.

I am using Docker's CIFS volumes to mount SMB shares into Docker
containers. The host mounts the shares via CIFS under the
/var/lib/docker/volumes/... directories, which Docker then maps into the
containers.

I have encountered the issue on 105 after the following steps.
On the host, I added a mount directly in `/etc/fstab`, ran `systemctl daemon-reload`, and mounted the share with `mount -a`.
I then copied 20GB of data to the share with `cp -r /path/to/my/data /mnt/rf/data`.
After about a minute, the copy terminated.
In `dmesg`, I see the CIFS errors. When I look at the host CIFS mount point, I can see that the folder is inaccessible.

uname -a
```
Linux docker-gpu-01 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```

apt list --installed | grep "cifs-utils"
```
cifs-utils/jammy-updates,jammy-security,now 2:6.14-1ubuntu0.1 amd64 [installed]
```

cat /etc/fstab
```
...
//10.0.6.2/rf   /mnt/rf cifs uid=1000,gid=1000,credentials=/home/user/.smb 0 0
```


sudo dmesg
```
[138420.302415] CIFS: Attempting to mount \\10.0.6.2\rf
[138420.315798] CIFS: VFS: parse_server_interfaces: malformed interface info
[138538.756550] CIFS: VFS: \\10.0.6.2 sends on sock 000000009d8f9284 stuck for 15 seconds
[138538.757040] CIFS: VFS: \\10.0.6.2 Error -11 sending data on socket to server
[138543.032429] CIFS: reconnect tcon failed rc = -13
[138543.051221] CIFS: VFS: No writable handle in writepages rc=-13
[138543.053159] CIFS: VFS: No writable handle in writepages rc=-13
[138543.063685] CIFS: VFS: No writable handle in writepages rc=-13
[138543.065707] CIFS: VFS: No writable handle in writepages rc=-13
[138543.077380] CIFS: VFS: No writable handle in writepages rc=-13
[138543.080055] CIFS: VFS: No writable handle in writepages rc=-13
[138543.090757] CIFS: VFS: No writable handle in writepages rc=-13
[138543.092996] CIFS: VFS: No writable handle in writepages rc=-13
[138543.151054] CIFS: VFS: \\10.0.6.2\media Close unmatched open for MID:116
```

Notes:
You can see that at `138420` I ran `mount -a` and the mount was added.
There was an error `parse_server_interfaces: malformed interface info`, but the 
file system was mounted and available.
Shortly after, I started the file transfer (probably 20-30 seconds after mounting).
This file transfer failed by `138538` and produced the `VFS: \\10.0.6.2 sends on sock 000000009d8f9284 stuck for 15 seconds` error.
I can see on the server that about 6GB of the 20GB had transferred at this point.
Finally, you can see that at `138543`, a different mount (managed by Docker) failed with the error `VFS: \\10.0.6.2\media Close unmatched open for MID:116`.

The storage system has a 1Gbps link to the server. Assuming 80MBps, the
file copy would have made it through 6GB in about 75 seconds before it
encountered this error, which is consistent with the dmesg timeline and
my experience while watching the file copy.
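
As a rough check of that arithmetic, here is a minimal sketch. It assumes decimal units (6GB = 6000MB) and takes the 80 MB/s figure as an estimate rather than a measurement; `transfer_seconds` and `dmesg_gap` are hypothetical helper names used only for illustration.

```shell
# Expected copy time in whole seconds for a given size and throughput.
# Assumes decimal units (6GB = 6000MB); 80 MB/s is an estimate, not measured.
transfer_seconds() {
    awk -v mb="$1" -v rate="$2" 'BEGIN { printf "%d\n", mb / rate }'
}

# Seconds elapsed between two dmesg timestamps, to one decimal place.
dmesg_gap() {
    awk -v start="$1" -v end="$2" 'BEGIN { printf "%.1f\n", end - start }'
}

transfer_seconds 6000 80                 # 6GB at 80 MB/s
dmesg_gap 138420.302415 138538.756550    # mount to first "stuck" message
```

This should print 75 and 118.5. Subtracting the 20-30 seconds before the copy started from the 118.5-second gap leaves roughly 88 to 98 seconds, and if the sends had already been stuck for 15 seconds when the message was logged, the active transfer time comes out close to the 75-second estimate.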

The mount point on the host now looks like this:

sudo ls -hal /mnt
```
ls: cannot access '/mnt/rf': Permission denied
total 8.0K
drwxr-xr-x  3 root root 4.0K Apr 21 10:03 .
drwxr-xr-x 19 root root 4.0K Mar 26 21:02 ..
d?????????  ? ?    ?       ?            ? rf
```
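
For anyone trying to catch the moment the mount breaks during a long copy, a minimal sketch of a check follows. `check_mount` is a hypothetical helper name, not part of cifs-utils, and `/mnt/rf` is simply the mount point from my fstab above.

```shell
# Report whether a directory can be listed. A CIFS mount in the broken
# state shown above is expected to fail this check ("Permission denied").
# check_mount is a hypothetical helper, not part of cifs-utils.
check_mount() {
    if ls "$1" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "broken"
    fi
}

check_mount /mnt/rf
```

Running this in a loop alongside the copy (for example once per second) would show when the mount point stops being listable. If the server is unresponsive rather than refusing access, `ls` itself may block, so wrapping the call in `timeout` from coreutils could be a reasonable precaution.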

I have not looked at the bugfix code, or the code that introduced the
bug, but I am not fully convinced that the issue is solved.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2060780

Title:
  CIFS stopped working/is unstable with kernel update to 5.15.0-102.112

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Jammy:
  Fix Released

Bug description:
  Hi,

  I updated some Ubuntu 22.04 systems to the latest available state this
  morning, which caused CIFS mounts (from various fileservers) to stop
  working. The kernel was updated to version 5.15.0-102-generic.

  I can mount the shares without problems (mount -t cifs), but then df, for example, tells me: df: /mnt: Resource temporarily unavailable.
  I'm able to list and browse all the files, but accessing them (even read-only) is very unstable. Sometimes it works and sometimes it just gives me I/O errors.

  Switching back to 5.15.0-101-generic or 5.15.0-100-generic solves the
  problem and everything works again as expected.

  It seems a bug was introduced in 5.15.0-102-generic.

  To reproduce the problem, I started a while loop on one server that
  writes to a file on a mounted CIFS share, and read the file from
  another server:

  root@<hostname1>:~# while true; do echo "$(date) hallo" >> /mnt/hallo.txt; sleep 1 ; done
  -bash: /mnt/hallo.txt: Input/output error
  -bash: /mnt/hallo.txt: Input/output error
  ^C

  root@<hostname2>:~$ tail -f /mnt/hallo.txt
  Tue Apr  9 04:10:52 PM CEST 2024 hallo
  Tue Apr  9 04:10:53 PM CEST 2024 hallo
  Tue Apr  9 04:10:54 PM CEST 2024 hallo
  Tue Apr  9 04:10:55 PM CEST 2024 hallo
  Tue Apr  9 04:10:56 PM CEST 2024 hallo
  Tue Apr  9 04:10:57 PM CEST 2024 hallo
  Tue Apr  9 04:10:58 PM CEST 2024 hallo
  Tue Apr  9 04:10:59 PM CEST 2024 hallo
  Tue Apr  9 04:11:00 PM CEST 2024 hallo
  Tue Apr  9 04:11:01 PM CEST 2024 hallo
  tail: cannot determine location of '/mnt/hallo.txt'. reverting to polling: Resource temporarily unavailable
  Tue Apr  9 04:11:04 PM CEST 2024 hallo
  Tue Apr  9 04:11:05 PM CEST 2024 hallo
  Tue Apr  9 04:11:06 PM CEST 2024 hallo
  Tue Apr  9 04:11:07 PM CEST 2024 hallo
  Tue Apr  9 04:11:08 PM CEST 2024 hallo
  Tue Apr  9 04:11:09 PM CEST 2024 hallo
  Tue Apr  9 04:11:10 PM CEST 2024 hallo

  While doing this, both servers tell me the resource is unavailable:

  root@<hostname1>:~# df -h /mnt
  df: /mnt: Resource temporarily unavailable

  root@<hostname2>:~$ df -h /mnt
  df: /mnt: Resource temporarily unavailable

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2060780/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
