Per comment #23, the IP addresses of the AIX 7.2 clients are:

9.20.120.127 name = adia.v6.hursley.ibm.com -- Primary
9.20.121.46 name = amberjack.v6.hursley.ibm.com -- Partner
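For anyone who wants to repeat the search for these addresses in their own trace, here is a rough Python sketch. It is only an illustration, not a tool used in this bug: the function name failed_recvs is made up, the regular expression is based solely on the shape of the "svc: socket ... sendto" and "svc: socket ... recvfrom" debug lines quoted in this message, and the socket-to-address mapping is learned from sendto lines only, so a recvfrom logged before the first sendto on a socket would be missed. The only errno it names is 11, which is EAGAIN on Linux.

```python
import re
from collections import defaultdict

# Linux errno values we care about here; anything else is reported numerically.
LINUX_ERRNO_NAMES = {11: "EAGAIN"}

# One pattern for both line shapes seen in the trace:
#   svc: socket <id> sendto([...], N) = N (addr <ip>, port=<p>)
#   svc: socket <id> recvfrom(<buf>, N) = <result>
EVENT_RE = re.compile(
    r"svc: socket (\w+) "
    r"(?:sendto\(.*?\) = -?\d+ \(addr ([\d.]+), port=\d+\)"
    r"|recvfrom\(\w+, \d+\) = (-?\d+))"
)


def failed_recvs(trace_text, client_ips):
    """Return {ip: [(socket_id, error_name), ...]} for negative recvfrom results.

    A socket is attributed to a client only after a sendto line has shown
    which address it talks to.
    """
    socket_to_ip = {}
    failures = defaultdict(list)
    for match in EVENT_RE.finditer(trace_text):
        sock, addr, recv_result = match.groups()
        if addr is not None:
            # sendto line: remember which client this socket belongs to
            if addr in client_ips:
                socket_to_ip[sock] = addr
        elif sock in socket_to_ip and int(recv_result) < 0:
            code = -int(recv_result)
            name = LINUX_ERRNO_NAMES.get(code, "errno %d" % code)
            failures[socket_to_ip[sock]].append((sock, name))
    return dict(failures)
```

It could be called with the full text of the kernel log and the set {"9.20.120.127", "9.20.121.46"}; it works on one log entry per line as well as on entries run together on a single line.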
I searched the trace again for the above IPs. It looks like socket 00000000cc6f0db2 was created between 9.20.120.127 and the NFS server; however, recvfrom on it can also return EAGAIN (-11).

duckseason kernel: [13254.724411] svc: socket 00000000cc6f0db2 sendto([000000008485f39d 72... ], 72) = 72 (addr 9.20.120.127, port=1022)
...
duckseason kernel: [13254.724734] svc: socket 00000000cc6f0db2(inet 00000000c831762e), busy=0
duckseason kernel: [13254.724759] svc: server 00000000728e82a2, pool 0, transport 00000000cc6f0db2, inuse=2
duckseason kernel: [13254.724761] svc: tcp_recv 00000000cc6f0db2 data 1 conn 0 close 0
duckseason kernel: [13254.724765] svc: socket 00000000cc6f0db2 recvfrom(00000000b6708704, 4) = 4
duckseason kernel: [13254.724766] svc: TCP record, 168 bytes
duckseason kernel: [13254.724769] svc: socket 00000000cc6f0db2 recvfrom(0000000057dbced3, 4096) = 168
duckseason kernel: [13254.724771] svc: TCP final record (168 bytes)
duckseason kernel: [13254.724775] svc: svc_authenticate (1)
duckseason kernel: [13254.724779] svc: server 00000000ee62a401, pool 0, transport 00000000cc6f0db2, inuse=3
duckseason kernel: [13254.724780] svc: tcp_recv 00000000cc6f0db2 data 1 conn 0 close 0
duckseason kernel: [13254.724783] svc: socket 00000000cc6f0db2 recvfrom(00000000b6708704, 4) = -11

The same is true of socket 000000003497acd5, which is used between 9.20.121.46 and the NFS server.

duckseason kernel: [13254.802249] svc: socket 000000003497acd5 sendto([0000000086e5a045 72... ], 72) = 72 (addr 9.20.121.46, port=1020)
...
duckseason kernel: [13254.802533] svc: socket 000000003497acd5(inet 0000000072c9551d), busy=0
duckseason kernel: [13254.802571] svc: server 00000000728e82a2, pool 0, transport 000000003497acd5, inuse=2
duckseason kernel: [13254.802573] svc: tcp_recv 000000003497acd5 data 1 conn 0 close 0
duckseason kernel: [13254.802578] svc: socket 000000003497acd5 recvfrom(0000000077f9cf7c, 4) = 4
duckseason kernel: [13254.802579] svc: TCP record, 164 bytes
duckseason kernel: [13254.802583] svc: socket 000000003497acd5 recvfrom(0000000057dbced3, 4096) = 164
duckseason kernel: [13254.802585] svc: TCP final record (164 bytes)
duckseason kernel: [13254.802590] svc: svc_authenticate (1)
duckseason kernel: [13254.802596] svc: server 00000000ee62a401, pool 0, transport 000000003497acd5, inuse=3
duckseason kernel: [13254.802597] svc: tcp_recv 000000003497acd5 data 1 conn 0 close 0
duckseason kernel: [13254.802599] svc: socket 000000003497acd5 recvfrom(0000000077f9cf7c, 4) = -11

However, according to the bug description the AIX 7.2 client works with the same server, so I am curious why the 7.2 client also gets EAGAIN, just as the 7.3 client does. What am I missing?

Some questions and suggestions:

1. Did the AIX 7.3 NFS client work with a previous kernel? If so, run "git bisect" to find the commit that caused the issue.
2. Is it possible to try the latest 5.4 stable kernel, as suggested in comment #1? Please also try the latest upstream kernel (6.9-rc5 at this time).
3. Does increasing the lease time make a difference?

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2042363

Title:
  AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl() on a Ubuntu 20.04 NFSV4 server

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  AIX 7.3 NFS client frequently returns an EIO error to an application when reading or writing to a file that has been locked with fcntl(). NFS server is Ubuntu 20.04.6 LTS, GNU/Linux 5.4.0-139-generic x86_64.

  The problem does not appear to affect other combinations of NFS client (including AIX 7.2) with this NFS server.

  The AIX team have indicated that the cause of the EIO is triggered by the NFS server returning a BAD_SEQID error which leads to the AIX NFS client incorrectly zeroing the stateid, which then leads to the NFS server returning a BAD_STATEID error and the NFS client then returns the EIO error. The AIX team would like to understand why the BAD_SEQID has been returned.

  ---uname output---
  Linux duckseason 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

  Machine Type = VMware ESXi Server 7.0 4 x Intel(R) Xeon(R) Gold 6348H CPU @ 2.30GHz

  ---Steps to Reproduce---
  We cannot offer a simple way to recreate the problem as it involves IBM MQ running on two primary machines (AIX) using the Ubuntu server for its HA NFSv4 storage. However, we can provide any requested trace or dumps from any or all of the involved machines.