Per comment #23, the IPs of the AIX 7.2 clients are:

9.20.120.127 name = adia.v6.hursley.ibm.com -- Primary
9.20.121.46 name = amberjack.v6.hursley.ibm.com -- Partner


I searched the trace again for the above IPs. It looks like socket
00000000cc6f0db2 is used between 9.20.120.127 and the NFS server, yet its
recvfrom can also return EAGAIN (-11):

duckseason kernel: [13254.724411] svc: socket 00000000cc6f0db2 
sendto([000000008485f39d 72... ], 72) = 72 (addr 9.20.120.127, port=1022)
...
duckseason kernel: [13254.724734] svc: socket 00000000cc6f0db2(inet 
00000000c831762e), busy=0
duckseason kernel: [13254.724759] svc: server 00000000728e82a2, pool 0, 
transport 00000000cc6f0db2, inuse=2
duckseason kernel: [13254.724761] svc: tcp_recv 00000000cc6f0db2 data 1 conn 0 
close 0
duckseason kernel: [13254.724765] svc: socket 00000000cc6f0db2 
recvfrom(00000000b6708704, 4) = 4
duckseason kernel: [13254.724766] svc: TCP record, 168 bytes
duckseason kernel: [13254.724769] svc: socket 00000000cc6f0db2 
recvfrom(0000000057dbced3, 4096) = 168
duckseason kernel: [13254.724771] svc: TCP final record (168 bytes)
duckseason kernel: [13254.724775] svc: svc_authenticate (1)
duckseason kernel: [13254.724779] svc: server 00000000ee62a401, pool 0, 
transport 00000000cc6f0db2, inuse=3
duckseason kernel: [13254.724780] svc: tcp_recv 00000000cc6f0db2 data 1 conn 0 
close 0
duckseason kernel: [13254.724783] svc: socket 00000000cc6f0db2 
recvfrom(00000000b6708704, 4) = -11

The same is true for socket 000000003497acd5, which is used between
9.20.121.46 and the NFS server:

duckseason kernel: [13254.802249] svc: socket 000000003497acd5 
sendto([0000000086e5a045 72... ], 72) = 72 (addr 9.20.121.46, port=1020)
...
duckseason kernel: [13254.802533] svc: socket 000000003497acd5(inet 
0000000072c9551d), busy=0
duckseason kernel: [13254.802571] svc: server 00000000728e82a2, pool 0, 
transport 000000003497acd5, inuse=2
duckseason kernel: [13254.802573] svc: tcp_recv 000000003497acd5 data 1 conn 0 
close 0
duckseason kernel: [13254.802578] svc: socket 000000003497acd5 
recvfrom(0000000077f9cf7c, 4) = 4
duckseason kernel: [13254.802579] svc: TCP record, 164 bytes
duckseason kernel: [13254.802583] svc: socket 000000003497acd5 
recvfrom(0000000057dbced3, 4096) = 164
duckseason kernel: [13254.802585] svc: TCP final record (164 bytes)
duckseason kernel: [13254.802590] svc: svc_authenticate (1)
duckseason kernel: [13254.802596] svc: server 00000000ee62a401, pool 0, 
transport 000000003497acd5, inuse=3
duckseason kernel: [13254.802597] svc: tcp_recv 000000003497acd5 data 1 conn 0 
close 0
duckseason kernel: [13254.802599] svc: socket 000000003497acd5 
recvfrom(0000000077f9cf7c, 4) = -11 
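
For anyone who wants to repeat this search over the full trace, here is a minimal sketch of how it could be scripted. It assumes each kernel message sits on a single line (the mail client wrapped the excerpts above), and the function name scan and both regular expressions are my own illustration based only on the "svc:" lines quoted here, not on any existing tool.

```python
import re
from collections import defaultdict

# Patterns modelled on the "svc:" debug lines quoted above (assumed format).
SENDTO = re.compile(
    r"svc: socket (\w+) sendto\(.*\) = \d+ \(addr ([\d.]+), port=(\d+)\)"
)
RECVFROM = re.compile(r"svc: socket (\w+) recvfrom\(\w+, \d+\) = (-?\d+)")


def scan(lines):
    """Map each socket that saw a negative recvfrom to (peer IP, [return codes])."""
    peers = {}                   # socket id -> client address seen in sendto
    errors = defaultdict(list)   # socket id -> negative recvfrom return values
    for line in lines:
        m = SENDTO.search(line)
        if m:
            peers[m.group(1)] = m.group(2)
            continue
        m = RECVFROM.search(line)
        if m and int(m.group(2)) < 0:
            errors[m.group(1)].append(int(m.group(2)))
    # peers.get() yields None if no sendto was seen for that socket
    return {sock: (peers.get(sock), codes) for sock, codes in errors.items()}
```

Feeding it the unwrapped trace lines gives one entry per socket with a failed recvfrom, so the -11 results can be grouped by client address.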

But according to the bug description, the AIX 7.2 client works with the
same server, so I am curious why the 7.2 client's socket also returns
EAGAIN, the same as the 7.3 client's. What am I missing?
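
For reference, here is a small userspace sketch of the general socket behaviour, offered only as an illustration and not as a confirmed explanation of this trace: on a non-blocking socket, a read issued when no data is pending fails with EAGAIN. The helper name recv_when_idle is hypothetical, and it uses a local socketpair rather than the kernel's svc code path.

```python
import socket


def recv_when_idle():
    """Read once with data pending, then once with none, on a non-blocking socket."""
    a, b = socket.socketpair()
    try:
        b.setblocking(False)
        a.sendall(b"ping")
        first = b.recv(4)          # data is pending, so this succeeds
        try:
            b.recv(4)              # nothing left to read
        except BlockingIOError as exc:
            return first, exc.errno  # EAGAIN/EWOULDBLOCK
        return first, None
    finally:
        a.close()
        b.close()
```

On Linux the errno reported here is EAGAIN (11), which matches the -11 in the recvfrom lines above. Whether that is all the trace is showing is still an open question.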

Some questions/suggestions:

1. Did the AIX 7.3 NFS client work with a previous kernel? If so, run
"git bisect" to find which commit caused the issue.
2. Is it possible to try the latest 5.4 stable kernel, as suggested in
comment #1? Please also try the latest upstream kernel (6.9-rc5 at this time).
3. Does increasing the lease time make a difference?

-- 
https://bugs.launchpad.net/bugs/2042363

Title:
  AIX 7.3 NFS client frequently returns an EIO error to an application
  when reading or writing to a file that has been locked with fcntl() on
  a Ubuntu 20.04 NFSV4 server

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  AIX 7.3 NFS client frequently returns an EIO error to an application when 
reading or writing to a file that has been locked with fcntl(). NFS server is 
Ubuntu 20.04.6 LTS, GNU/Linux 5.4.0-139-generic x86_64. The problem does not 
appear to affect other combinations of NFS client (including AIX 7.2) with this 
NFS server.

  The AIX team have indicated that the cause of the EIO is triggered by the NFS 
server returning a BAD_SEQID error which leads to the AIX NFS client 
incorrectly zeroing the stateid, which then leads to the NFS server returning a 
BAD_STATEID error and the NFS client then returns the EIO error. The AIX team 
would like to understand why the BAD_SEQID has been returned.
   
  ---uname output---
  Linux duckseason 5.4.0-156-generic #173-Ubuntu SMP Tue Jul 11 07:25:22 UTC 
2023 x86_64 x86_64 x86_64 GNU/Linux
   
  Machine Type = VMware ESXi Server 7.0 4 x Intel(R) Xeon(R) Gold 6348H CPU @ 
2.30GHz  

  ---Steps to Reproduce---
   We cannot offer a simple way to recreate the problem as it involves IBM MQ 
running on two primary machines (AIX) using the Ubuntu server for its HA NFSv4 
storage.

  However, we can provide any requested trace or dumps from any or all
  of the involved machines.
