Hi all,
I've just run into an issue mounting on a newly upgraded client running 2.12.5
with 2.10.3 servers. Just to give some background, we're about to replace our
existing Lustre storage, but will run it concurrently with the replacement for
a couple of months. We'll be running 2.12.5 server on the new MDS and OSSs and
I plan to update all clients to the same version. I would like to avoid
updating the existing servers though.
The problem is this. The servers have two TCP LNet networks, tcp and tcp1, on
separate subnets and VLANs. The clients only see tcp1 (a small number are also
on tcp3, routed via two LNet routers), which has been fine until now. The
2.12.5 client, however, tries to reach the servers on tcp, where it has no
interface and no route. 2.10.3 to 2.12.5 is obviously a bit of a jump, but does
anyone have any ideas on what has changed and what I could do here, please?
meta# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.110.0.21@tcp
          status: up
          interfaces:
              0: bond0.22
    - net type: tcp1
      local NI(s):
        - nid: 10.10.0.91@tcp1
          status: up
          interfaces:
              0: bond0
meta# lnetctl route show
route:
    - net: tcp2
      gateway: 10.10.0.254@tcp1
    - net: tcp3
      gateway: 10.10.0.254@tcp1
client# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 10.12.170.47@o2ib
          status: up
          interfaces:
              0: ib0
    - net type: tcp1
      local NI(s):
        - nid: 10.10.170.47@tcp1
          status: up
          interfaces:
              0: em1
[Tue Dec 1 11:07:55 2020] LNetError: 2127:0:(lib-move.c:1999:lnet_handle_find_routed_path()) no route to 10.110.0.21@tcp from <?>
[Tue Dec 1 11:08:01 2020] LustreError: 1792:0:(mgc_request.c:249:do_config_log_add()) MGC10.10.0.91@tcp1: failed processing log, type 1: rc = -5
[Tue Dec 1 11:08:08 2020] LustreError: 2169:0:(mgc_request.c:599:do_requeue()) failed processing log: -5
[Tue Dec 1 11:08:19 2020] LNetError: 2127:0:(lib-move.c:1999:lnet_handle_find_routed_path()) no route to 10.110.0.22@tcp from <?>
[Tue Dec 1 11:08:30 2020] LustreError: 15c-8: MGC10.10.0.91@tcp1: The configuration from log 'lustre-client' failed (-5). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
client# lctl ping 10.10.0.91@tcp1
12345-0@lo
12345-10.110.0.21@tcp
12345-10.10.0.91@tcp1
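To make the mismatch explicit: the ping reply lists the server's tcp NID ahead
of its tcp1 NID, while the client's net show output above has no tcp network.
Below is a minimal Python sketch of that comparison. The helper names
(nid_net, parse_ping, unreachable_nids) are hypothetical and not part of
Lustre, and the inputs are simply pasted from the output above; it only
illustrates which advertised NIDs the client cannot reach directly, not how
LNet itself selects a path.

```python
def nid_net(nid):
    """Return the LNet network part of a NID, e.g. '10.10.0.91@tcp1' -> 'tcp1'."""
    return nid.rsplit("@", 1)[1]


def parse_ping(output):
    """Extract NIDs from `lctl ping` reply lines of the form '12345-<nid>'."""
    nids = []
    for line in output.splitlines():
        line = line.strip()
        if "-" in line and "@" in line:
            # Drop the leading '12345-' prefix and keep the NID itself.
            nids.append(line.split("-", 1)[1])
    return nids


def unreachable_nids(peer_nids, local_nets, routed_nets=()):
    """List peer NIDs whose network is neither local nor routed on this node."""
    reachable = set(local_nets) | set(routed_nets)
    return [
        nid for nid in peer_nids
        if nid_net(nid) != "lo" and nid_net(nid) not in reachable
    ]


# Reply to `lctl ping 10.10.0.91@tcp1`, copied from above.
PING_OUTPUT = """\
12345-0@lo
12345-10.110.0.21@tcp
12345-10.10.0.91@tcp1
"""

# Networks from the client's `lnetctl net show`, copied from above.
CLIENT_NETS = ["lo", "o2ib", "tcp1"]

peer_nids = parse_ping(PING_OUTPUT)
print(unreachable_nids(peer_nids, CLIENT_NETS))  # ['10.110.0.21@tcp']
```

The one NID it reports, 10.110.0.21@tcp, is the same address that appears in
the first "no route to" error in the client log.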
Any suggestions will be greatly appreciated!
Many thanks,
Mark
Dr Mark Lundie | Research IT Systems Administrator | Research IT | Directorate
of IT Services | B39, Sackville Street Building | The University of Manchester
| Manchester | M1 3WE | 0161 275 8403 | ri.itservices.manchester.ac.uk
Working Hours: Tues - Thurs 0730-1730; Fri 0730-1630