On Mon, 27 Jan 2025 19:29:24 +0000 Chris Billington <emu...@disroot.org> wrote:
> On Mon, 27 Jan 2025 14:02:17 +0000
> Chris Billington wrote:
>
> > I am setting up net-booting of amd64 clients with an amd64 server
> > (HP Z400) running 7.6-release, using the diskless(8) manpage.
> >
> > I want to set up a shared /usr nfs mount for the clients as described
> > in the manpage, so that a single set of installed packages can be
> > managed centrally.
> >
> > tftpd, bootparams, dhcpd and nfs are set up as described in the manpage.
> >
> > Server IP (em0) 192.168.0.254
> > NFS rootfs /export is on a separate partition.
> >
> > Client IP (also em0) 192.168.0.2 as set statically by mac address in
> > dhcpd.conf
> >
> > /etc/exports on the server:
> > /export/client2 -maproot=root -alldirs 192.168.0.2
> > /usr -ro -network=192.168.0.0 -netmask=255.255.255.0
> > /var/db/pkg -ro -network=192.168.0.0 -netmask=255.255.255.0
> >
> > showmount -e:
> > /var/db/pkg 192.168.0.0
> > /usr 192.168.0.0
> > /export/client2 192.168.0.2
> >
> > /etc/fstab on client2:
> > 192.168.0.254:/export/client2 / nfs rw 0 0
> > 192.168.0.254:/usr/ nfs ro 0 0
> > swap /tmp mfs rw,-s=512M 0 0
> > 192.168.0.254:/var/db/pkg /var/db/pkg nfs 0 0
> >
> > When booting multi-user, all goes normally until the boot hangs at the
> > point of mounting /usr in the client's /etc/rc (line 489):
> >
> > mount -s /usr >/dev/null 2>&1 # if NFS, fstab must use IP address
> >
> > By removing the redirection temporarily, I can see the following error
> > on the client:
> >
> > mount_nfs: bad MNT RPC: RPC: Unable to send; errno = Permission denied
> >
> > This is repeated at intervals of 30 seconds or so.
> >
> > However, showmount -a on the server thinks /usr is mounted:
> >
> > showmount -a:
> > client2:/export/client2
> > client2:/usr
> > client2:/var/db/pkg
> >
> > At this point if I interrupt the processing of /etc/rc the boot
> > continues, but fails miserably because /usr is not mounted.
> > (verified with mountd -d on the server)
> >
> > If I do 'boot -s', after going to a shell it is possible to
> > mount /usr, /tmp and /var/db/pkg without issue.
> >
> > If I add the bg (background the mount task) option to the client's
> > fstab for /usr (ro,bg) then boot proceeds but /usr never gets mounted.
> >
> > As a check, I tried booting with a non-shared /usr in
> > the /export/client2 directory. Booting then works without problems. But
> > that defeats the object of net booting, to have a shared set of
> > installed packages.
> >
> > One strange thing that may be relevant:
> > If I listen with 'tcpdump -nvi em0' on the server, I can see the rpc
> > request going to the server port 111 over udp each time the client
> > attempts to mount /usr:
> >
> > 192.168.0.2.xxx > 192.168.0.254.111: [udp sum ok] udp 56 (ttl 64, id
> > xxxxx, len 84)
> >
> > But the reply back to the client from the server from the same port has:
> >
> > 192.168.0.254.111 > 192.168.0.2.xxx: [bad udp csum 8682! -> zzzz]] udp
> > 28 (ttl 64, id xxxxx, len 56)
> >
> > (xxxx, yyyy, zzzz are the random values chosen by the networking stack)
> >
> > Is it still possible to boot diskless clients with a shared /usr? What
> > could be the cause of the 'bad UDP csum' errors, and the 'mount_nfs bad
> > MNT RPC' error?
> > It's particularly odd because single-user boot allows /usr to be
> > mounted read-only without issue.
> >
> > I'm running out of things to try! All assistance in resolving this
> > gratefully accepted....
> >
> > Possibly unrelated: before the boot process of /bsd starts, I see the
> > following PXEboot error flash by:
> > pxe_netif_open : PXENV_UDP_OPEN failed: 0x60
> > net_open: netif_open() failed
> > However, after this the booting of /bsd continues as normal until the
> > nfs mount hang described above.
> >
> > --
> > Chris Billington
>
> Further information:
>
> - the bad checksum errors are a 'red herring': they occur because the
> network card supports checksum offloading and tcpdump sees the packets
> before the checksum is added.
>
> - changing the mount options for /usr to 'ro,tcp' results in slightly
> different error messages on the client when booting multiuser and
> trying to mount /usr readonly:
>
> first error:
> Cannot MNT RPC: RPC: Remote system error: Permission denied
>
> subsequent errors:
> mount_nfs: Bad MNT RPC: RPC: Unable to send; errno = Bad file descriptor
>
> --
> Chris Billington

I think I understand what was going on with read-only /usr over nfs now.

The temporary pf ruleset loaded in /etc/rc contains "don't kill NFS"
rules which allow communication out to the portmap/sunrpc and nfs ports
on the server only, 111 and 2049.

But to mount a separate /usr the client needs to talk to the mountd RPC
service at the reserved port number it gets from portmap, and that
traffic is blocked by pf. The mountd port varies from boot to boot, so
it can't easily be included in a rule as far as I know. I confirmed this
was the issue by hard-coding the currently-running mountd port number in
the ruleset, after which the mount succeeded.

My workaround was to move the mount command for /usr to just before the
temporary pf ruleset is loaded.

This also explains the cases that worked: single-user boot does not load
the temporary pf ruleset, and if /usr is an integral part of the root
filesystem, it does not get remounted by the mount -s command in
/etc/rc.

I will make a report to bugs@ to see whether this small change can be
accepted for future releases.

--
Chris Billington
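
For anyone wanting to check which port mountd has registered on a given
boot, here is a minimal shell sketch. It assumes that 'rpcinfo -p' prints
its usual columns (program, vers, proto, port, service name). The
function name mountd_port and the sample table are made up for
illustration; the port numbers are not taken from the server above.

```shell
#!/bin/sh
# Sketch: pick the mountd port out of `rpcinfo -p` style output.
# Assumed column layout: program vers proto port service

# Print the port of the first "mountd" entry for the given protocol
# (udp or tcp). Reads the rpcinfo table on stdin.
mountd_port() {
    awk -v proto="$1" '$3 == proto && $5 == "mountd" { print $4; exit }'
}

# Hypothetical sample data, NOT output from the server in this thread.
sample_rpcinfo='   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100005    1   udp    857  mountd
    100005    3   udp    857  mountd
    100005    1   tcp    912  mountd
    100003    3   udp   2049  nfs'

port=$(printf '%s\n' "$sample_rpcinfo" | mountd_port udp)
echo "mountd udp port: $port"

# Against a live server the same filter would be fed directly, e.g.:
#   rpcinfo -p 192.168.0.254 | mountd_port udp
```

Because the port changes between boots, a value found this way is only
useful for a one-off test like the hard-coded rule described above, not
as a permanent pf rule.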