Hello! I'm trying to set up my server for diskless boots, as described in the diskless(8) manpage (at the moment mostly as an academic exercise, but I was planning to put my oldish laptops to some use this way).

I followed the instructions from the manpage, setting up the various pieces as instructed. Since I was already running a limited PXE boot environment so that I can do installs more rapidly, many of the steps were already done, and I only had to set up rarpd and NFS.

However, when I now try to get the client to actually boot from this setup, it fails quite miserably when trying to mount the root filesystem via NFS. The kernel just hangs forever, printing "RPC timeout for server 172.23.255.255 (0xac17ffff) prog 100000".

After some research, I came across an old posting from misc (http://archives.neohapsis.com/archives/openbsd/2004-01/0603.html), but it has no solution. The problem described there is quite similar to the one I'm experiencing, but my setup has none of the peculiarities used there (i.e., I'm using a stock 4.6-release, stock dhcpd, stock everything). In particular, my client does the same thing as the Soekris in that old posting: it tries to connect to the NFS server at the broadcast address 172.23.255.255 instead of 172.23.12.2, which would be the "real" public address of the server. It _does_ connect to 172.23.12.2 during the original PXE bootstrap, but that might just be because dhcpd tells it to do so, as far as I understand the process.

Since the server also runs some other services, pf is running, and I first guessed it might be the culprit. However, even with "pass quick" for everything coming from this particular client, nothing changes. tcpdump on the pflog interface shows the sunrpc packets being allowed, so I don't think it is a PF issue. Disabling PF didn't change anything either.
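
As a sanity check on the address in the kernel message, here is a minimal Python sketch that decodes 0xac17ffff and compares it against the broadcast address of the LAN. The 172.23.0.0/16 netmask is an assumption (the post never states it), and the function names are purely illustrative.

```python
import ipaddress

# ASSUMPTION: the LAN is 172.23.0.0/16. The netmask is not given in the post;
# it is only inferred from the broadcast address the client uses.
ASSUMED_NETWORK = ipaddress.ip_network("172.23.0.0/16")


def decode_hex_addr(word: int) -> str:
    """Turn a 32-bit value such as 0xac17ffff into dotted-quad notation."""
    return str(ipaddress.IPv4Address(word))


def is_broadcast(addr: str, network=ASSUMED_NETWORK) -> bool:
    """True if addr is the directed broadcast address of the given network."""
    return ipaddress.IPv4Address(addr) == network.broadcast_address


if __name__ == "__main__":
    addr = decode_hex_addr(0xAC17FFFF)
    print(addr, "broadcast:", is_broadcast(addr))
    print("172.23.12.2", "broadcast:", is_broadcast("172.23.12.2"))
```

Under that assumption the hex value and the dotted quad in the message are the same address, and it is the network broadcast rather than any single host.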

rpcinfo(8) shows everything up and running:

| % rpcinfo -p
|    program vers proto   port
|     100000    2   tcp    111  portmapper
|     100000    2   udp    111  portmapper
|     100003    2   udp   2049  nfs
|     100003    3   udp   2049  nfs
|     100003    2   tcp   2049  nfs
|     100003    3   tcp   2049  nfs
|     100021    0   udp    759  nlockmgr
|     100021    1   udp    759  nlockmgr
|     100021    3   udp    759  nlockmgr
|     100021    4   udp    759  nlockmgr
|     100021    1   tcp    776  nlockmgr
|     100021    3   tcp    776  nlockmgr
|     100021    4   tcp    776  nlockmgr
|     100024    1   udp    992  status
|     100024    1   tcp    726  status
|     100005    1   udp    994  mountd
|     100005    3   udp    994  mountd
|     100005    1   tcp   1011  mountd
|     100005    3   tcp   1011  mountd

That includes the portmapper itself, which seems to be the service the client is unable to find. Or at least, that's how I interpret the "prog 100000" in the error message that scrolls continuously on the client.

I have already had tcpdump take a look at what's going on, but unfortunately I don't see very much in its output:

| $ tcpdump -n -s 140 -i em0 host 172.23.13.138
| tcpdump: listening on em0, link-type EN10MB
| 01:29:31.853178 172.23.13.138.718 > 172.23.255.255.111: udp 96
| 01:29:36.853392 172.23.13.138.718 > 172.23.255.255.111: udp 96
| 01:29:41.853479 172.23.13.138.718 > 172.23.255.255.111: udp 96
(ad infinitum)

As far as I can see, the client sends a UDP packet to the portmapper port every five seconds, but never gets a response.

Since it looks like an RPC/NFS issue, I checked whether "normal" NFS access would show similar problems by booting the same client from a Linux live CD and mounting the export from there. This succeeded on the first try, so NFS seems to work, at least in general. However, that straightforward NFS mount connected to 172.23.13.2 (i.e., the "real" address of the server), not to the broadcast address.
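
To separate "the portmapper is unreachable" from "the portmapper ignores this particular request", one option is a small probe that sends an ONC RPC NULL call for program 100000 over UDP and waits for a reply. The following is only a sketch under assumptions: the function names are made up for illustration, and the 40-byte NULL call it sends is not the 96-byte packet seen in the capture above, so it tests reachability only.

```python
import socket
import struct
import sys

PMAP_PROG = 100000  # the "prog 100000" from the kernel message
PMAP_VERS = 2
PMAP_PORT = 111
PROC_NULL = 0       # procedure 0 does nothing and just replies
AUTH_NULL = 0


def build_null_call(xid: int) -> bytes:
    """ONC RPC call header: xid, CALL(0), rpcvers 2, prog, vers, proc,
    then empty AUTH_NULL credential and verifier (flavor, length)."""
    return struct.pack(">10I", xid, 0, 2, PMAP_PROG, PMAP_VERS, PROC_NULL,
                       AUTH_NULL, 0, AUTH_NULL, 0)


def is_success_reply(data: bytes, xid: int) -> bool:
    """True if data is an accepted, successful reply to the given xid."""
    if len(data) < 24:
        return False
    rxid, mtype, reply_stat, _flavor, vlen = struct.unpack(">5I", data[:20])
    offset = 20 + ((vlen + 3) // 4) * 4  # skip the (padded) verifier body
    if len(data) < offset + 4:
        return False
    (accept_stat,) = struct.unpack(">I", data[offset:offset + 4])
    return rxid == xid and mtype == 1 and reply_stat == 0 and accept_stat == 0


def probe(host: str, timeout: float = 2.0, xid: int = 0x12345678) -> bool:
    """Send one NULL call to host:111 and report whether a valid reply came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Needed if host is a broadcast address such as 172.23.255.255.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.settimeout(timeout)
        try:
            s.sendto(build_null_call(xid), (host, PMAP_PORT))
            data, _peer = s.recvfrom(1024)
        except OSError:  # includes timeouts and unreachable networks
            return False
    return is_success_reply(data, xid)


if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(host, "answers" if probe(host) else "no answer")
```

Run from another machine on the same LAN against both the server's unicast address and 172.23.255.255, this would show whether the portmapper answers at all on each address.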

Trying to mount 172.23.255.255:/export/client from Linux resulted in an error message, namely "Network is unreachable", but nothing showed up in the tcpdump above, which was still running at the time. So it may just have been Linux refusing to do NFS against the broadcast address.

The old mailing list posting mentioned above said that rpc.bootparamd would be needed, but starting it or not does not make any difference (and http://www.netbsd.org/docs/network/netboot/intro.i386.html kind of implies that rpc.bootparamd is not needed on i386, and the manpage actively discourages it).

I'm quite at a loss now. I'm sure it's just some small thing that I'm overlooking, or some interoperability issue between parts of this setup, but I don't know where to look anymore.

Thanks in advance for any hints, or for just having the patience to read through to the end. :o)

s//un