Specifying root mount options on diskless boot.
[I'm not sure if -stable is the best list for this, but anyway...]

I'm trying to convert an old laptop running FreeBSD 8.0 into a diskless client (since its internal HDD is growing bad spots faster than I can repair them). I have it pxebooting nicely and running with an NFS root, but it then reports locking problems: devd, syslogd, moused (and maybe others) lock their PID file to protect against multiple instances. Unfortunately, these daemons all start before statd/lockd, so the locking fails with "operation not supported". It's not practical to reorder the startup sequence to make lockd start early enough (I've tried). Since the filesystem is reserved for this client, there's no real need to forward lock requests across the wire, so specifying "nolockd" would be another solution.

Looking through sys/nfsclient/bootp_subr.c, DHCP option 130 should allow NFS mount options to be specified (though it's not clear that the relevant code path is actually followed, because I don't see the associated printf()s anywhere on the console). After getting isc-dhcpd to forward this option (made more difficult because its documentation is incorrect), it still doesn't work.

Understanding all this isn't helped by kenv(8) reporting three different sets of root filesystem options:

boot.nfsroot.path="/tank/m3"
boot.nfsroot.server="192.168.123.200"
dhcp.option-130="nolockd"
dhcp.root-path="192.168.123.200:/tank/m3"
vfs.root.mountfrom="nfs:server:/tank/m3"
vfs.root.mountfrom.options="rw,tcp,nolockd"

And the console also reports conflicting root definitions:

Trying to mount root from nfs:server:/tank/m3
NFS ROOT: 192.168.123.200:/tank/m3

Working through all these:

- boot.nfsroot.* appears to be initialised by sys/boot/i386/libi386/pxe.c but, whilst nfsclient/nfs_diskless.c can parse boot.nfsroot.options, there's no code in pxe.c to initialise that kenv name.

- dhcp.* appears to be initialised by lib/libstand/bootp.c - which does include code to populate boot.nfsroot.options (using vendor-specific DHCP option 20), but this code is not compiled in. Further study of bootp.c shows that it's possible to initialise arbitrary kenvs using DHCP options 246-254 - but the DHCPDISCOVER packets do not request these options, so they don't work without special DHCP server configuration (to forward options that aren't requested).

- vfs.root.* is parsed out of /etc/fstab but, other than being reported in the console message above, it doesn't appear to be used in this environment (it looks like the root entry can be commented out of /etc/fstab without problems).

My final solution was to specify boot.nfsroot.options="nolockd" in loader.conf - and this actually seems to work.

It seems rather unfortunate that FreeBSD has code to allow NFS root mount options to be specified via DHCP (admittedly in several incompatible ways) but none of it actually works. A quick look at -current suggests that the situation there remains equally broken. Has anyone else tried to use any of this? And would anyone be interested in trying to make it actually work?

-- 
Peter Jeremy
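For anyone wanting to reproduce the workaround that ended up working, a minimal sketch - this assumes pxeboot reads /boot/loader.conf from the NFS-exported root, and the option string is just what worked here (add tcp etc. to taste):

    # /boot/loader.conf on the NFS root exported to the diskless client
    boot.nfsroot.options="nolockd"

    # after booting, confirm the kernel environment picked it up:
    #   kenv boot.nfsroot.options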
Re: /libexec/ld-elf.so.1: Cannot execute objects on /
John Baldwin wrote:
On Saturday, December 25, 2010 6:43:25 am Miroslav Lachman wrote:
John Baldwin wrote:
On Saturday, December 11, 2010 11:51:41 am Miroslav Lachman wrote:
Miroslav Lachman wrote:
Garrett Cooper wrote:
2010/4/20 Miroslav Lachman <000.f...@quip.cz>:

I have a large storage partition (/vol0) mounted as noexec and nosuid. One directory from this partition is then mounted by nullfs as "exec and suid", so anything on it can be executed. The directory contains a full installation of a jail. The jail is running fine, but some ports (PHP for example) cannot be compiled inside the jail; they fail with the message:

/libexec/ld-elf.so.1: Cannot execute objects on /

The same applies to executing apxs:

r...@rainnew ~/# /usr/local/sbin/apxs -q MPM_NAME
/libexec/ld-elf.so.1: Cannot execute objects on /
apxs:Error: Sorry, no shared object support for Apache
apxs:Error: available under your platform. Make sure
apxs:Error: the Apache module mod_so is compiled into
apxs:Error: your server binary '/usr/local/sbin/httpd'.

(it should return "prefork")

So I think there is some bug in checking the mountpoint options, where the check is made on the "parent" of the nullfs instead of the nullfs target mountpoint. This is on 6.4-RELEASE i386 GENERIC; I did not test it on another release.

This is the list of related mount points:

/dev/mirror/gm0s2d on /vol0 (ufs, local, noexec, nosuid, soft-updates)
/vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local)
/usr/ports on /vol0/jail/rain_new/usr/ports (nullfs, local)
devfs on /vol0/jail/rain_new/dev (devfs, local)

If I change the /vol0 options to (ufs, local, soft-updates), the above error is gone and apxs / compilation work fine. Can somebody look at this problem?

[Garrett Cooper wrote:]
Can you please provide output from ktrace / truss for the issue?

[Miroslav Lachman wrote:]
I did

# ktrace /usr/local/sbin/apxs -q MPM_NAME

The output is here: http://freebsd.quip.cz/ld-elf/ktrace.out
Let me know if you need something else. Thank you for your interest!

The problem is still there in FreeBSD 8.1-RELEASE amd64 GENERIC (and in 7.x). Can somebody say if this is a bug or an expected "feature"?

[John Baldwin wrote:]
I think this is the expected behavior, as nullfs is simply re-exposing /vol0 and it shouldn't be able to create a more privileged mount than the underlying mount, I think (e.g. a read/write nullfs mount of a read-only filesystem would not make the underlying files read/write). It can be used to provide less privilege (e.g. a read-only nullfs mount of a read/write filesystem does not allow writes via the nullfs layer). That said, I'm not sure exactly where the permission check is failing. execve() only checks MNT_NOEXEC on the "upper" vnode's mountpoint (i.e. the nullfs mountpoint), and the VOP_ACCESS(.., V_EXEC) check does not look at MNT_NOEXEC either. I do think there might be bugs in that a nullfs mount that specifies noexec or nosuid might not enforce the noexec or nosuid bits if the underlying mount point does not have them set (from what I can see).

[Miroslav Lachman wrote:]
Thank you for your explanation. Then it is strange that there is a bug that allows execution on an originally non-executable mountpoint. It should be mentioned in the BUGS section of the mount_nullfs man page. It would be useful if the 'mount' output showed inherited options for nullfs.
If the parent is:

/dev/mirror/gm0s2d on /vol0 (ufs, local, noexec, nosuid, soft-updates)

then the nullfs line would be:

/vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local, noexec, nosuid)

instead of just:

/vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local)

Then I could understand what the expected behavior is. But our current state is half-working: I can execute scripts and binaries and run a jail on it, but can't execute "apxs -q MPM_NAME" and a few others.

[John Baldwin wrote:]
Hmm, so I was a bit mistaken. The kernel is not failing to exec the binary. Instead, rtld is reporting the error here:

static Obj_Entry *
do_load_object(int fd, const char *name, char *path, struct stat *sbp,
    int flags)
{
    Obj_Entry *obj;
    struct statfs fs;

    /*
     * but first, make sure that environment variables haven't been
     * used to circumvent the noexec flag on a filesystem.
     */
    if (dangerous_ld_env) {
        if (fstatfs(fd, &fs) != 0) {
            _rtld_error("Cannot fstatfs \"%s\"", path);
            return NULL;
        }
        if (fs.f_flags & MNT_NOEXEC) {
            _rtld_error("Cannot execute objects on %s\n", fs.f_mntonname);
            return NULL;
        }
    }

I wonder if the fstatfs is falling down to the original mount rather than being caught by nullfs. Hmm, nullfs' statfs method returns the flags for the underlying mount, not the flags for the nullfs mount. This is possibly broken, but it is the behavior nullfs has always had, and the behavior it still has on other BSDs.

[Miroslav Lachman wrote:]
I am sorry, I am not a programmer, so the code doesn't tell me much. Does it mean "we must leave it in the current state" (for compatibility with other BSDs), or can it be fixed in the future? I can't
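A small sketch of the setup being discussed, just to make the flag inheritance visible - the device and paths are taken from the thread, the commands assume root, and the behaviour described in the comment is the one explained above rather than anything new:

    # underlying filesystem carries noexec/nosuid
    mount -t ufs -o noexec,nosuid /dev/mirror/gm0s2d /vol0

    # nullfs re-export of one directory, intended to be exec/suid
    mount_nullfs /vol0/jail/.nullfs/rain /vol0/jail/rain_new

    # 'mount' shows only (nullfs, local) for the nullfs mountpoint, but
    # statfs(2)/fstatfs(2) on files under it return the flags of the
    # underlying UFS mount, including MNT_NOEXEC - which is the flag
    # rtld checks in do_load_object() when dangerous_ld_env is set.
    mount | grep -E 'vol0|rain_new'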
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
This is a home machine so I am afraid I won't have backups in place, if only because I just won't have another machine with as much disk space. The data is nothing critically important anyway - movies and music, mostly.

My objective here is getting more used to ZFS and seeing how performance goes. I remember getting rather average performance on v14, but Jean-Yves reported good performance boosts from upgrading to v15. Will try this out when the disks arrive :)

Thanks for the pointers guys.

On 12/30/10 6:49 PM, Ronald Klop wrote:
> On Thu, 30 Dec 2010 12:40:00 +0100, Damien Fleuriot wrote:
>
>> Hello list,
>>
>> I currently have a ZFS zraid1 with 4x 1.5TB drives.
>> The system is a zfs-only FreeBSD 8.1 with zfs version 14.
>>
>> I am concerned that in the event a drive fails, I won't be able to
>> repair the disks in time before another actually fails.
>>
>> I wish to reinstall the OS on a dedicated drive (possibly SSD, doesn't
>> matter, likely UFS) and dedicate the 1.5tb disks to storage only.
>>
>> I have ordered 5x new drives and would like to create a new zraid2
>> mirrored pool.
>>
>> Then I plan on moving data from pool1 to pool2, removing drives from
>> pool1 and adding them to pool2.
>>
>> My questions are as follows:
>>
>> With a total of 9x 1.5TB drives, should I be using zraid3 instead of
>> zraid2? I will not be able to add any more drives, so unnecessary
>> parity drives = less storage room.
>>
>> What are the steps for properly removing my drives from the zraid1
>> pool and inserting them in the zraid2 pool?
>>
>> Regards,
>>
>> dfl
>
> Make sure you have spare drives so you can swap in a new one quickly and
> have off-line backups in case disaster strikes. Extra backups are always
> nice. Disks are not the only parts which can break and damage your data.
>
> Ronald.
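For the migration steps being asked about, a rough sketch of one way to do it with send/receive - the pool and device names (tank, tank2, adaX) are made-up placeholders, and it assumes the new raidz2 pool can hold everything before the old pool is destroyed:

    # create the raidz2 pool on the five new drives
    zpool create tank2 raidz2 ada1 ada2 ada3 ada4 ada5

    # replicate all datasets via a recursive snapshot
    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | zfs receive -dF tank2

    # after verifying the data on tank2, retire the old pool; the freed
    # 1.5TB drives cannot be added into the existing raidz2 vdev, so they
    # would become spares (or a second vdev)
    zpool destroy tank
    zpool add tank2 spare ada6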
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 2 January 2011 02:11, Damien Fleuriot wrote:
> I remember getting rather average performance on v14 but Jean-Yves
> reported good performance boosts from upgrading to v15.

That was v28 :) Saw no major difference between v14 and v15.

JY
Re: New ZFSv28 patchset for 8-STABLE
On 12/16/2010 01:44 PM, Martin Matuska wrote:
> Link to the patch:
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

I've used this:
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
on a server with amd64, 8 G RAM, acting as a file server on ftp/http/rsync, the content being read-only mounted with nullfs in jails, and the daemons use sendfile (ftp and http).

The effects can be seen here:
http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
The exact moment of the switch can be seen on zfs_mem-week.png, where the L2 ARC has been discarded.

What I see:
- increased CPU load
- decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased hard disk load (IOPS graph)

Maybe I could accept the higher system load as normal, because a lot of things changed between v15 and v28 (but I was hoping that if I use the same feature set, it will require less CPU), but dropping the L2ARC hit rate so radically seems to be a major issue somewhere. As you can see from the memory stats, I have enough kernel memory to hold the L2 headers, so the L2 devices got filled up to their maximum capacity.

Any ideas on what could cause these? I haven't upgraded the pool version and nothing was changed in the pool or in the file system.

Thanks,
Re: New ZFSv28 patchset for 8-STABLE
On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy wrote:
> What I see:
> - increased CPU load
> - decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
> hard disk load (IOPS graph)
> ...
> Any ideas on what could cause these? I haven't upgraded the pool version
> and nothing was changed in the pool or in the file system.

The fact that L2ARC is full does not mean that it contains the right data. Initial L2ARC warm-up happens at a much higher rate than the rate L2ARC is updated after it's been filled initially. Even the accelerated warm-up took almost a day in your case. In order for L2ARC to warm up properly you may have to wait quite a bit longer. My guess is that it should slowly improve over the next few days as data goes through L2ARC and those bits that are hit more often take residence there. The larger your data set, the longer it will take for L2ARC to catch the right data.

Do you have similar graphs from the pre-patch system just after reboot? I suspect that it may show similarly abysmal L2ARC hit rates initially, too.

--Artem
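If it helps to watch the warm-up numerically rather than from the graphs, the counters behind the hit rate are exported as sysctls by the FreeBSD ZFS code; the awk one-liner below is just an illustrative way to turn them into a percentage:

    # raw L2ARC counters
    sysctl kstat.zfs.misc.arcstats.l2_hits \
           kstat.zfs.misc.arcstats.l2_misses \
           kstat.zfs.misc.arcstats.l2_size

    # cumulative hit rate since boot, as a percentage
    sysctl -n kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses | \
        awk 'NR==1 {h=$1} NR==2 {m=$1} END {if (h+m) printf "%.1f%%\n", 100*h/(h+m)}'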
Re: New ZFSv28 patchset for 8-STABLE
On 01/01/2011 08:09 PM, Artem Belevich wrote:
> On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy wrote:
>> What I see:
>> - increased CPU load
>> - decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
>> hard disk load (IOPS graph)
>> ...
>> Any ideas on what could cause these? I haven't upgraded the pool version
>> and nothing was changed in the pool or in the file system.
>
> The fact that L2 ARC is full does not mean that it contains the right
> data. Initial L2ARC warm up happens at a much higher rate than the rate
> L2ARC is updated after it's been filled initially. Even accelerated
> warm-up took almost a day in your case. In order for L2ARC to warm up
> properly you may have to wait quite a bit longer. My guess is that it
> should slowly improve over the next few days as data goes through L2ARC
> and those bits that are hit more often take residence there. The larger
> your data set, the longer it will take for L2ARC to catch the right data.
>
> Do you have similar graphs from pre-patch system just after reboot? I
> suspect that it may show similarly abysmal L2ARC hit rates initially, too.

Sadly no, but I remember that I saw increasing hit rates as the cache grew; that's why I wrote the email after one and a half days. Currently it's at the same level it was right after the reboot... We'll see after a few days.
Re: bge driver regression in 7.4-PRERELEASE, Tyan S4881
On Thu, 30 Dec 2010, Jeremy Chadwick wrote:
> Please provide output from the following command, as root:
>
> pciconf -lbvc
>
> And only include the bge1 and bge0 devices in your output. Thanks.

This is the output, as root, using the kernel with the 10/7/2010 bge code (which works for me). I can provide the code with the 7.4-PRERELEASE kernel if you want that. The OS is compiled as amd64.

b...@pci0:17:2:0: class=0x02 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class    = network
    subclass = ethernet
    bar [10] = type Memory, range 64, base 0xd011, size 65536, enabled
    bar [18] = type Memory, range 64, base 0xd010, size 65536, enabled
    cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 01[48] = powerspec 2 supports D0 D3 current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit

b...@pci0:17:2:1: class=0x02 card=0x164814e4 chip=0x164814e4 rev=0x03 hdr=0x00
    vendor   = 'Broadcom Corporation'
    device   = 'NetXtreme Dual Gigabit Adapter (BCM5704)'
    class    = network
    subclass = ethernet
    bar [10] = type Memory, range 64, base 0xd013, size 65536, enabled
    bar [18] = type Memory, range 64, base 0xd012, size 65536, enabled
    cap 07[40] = PCI-X 64-bit supports 133MHz, 2048 burst read, 1 split transaction
    cap 01[48] = powerspec 2 supports D0 D3 current D0
    cap 03[50] = VPD
    cap 05[58] = MSI supports 8 messages, 64 bit

This is a hobby system supporting a home server, so it's not "mission-critical" and my current hack is working properly. Thanks to both of you for your assistance.

Mike Squires
mi...@siralan.org
UN*X at home since 1986
tmpfs runs out of space on 8.2pre-release, zfs related?
In setting up tmpfs (so not tmpmfs) on a machine that is using ZFS (pool v15, zfs v4) on 8.2-PRERELEASE, I run out of space on the tmpfs when copying a ~4.6 GB file from the ZFS filesystem to the memory disk. This machine has 8 GB of memory backed by swap on the hard disk, so I expected the file to copy to memory without problems. Below, in detail, is what happens.

Upon rebooting the machine, the tmpfs has 8 GB available, as can be seen below:

---
h...@pulsarx4:~/ > df -hi /tmp
Filesystem    Size    Used   Avail Capacity iused ifree %iused  Mounted on
tmpfs         8.2G     12K    8.2G     0%       19   39M    0%  /tmp
---

Subsequently, copying a ~4.6 GB file from a location in the ZFS pool to the memory filesystem fails with a "no space left" message:

---
h...@pulsarx4:~/ > cp ~/temp/large.iso /tmp/large_file
cp: /tmp/large_file: No space left on device
---

After this the tmpfs has shrunk to just 2.7G, obviously much less than the 8.2G available before the copy operation. At the same time there are still free inodes left, so that does not appear to be the problem. Output of df after the copy:

---
h...@pulsarx4:~/ > df -hi /tmp
Filesystem    Size    Used   Avail Capacity iused ifree %iused  Mounted on
tmpfs         2.7G    2.7G    1.4M   100%       19  6.4k    0%  /tmp
---

A quick search shows the following bug report for Solaris:
http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=e4ae9c32983000ef651e38edbba1?bug_id=6804661
This appears closely related, as there a file >50% of memory is also copied to the tmpfs, and the way to reproduce appears identical to what I did here.

As it might help spot the problem, below is the information on the ZFS ARC size obtained from the output of zfs-stats.

Before the copy:

---
System Memory Statistics:
        Physical Memory:                        8161.74M
        Kernel Memory:                          511.64M
        DATA:                           94.27%  482.31M
        TEXT:                           5.73%   29.33M

ARC Size:
        Current Size (arcsize):         5.88%   404.38M
        Target Size (Adaptive, c):      100.00% 6874.44M
        Min Size (Hard Limit, c_min):   12.50%  859.31M
        Max Size (High Water, c_max):   ~8:1    6874.44M
---

After the copy:

---
System Memory Statistics:
        Physical Memory:                        8161.74M
        Kernel Memory:                          3326.98M
        DATA:                           99.12%  3297.65M
        TEXT:                           0.88%   29.33M

ARC Size:
        Current Size (arcsize):         46.99%  3230.55M
        Target Size (Adaptive, c):      100.00% 6874.44M
        Min Size (Hard Limit, c_min):   12.50%  859.31M
        Max Size (High Water, c_max):   ~8:1    6874.44M
---

Unfortunately I have difficulties interpreting this any further, so suggestions on how to prevent this behavior (or troubleshoot it further) would be appreciated, as my feeling is that this should not happen.
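In case it helps as a stopgap while the real cause is tracked down, a sketch of two knobs that should at least avoid the shrinking-/tmp symptom - the sizes are made-up examples, not recommendations:

    # /etc/fstab: give the tmpfs a fixed upper bound instead of letting
    # it track currently-available memory (6442450944 bytes = 6 GB)
    tmpfs   /tmp    tmpfs   rw,size=6442450944      0       0

    # /boot/loader.conf: optionally cap the ARC so it cannot consume the
    # memory the tmpfs is counting on
    vfs.zfs.arc_max="4G"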
Re: New ZFSv28 patchset for 8-STABLE
On 01/01/2011 13:18, Attila Nagy wrote:
> On 12/16/2010 01:44 PM, Martin Matuska wrote:
>> Link to the patch:
>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
>>
> I've used this:
> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101223-nopython.patch.xz
> on a server with amd64, 8 G RAM, acting as a file server on
> ftp/http/rsync, the content being read only mounted with nullfs in
> jails, and the daemons use sendfile (ftp and http).
>
> The effects can be seen here:
> http://people.fsn.hu/~bra/freebsd/20110101-zfsv28-fbsd/
> the exact moment of the switch can be seen on zfs_mem-week.png, where
> the L2 ARC has been discarded.
>
> What I see:
> - increased CPU load
> - decreased L2 ARC hit rate, decreased SSD (ad[46]), therefore increased
> hard disk load (IOPS graph)
>
> Maybe I could accept the higher system load as normal, because there
> were a lot of things changed between v15 and v28 (but I was hoping if I
> use the same feature set, it will require less CPU), but dropping the
> L2ARC hit rate so radically seems to be a major issue somewhere.
> As you can see from the memory stats, I have enough kernel memory to
> hold the L2 headers, so the L2 devices got filled up to their maximum
> capacity.
>
> Any ideas on what could cause these? I haven't upgraded the pool version
> and nothing was changed in the pool or in the file system.

Running arc_summary.pl [1] -p4 should print a summary about your L2ARC, and in that section you should also notice a high number of "SPA Mismatch" events. Mine usually grew to around 172k before I would notice a crash, and I could reliably trigger this while in a scrub. Whatever is causing this needs desperate attention! I emailed mm@ privately off-list when I noticed this going on but have not received any feedback as of yet.

[1] http://bit.ly/fdRiYT

-- 
Regards,

jhell,v
JJH48-ARIN