Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
on 14/04/2012 18:37 Andriy Gapon said the following:
>
> I would like to ask for a review and/or testing of the following three
> patches:
> http://people.freebsd.org/~avg/zfsboot.patches.diff

I've put a new version of the patch here:
http://people.freebsd.org/~avg/zfsboot.patches.2.diff

Most prominent changes:
- new zfsloader should be compatible with previous zfsboot
- libi386 unconditionally includes zfs support via use of weak symbols for some functions

Unfortunately, unconditional compatibility between i386_devdesc and zfs_devdesc means that i386_devdesc becomes much larger (> 512 bytes). But I think that it shouldn't cause any real problems.

> These patches add support for booting from an arbitrary filesystem of any
> detected ZFS pool. A filesystem could be selected in zfsboot and thus will
> affect where zfsloader is loaded from. zfsboot passes information about
> the boot pool and filesystem to zfsloader, which uses those for loaddev and
> default value of currdev. A different pool+filesystem could be selected in
> zfsloader for booting kernel. Also if vfs.root.mountfrom is not explicitly set
> and is not derived from fstab, then it gets set to the selected boot
> filesystem.
>
> This could be used as a foundation for the support of Solaris-like boot
> environment selection. I believe that other people have already developed
> scripts utilizing ZFS capabilities to provide other aspects of management of
> boot environments.
>
> I am particularly interested in reviews of my attempt to make ZFS boot support
> arch-independent. The arches, of course, would have to add some code to make
> use of that support. Currently I only enabled it for x86.
>
> Thank you very much!

--
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
Re: Status of BSD Diff replacement?
On Wed, Apr 18, 2012 at 4:30 AM, Matthew Story wrote:
> Is there any way either of you could provide me with an archive of the
> working tree for these 2 perforce repos? or make it available in a branch
> on svn.freebsd.org? I'd like to look into this more, but after reading
> through the P4Web docs, trying to gain anonymous read-only access through
> p4 itself, and then reading:
>
> http://lists.freebsd.org/pipermail/freebsd-questions/2007-August/156862.html
>
> it seems there's no real way to accommodate this sort of thing at present.
>

Hi Matt

About 5 years ago I wrote a ruby script for extracting a branch from p4web. It worked then, but I've not looked at it since… some people found it useful for getting the latest WIP version of Ben's wpi(4) driver from perforce, so there is a copy preserved on Ben's website:

http://www.clearchain.com/downloads/FreeBSD/P4fetch.rb

Cheers
Tom
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
on 15/04/2012 04:01 Bob Friesenhahn said the following:
> It would be nice if the updated FreeBSD bootloader could have the ability to
> boot both Solaris and FreeBSD root filesystems in the same pool so that one
> could switch between several zfs-based operating systems without needing to use
> a different partition for each one. Is this within the bounds of possibility or
> a totally irrational thought?

I cannot assess the feasibility of such a project. I just want to note that ultimately it's the code that gets booted, not filesystems. My impression is that the Solaris boot archive is quite different from how the FreeBSD kernel gets booted.

--
Andriy Gapon
weak symbols vs archive libraries
I just would like to share something that I stumbled upon. Maybe this is something well known, then forgive me for the noise.

When ld combines multiple object files it overrides weak symbol definitions with a strong definition (if any). There are many examples/demonstrations on the Internet on how this works, e.g.:
http://winfred-lu.blogspot.com/2009/11/understand-weak-symbols-by-examples.html

But when the object files are spread across multiple archives, there could be some surprises. My understanding is that there are two big rules that the linker follows with respect to archive libraries:
- the linker extracts an object file from a library only if, according to the symbol table of the library, the object file contains some interesting symbols
- if the linker extracts an object file then it processes all symbols in it

And now the following observation: if the linker has already seen a weak definition for a symbol, then it will not actively seek any other definitions for it. But it will take into account the definitions that it stumbles upon.

For example, if an object file in an archive library contains only (strong) definitions for some symbols that already have weak definitions, the linker will not extract that object file and will not look into it. And thus the weak definition will not be overridden. OTOH, if that object file contains a definition for at least one still undefined symbol, then the file will be interesting to the linker, it will extract the file, process _all_ symbols in it and thus will override the weak definitions with the strong ones.

This is something that was unexpected to me.

Some references:
http://webpages.charter.net/ppluzhnikov/linker.html
http://glandium.org/blog/?p=2388

--
Andriy Gapon
Re: weak symbols vs archive libraries
On Wed, Apr 18, 2012 at 01:36:26PM +0300, Andriy Gapon wrote:
>
> I just would like to share something that I stumbled upon.
> Maybe this is something well known, then forgive me for the noise.
>
> When ld combines multiple object files it overrides weak symbol definitions with
> a strong definition (if any). There are many examples/demonstrations on the
> Internet on how this works, e.g.:
> http://winfred-lu.blogspot.com/2009/11/understand-weak-symbols-by-examples.html
>
> But when the object files are spread across multiple archives, there could
> be some surprises.
> My understanding is that there are two big rules that linker follows with
> respect to archive libraries:
> - linker extracts an object file from a library only if according to a symbol
> table of the library the object file contains some interesting symbols
> - if linker extracts an object file then it processes all symbols in it
>
> And now the following observation: if linker has already seen a weak definition
> for a symbol, then it will not actively seek any other definitions for it. But
> it will take into account the definitions that it stumbles upon.
> For example, if an object file in an archive library contains only (strong)
> definitions for some symbols that already have weak definitions, the linker will
> not extract that object file and will not look into it. And thus the weak
> definition will not be overridden. OTOH, if that object file contains a
> definition for at least one still undefined symbol, then the file will be
> interesting to linker, it will extract the file, process _all_ symbols in it and
> thus will override the weak definitions with the strong ones.
>
> This is something that was unexpected to me.
>
> Some references:
> http://webpages.charter.net/ppluzhnikov/linker.html
> http://glandium.org/blog/?p=2388

This is from the ELF standard version 1.2 PDF, page 1-5:

When the link editor searches archive libraries, it extracts archive members that contain definitions of undefined global symbols. The member's definition may be either a global or a weak symbol. The link editor does not extract archive members to resolve undefined weak symbols. Unresolved weak symbols have a zero value.
Re: weak symbols vs archive libraries
on 18/04/2012 13:49 Konstantin Belousov said the following:
> This is from the ELF standard version 1.2 PDF, page 1-5:
>
> When the link editor searches archive libraries, it extracts archive
> members that contain definitions of undefined global symbols. The member's
> definition may be either a global or a weak symbol. The link editor does
> not extract archive members to resolve undefined weak symbols. Unresolved
> weak symbols have a zero value.

I'll just add to this, in case it's not already very obvious, that the link editor does not extract archive members to find strong definitions for already-defined weak symbols either.

Thank you for the quote!

--
Andriy Gapon
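The extraction rule discussed in this thread can be illustrated with a toy model. The sketch below is not linker code; it only encodes the two rules from the original post (a member is extracted only if it defines a still-undefined symbol, and an extracted member has all of its symbols processed). All names here (sym_state, member, link_member and so on) are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: a real linker tracks far more state than this. */
enum sym_state { SYM_UNSEEN, SYM_UNDEFINED, SYM_WEAK_DEF, SYM_STRONG_DEF };

#define NSYMS 4	/* symbol ids 0..NSYMS-1, purely illustrative */

struct member {
	bool defines[NSYMS];	/* strong definitions in this archive member */
};

/*
 * Rule 1: a member is interesting only if it defines a symbol that is
 * still undefined.  A symbol with a weak definition does not count.
 */
bool
member_is_interesting(const enum sym_state *syms, const struct member *m)
{
	for (size_t i = 0; i < NSYMS; i++)
		if (m->defines[i] && syms[i] == SYM_UNDEFINED)
			return (true);
	return (false);
}

/*
 * Rule 2: once a member is extracted, all of its symbols are processed,
 * so its strong definitions replace weak ones seen earlier.
 * Returns true if the member was extracted.
 */
bool
link_member(enum sym_state *syms, const struct member *m)
{
	assert(syms != NULL && m != NULL);
	if (!member_is_interesting(syms, m))
		return (false);	/* member stays in the archive */
	for (size_t i = 0; i < NSYMS; i++)
		if (m->defines[i])
			syms[i] = SYM_STRONG_DEF;
	return (true);
}

/* Member defines only symbol 0, which already has a weak definition. */
enum sym_state
weak_only_case(void)
{
	enum sym_state syms[NSYMS] = { SYM_WEAK_DEF };
	struct member m = { .defines = { true } };

	link_member(syms, &m);
	return (syms[0]);
}

/* Same member, but it also defines symbol 1, which is still undefined. */
enum sym_state
pulled_in_case(void)
{
	enum sym_state syms[NSYMS] = { SYM_WEAK_DEF, SYM_UNDEFINED };
	struct member m = { .defines = { true, true } };

	link_member(syms, &m);
	return (syms[0]);
}
```

In this model weak_only_case() leaves symbol 0 with its weak definition, while pulled_in_case() ends with the strong one, which is the surprise described above: whether the override happens depends on an unrelated symbol in the same member.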
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
On Wednesday, April 18, 2012 2:02:22 am Andriy Gapon wrote:
> on 17/04/2012 23:43 John Baldwin said the following:
> > On Tuesday, April 17, 2012 4:22:19 pm Andriy Gapon wrote:
> >> We already have a flag for ZFS (KARGS_FLAGS_ZFS, 0x4). So the new flag could be
> >> named something ZFS-specific (as silly as KARGS_FLAGS_ZFS2) or something more
> >> general such as KARGS_FLAGS_32_BYTES meaning that the total size of arguments
> >> area is 32 bytes (as opposed to 24 previously).
> >
> > Does KARGS_FLAGS_GUID work?
> >
>
> I think that's too terse, we already passed a pool guid via the existing
> argument space. So it should be something like KARGS_FLAGS_ZFS_FS_GUID or
> KARGS_FLAGS_ZFS_DS_GUID (DS - dataset).

Ah. I do think the flag should indicate that the bootinfo structure is larger, I was assuming you were adding a new GUID field that didn't exist before. I can't think of something better than KARGS_FLAGS_32. What might be nice actually, is to add a new field to indicate the size of the argument area and to set a flag to indicate that the size field is present (KARGS_FLAGS_SIZE)?

Hmm, looks like we should name this structure and move it and the relevant KARGS_FLAGS_* fields into a header while we are at it?

--
John Baldwin
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
On Wed, 2012-04-18 at 09:41 -0400, John Baldwin wrote:
> On Wednesday, April 18, 2012 2:02:22 am Andriy Gapon wrote:
> > on 17/04/2012 23:43 John Baldwin said the following:
> > > On Tuesday, April 17, 2012 4:22:19 pm Andriy Gapon wrote:
> > >> We already have a flag for ZFS (KARGS_FLAGS_ZFS, 0x4). So the new flag could be
> > >> named something ZFS-specific (as silly as KARGS_FLAGS_ZFS2) or something more
> > >> general such as KARGS_FLAGS_32_BYTES meaning that the total size of arguments
> > >> area is 32 bytes (as opposed to 24 previously).
> > >
> > > Does KARGS_FLAGS_GUID work?
> > >
> >
> > I think that's too terse, we already passed a pool guid via the existing
> > argument space. So it should be something like KARGS_FLAGS_ZFS_FS_GUID or
> > KARGS_FLAGS_ZFS_DS_GUID (DS - dataset).
>
> Ah. I do think the flag should indicate that the bootinfo structure is larger,
> I was assuming you were adding a new GUID field that didn't exist before.
> I can't think of something better than KARGS_FLAGS_32. What might be nice
> actually, is to add a new field to indicate the size of the argument area and
> to set a flag to indicate that the size field is present (KARGS_FLAGS_SIZE)?

YES! A size field (preferably as the first field in the struct) along with a flag to indicate that it's a new-style boot info struct that starts with a size field, will allow future changes without a lot of drama. It can allow code that has to deal with the struct without interpreting it (such as trampoline code that has to copy it to a new stack or memory area as part of loading the kernel) to be immune to future changes.

This probably isn't a big deal in the x86 world, but it can be important for embedded systems where a proprietary bootloader has to pass info to a proprietary board_init() type routine in the kernel using non-proprietary loader/trampoline code that's part of the base. We have a bit of a mess in this regard in the ARM world right now, and it would be a lot less messy if something like this had been in place.

-- Ian
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
on 18/04/2012 17:22 Ian Lepore said the following:
> YES! A size field (preferably as the first field in the struct) along
> with a flag to indicate that it's a new-style boot info struct that
> starts with a size field, will allow future changes without a lot of
> drama. It can allow code that has to deal with the struct without
> interpreting it (such as trampoline code that has to copy it to a new
> stack or memory area as part of loading the kernel) to be immune to
> future changes.

Yeah, placing the new field at front would immediately break compatibility and even access to the flags field :-)

--
Andriy Gapon
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
On Wed, 2012-04-18 at 17:36 +0300, Andriy Gapon wrote:
> on 18/04/2012 17:22 Ian Lepore said the following:
> > YES! A size field (preferably as the first field in the struct) along
> > with a flag to indicate that it's a new-style boot info struct that
> > starts with a size field, will allow future changes without a lot of
> > drama. It can allow code that has to deal with the struct without
> > interpreting it (such as trampoline code that has to copy it to a new
> > stack or memory area as part of loading the kernel) to be immune to
> > future changes.
>
> Yeah, placing the new field at front would immediately break compatibility and
> even access to the flags field :-)
>

Code would only assume the new field was at the front of the struct if the new flag is set, otherwise it would use the historical struct layout.

-- Ian
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
On Wed, 2012-04-18 at 17:36 +0300, Andriy Gapon wrote:
> on 18/04/2012 17:22 Ian Lepore said the following:
> > YES! A size field (preferably as the first field in the struct) along
> > with a flag to indicate that it's a new-style boot info struct that
> > starts with a size field, will allow future changes without a lot of
> > drama. It can allow code that has to deal with the struct without
> > interpreting it (such as trampoline code that has to copy it to a new
> > stack or memory area as part of loading the kernel) to be immune to
> > future changes.
>
> Yeah, placing the new field at front would immediately break compatibility and
> even access to the flags field :-)
>

Oh wait, is the flags field embedded in the struct? My bad, I didn't look. In the ARM code I'm used to working with, the flags are passed from the bootloader to the kernel entry point in a register; I don't know why I assumed that would be true on other platforms.

-- Ian
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
on 18/04/2012 17:40 Ian Lepore said the following:
> On Wed, 2012-04-18 at 17:36 +0300, Andriy Gapon wrote:
>> on 18/04/2012 17:22 Ian Lepore said the following:
>>> YES! A size field (preferably as the first field in the struct) along
>>> with a flag to indicate that it's a new-style boot info struct that
>>> starts with a size field, will allow future changes without a lot of
>>> drama. It can allow code that has to deal with the struct without
>>> interpreting it (such as trampoline code that has to copy it to a new
>>> stack or memory area as part of loading the kernel) to be immune to
>>> future changes.
>>
>> Yeah, placing the new field at front would immediately break compatibility and
>> even access to the flags field :-)
>>
>
> Code would only assume the new field was at the front of the struct if
> the new flag is set, otherwise it would use the historical struct
> layout.

Right, but where would the flag reside? And how would older code that is not aware of the new flag cope with the new layout?

--
Andriy Gapon
Re: CAM disk I/O starvation
On Tue, 17 Apr 2012 12:30:15 -0700 Adrian Chadd wrote:
> On 17 April 2012 12:15, Gary Jennejohn wrote:
> > I still have the old problem kernel around, but it's probably not
> > instrumented for any meaningful diagnoses.
>
> Well do you know which version of which tree you used to build that?
> If it's head, you could "just" keep an earlier source tree around to
> build a kernel from.
>

My /usr/src is from SVN, so I just need to boot it and do uname to get the revision. As a former src committer I like to be on the bleeding edge and I always run HEAD.

--
Gary Jennejohn
Re: [review request] zfsboot/zfsloader: support accessing filesystems within a pool
On Wednesday, April 18, 2012 11:00:18 am Andriy Gapon wrote:
> on 18/04/2012 17:40 Ian Lepore said the following:
> > On Wed, 2012-04-18 at 17:36 +0300, Andriy Gapon wrote:
> >> on 18/04/2012 17:22 Ian Lepore said the following:
> >>> YES! A size field (preferably as the first field in the struct) along
> >>> with a flag to indicate that it's a new-style boot info struct that
> >>> starts with a size field, will allow future changes without a lot of
> >>> drama. It can allow code that has to deal with the struct without
> >>> interpreting it (such as trampoline code that has to copy it to a new
> >>> stack or memory area as part of loading the kernel) to be immune to
> >>> future changes.
> >>
> >> Yeah, placing the new field at front would immediately break compatibility and
> >> even access to the flags field :-)
> >>
> >
> > Code would only assume the new field was at the front of the struct if
> > the new flag is set, otherwise it would use the historical struct
> > layout.
>
> Right, but where would the flag reside?
> And how would older code that is not aware of the new flag cope with the new
> layout?

I think the size should be appended to the end of the current structure. Even so, it will buy us more flexibility in the future.

--
John Baldwin
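A minimal sketch of the "append a size field, announce it with a flag" idea from this thread. The layout below is an assumption made for illustration and is not the real FreeBSD argument structure: only KARGS_FLAGS_ZFS (0x4) and the 24/32 byte sizes come from the discussion, while the value of KARGS_FLAGS_SIZE, the struct kargs_sketch name, its field positions and kargs_copy_len() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KARGS_FLAGS_ZFS		0x4u	/* value quoted in the thread */
#define KARGS_FLAGS_SIZE	0x8u	/* hypothetical value for the proposed flag */
#define KARGS_LEGACY_SIZE	24u	/* old argument area size, per the thread */

/*
 * Hypothetical layout, NOT the real structure.  The first 24 bytes are
 * left untouched so that old consumers still find flags where they
 * expect it; the new fields are appended after them.
 */
struct kargs_sketch {
	uint32_t flags;
	uint8_t  legacy[KARGS_LEGACY_SIZE - sizeof(uint32_t)];
	uint32_t size;	/* total bytes; valid only with KARGS_FLAGS_SIZE */
	uint32_t extra;	/* stands in for whatever gets added later */
};

_Static_assert(sizeof(struct kargs_sketch) == 32, "sketch should be 32 bytes");

/*
 * How many bytes should code that merely copies the area (for example
 * a trampoline) move?  It never needs to understand the new fields.
 */
size_t
kargs_copy_len(const void *area)
{
	const uint8_t *p = area;
	uint32_t flags, size;

	assert(area != NULL);
	memcpy(&flags, p + offsetof(struct kargs_sketch, flags), sizeof(flags));
	if ((flags & KARGS_FLAGS_SIZE) == 0)
		return (KARGS_LEGACY_SIZE);	/* historical layout */

	memcpy(&size, p + offsetof(struct kargs_sketch, size), sizeof(size));
	/* A size too small to even hold the size field is treated as legacy. */
	if (size < offsetof(struct kargs_sketch, size) + sizeof(size))
		return (KARGS_LEGACY_SIZE);
	return (size);
}
```

With the size appended rather than placed first, a producer that sets KARGS_FLAGS_SIZE and fills in size stays readable by old consumers, which ignore the unknown flag and look only at the first 24 bytes, while new consumers can copy the whole area without knowing what the later fields mean.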
[PATCH] Implementation of DTrace sched provider (with bonus schedgraph script)
I've implemented the sched provider for FreeBSD. This provider provides probes that fire when various scheduling decisions are made. This implementation is intended to be compatible with the implementation in Solaris and its derivatives, with the following caveats:

Several probes reference features that are not implemented in FreeBSD. These probes are provided but will never fire. These probes are: cpucaps-sleep, cpucaps-wakeup, schedctl-nopreempt, schedctl-preempt and schedctl-yield.

I've added some extra probes that do not exist in Solaris and its derivatives, to make it possible to implement a schedgraph DTrace script. These probes are lend-pri and load-change. Scripts intended to be portable to other implementations should not reference these probes.

FreeBSD currently does not properly translate internal types to the portable implementation-independent types defined in the documentation. This means that your scripts will see a struct thread * where they should get a lwpsinfo_t *, for example.

The patch implementing the sched provider can be found here:
http://people.freebsd.org/~rstone/patches/sched_sdt.diff

This patch is against r234420. It should apply cleanly to stable/9 as well as head, but it will not compile if applied against stable/8 because of a change in the arguments accepted by the SDT_PROBE_DEFINE* macros.

My D script that collects schedgraph data can be found here:
http://people.freebsd.org/~rstone/dtrace/schedgraph.d

I recommend collecting data with the ring bufpolicy. This causes DTrace to collect data in per-cpu ring buffers, which should guarantee that there are no dropped data points. The data is written to stdout when dtrace(1) exits. In my example I exit after running for 5 seconds, but you could just as easily modify the script to run until a certain probe fires and then exit, for example.

The output of schedgraph.d isn't quite ready for processing by schedgraph. Here is a very short sh script that post-processes the data to make it parseable by schedgraph:
http://people.freebsd.org/~rstone/dtrace/make_ktr

Finally, schedgraph.d uses the cpu variable, which is currently not available in FreeBSD. Here is my patch (which I will commit to HEAD soon) that implements that variable. You will have to rebuild dtrace.ko and libdtrace.so.
http://people.freebsd.org/~rstone/patches/dtrace_cpu.diff
Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN
Wiring the entire address space seems to have an interesting side effect. The libc memory allocator calls madvise() to free the dirty unused pages, which does nothing when the pages are wired. The allocator unmaps only when an entire chunk is free (default size of 1MB). That leaves lots of free pages which cannot be reclaimed even when the system is under memory pressure.

Sushanth

--- On Mon, 4/16/12, Sushanth Rai wrote:

> From: Sushanth Rai
> Subject: Re: mlockall() on freebsd 7.2 + amd64 returns EAGAIN
> To: "Konstantin Belousov"
> Cc: a...@freebsd.org, freebsd-hackers@freebsd.org
> Date: Monday, April 16, 2012, 11:41 AM
>
> Many thanks. I verified the patch you provided and it works fine.
>
> Sushanth
>
> > Oh, I see. The problem is the VM_MAP_WIRE_NOHOLES flag. Since we
> > map only the initial stack fragment even for the MCL_WIREFUTURE maps,
> > there is a hole in the stack region.
> >
> > In fact, for MCL_WIREFUTURE, we probably should map the whole
> > stack at once, prefaulting all pages.
> >
> > Below are two patches. The change for vm_mmap.c would fix your immediate
> > problem by allowing holes in wired region.
> >
> > The change for vm_map.c prefaults the whole stack instead of the
> > initial fragment. The single-threaded programs still get a fault
> > on stack growth.
> >
> > diff --git a/sys/vm/vm_map.c b/sys/vm/vm_map.c
> > index 6198629..2fd18d1 100644
> > --- a/sys/vm/vm_map.c
> > +++ b/sys/vm/vm_map.c
> > @@ -3259,7 +3259,10 @@ vm_map_stack(vm_map_t map, vm_offset_t addrbos, vm_size_t max_ssize,
> >  	    addrbos + max_ssize < addrbos)
> >  		return (KERN_NO_SPACE);
> >
> > -	init_ssize = (max_ssize < sgrowsiz) ? max_ssize : sgrowsiz;
> > +	if (map->flags & MAP_WIREFUTURE)
> > +		init_ssize = max_ssize;
> > +	else
> > +		init_ssize = (max_ssize < sgrowsiz) ? max_ssize : sgrowsiz;
> >
> >  	PROC_LOCK(curthread->td_proc);
> >  	vmemlim = lim_cur(curthread->td_proc, RLIMIT_VMEM);
> > diff --git a/sys/vm/vm_mmap.c b/sys/vm/vm_mmap.c
> > index 2588c85..3fccd9e 100644
> > --- a/sys/vm/vm_mmap.c
> > +++ b/sys/vm/vm_mmap.c
> > @@ -1561,9 +1561,11 @@ vm_mmap(vm_map_t map, vm_offset_t *addr, vm_size_t size, vm_prot_t prot,
> >  		 * If the process has requested that all future mappings
> >  		 * be wired, then heed this.
> >  		 */
> > -		if (map->flags & MAP_WIREFUTURE)
> > +		if (map->flags & MAP_WIREFUTURE) {
> >  			vm_map_wire(map, *addr, *addr + size,
> > -			    VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES);
> > +			    VM_MAP_WIRE_USER | ((flags & MAP_STACK) ?
> > +			    VM_MAP_WIRE_HOLESOK : VM_MAP_WIRE_NOHOLES));
> > +		}
> >  	} else {
> >  		/*
> >  		 * If this mapping was accounted for in the vnode's
> >
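For readers who have not used the call under discussion, here is a minimal userland sketch of how a program might request wiring of its whole address space and look at the error it gets back. It uses only the standard mlockall(2) and munlockall(2) interfaces; the helper names (wire_all_memory, unwire_all_memory, wire_failure_may_be_transient) are made up for this example, and treating EAGAIN as "possibly transient" is a reading of the POSIX description rather than anything specific to the kernel fix quoted above.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Ask the kernel to wire every current mapping and every future one.
 * Returns 0 on success, or the errno value reported by mlockall().
 */
int
wire_all_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
		return (0);
	return (errno);
}

/* Undo the request; returns 0 on success or the errno value. */
int
unwire_all_memory(void)
{
	if (munlockall() == 0)
		return (0);
	return (errno);
}

/*
 * POSIX documents EAGAIN as "some or all of the memory could not be
 * locked when the call was made", so a caller might retry later.
 * ENOMEM and EPERM point at limits or privileges instead.
 */
bool
wire_failure_may_be_transient(int error)
{
	return (error == EAGAIN);
}
```

A caller would typically invoke wire_all_memory() once at startup and log the returned error. As noted at the top of this message, once everything is wired the allocator's madvise() calls no longer give pages back, so long-running programs may want to wire only the regions that really need it.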