Re: bin/144214: zfsboot fails on gang block after upgrade to zfs v14
I think I nailed this problem now.
What was additionally needed was the following change:
        if (!vdev || !vdev->v_read)
                return (EIO);
-       if (vdev->v_read(vdev, bp, &zio_gb, offset, SPA_GANGBLOCKSIZE))
+       if (vdev->v_read(vdev, NULL, &zio_gb, offset, SPA_GANGBLOCKSIZE))
                return (EIO);

Full patch is here:
http://people.freebsd.org/~avg/boot-zfs-gang.diff

Apparently I am not as smart as Roman :) because I couldn't find the bug by just
staring at this rather small function (for a couple of hours), so I had to
reproduce the problem to catch it.  Hence I am copying hackers@ to share a couple
of tricks that were new to me.  Perhaps they could help someone else some other
day.

First, after very helpful hints that I received in parallel from pjd and two
Oracle/Sun developers, it became very easy to create a pool containing files
with gang blocks in them.
One can set the metaslab_gang_bang variable in metaslab.c to some value < 128K,
and then blocks with size greater than metaslab_gang_bang will be allocated as
gang blocks with a 25% chance.  I personally did something similar but slightly
more deterministic:
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
@@ -1572,6 +1572,12 @@ zio_dva_allocate(zio_t *zio)
        ASSERT3U(zio->io_prop.zp_ndvas, <=, spa_max_replication(spa));
        ASSERT3U(zio->io_size, ==, BP_GET_PSIZE(bp));

+       /*XXX XXX XXX XXX*/
+       if (zio->io_size > 8 * 1024) {
+               return (zio_write_gang_block(zio));
+       }
+       /*XXX XXX XXX XXX*/
+
        error = metaslab_alloc(spa, mc, zio->io_size, bp,
            zio->io_prop.zp_ndvas, zio->io_txg, NULL, 0);

This ensured that any block > 8K would be a gang block.
Then I compiled zfs.ko with this change and put it into a virtual machine, where
I created a pool and populated its root/boot filesystem with a /boot directory.
I booted the virtual machine from the new virtual disk and immediately hit the
problem.

So far, so good, but still no clue why zfsboot crashes upon encountering a gang
block.
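The two ways of provoking gang blocks described above can be contrasted in a
small standalone sketch.  This is not the kernel code: should_gang_random(),
should_gang_forced() and the 'tick' argument are hypothetical stand-ins, and the
one-in-four test only models the "25% chance" mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Threshold used in the debugging patch above: anything larger gangs. */
#define FORCED_GANG_THRESHOLD	(8 * 1024)

/*
 * Probabilistic policy, modeling the metaslab_gang_bang approach:
 * a block larger than the threshold is ganged about one time in four.
 * 'tick' is a hypothetical stand-in for whatever varying value the
 * real allocator consults.
 */
static int
should_gang_random(size_t io_size, size_t gang_bang, uint64_t tick)
{
	return (io_size > gang_bang && (tick & 3) == 0);
}

/*
 * Deterministic policy, modeling the zio.c hack: every block larger
 * than 8K is ganged, so the problem reproduces on every boot.
 */
static int
should_gang_forced(size_t io_size)
{
	return (io_size > FORCED_GANG_THRESHOLD);
}
```

The deterministic variant is what makes the test pool reliable: any file in
/boot with a block over 8K is guaranteed to exercise the gang block path.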
So I decided to debug the crash with gdb.
Standard steps:
$ qemu ... -S -s
$ gdb
...
(gdb) target remote localhost:1234

Now I didn't want to single-step through the whole boot process, so I decided to
get some help from gdb.  Here's a trick:
(gdb) add-symbol-file /usr/obj/usr/src/sys/boot/i386/gptzfsboot/gptzfsboot.out 0xa000

gptzfsboot.out is an ELF image produced by GCC, which then gets transformed into
a raw binary and then into the final BTX binary (gptzfsboot).
gptzfsboot.out is built without much debugging data, but at least it contains
information about function names.  Perhaps it's even possible to compile
gptzfsboot.out with a higher debug level; then debugging would be much more
pleasant.

0xA000 is where _code_ from gptzfsboot.out ends up being loaded in memory.
BTW, having only shallow knowledge of the boot chain and BTX, I didn't know this
address.  Another GDB trick helped me:
(gdb) append memory boot.memdump 0x0 0x1

This command dumps memory content in range 0x0-0x1 to a file named
boot.memdump.  Then I produced a hex dump and searched for the byte sequence
with which gptzfsboot.bin starts (the raw binary produced from gptzfsboot.out).

Of course, the memory dump should be taken after gptzfsboot is loaded into
memory :)  Catching the right moment requires a little bit of boot process
knowledge.  I caught it with:
(gdb) b *0xC000

That is, the memory dump was taken after gdb stopped at the above breakpoint.

After that it was a piece of cake.  I set a breakpoint on the zio_read_gang
function (after the add-symbol-file command) and then stepi-ed through the code
(that is, instruction by instruction).  The following command made it easier to
see what's getting executed:
(gdb) display/i 0xA000 + $eip

I quickly stepped through the code and saw that a large value was passed to
vdev_read as the 'bytes' parameter.  But this should have been 512.  The
oversized read into a buffer allocated on the stack smashed the stack, and that
was the end.
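Finding the load address by searching the dump for the first bytes of
gptzfsboot.bin can also be done with a few lines of C instead of reading a hex
dump by eye.  This is a minimal sketch: find_pattern() is a hypothetical helper,
and it assumes the dump starts at address 0x0 (as in the gdb command above), so
that the returned offset equals the load address.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of pat[0..pat_len) inside
 * dump[0..dump_len), or -1 if the pattern does not occur.
 */
static long
find_pattern(const unsigned char *dump, size_t dump_len,
    const unsigned char *pat, size_t pat_len)
{
	size_t i;

	if (pat_len == 0 || pat_len > dump_len)
		return (-1);
	/* Slide the pattern over every position where it still fits. */
	for (i = 0; i + pat_len <= dump_len; i++) {
		if (memcmp(dump + i, pat, pat_len) == 0)
			return ((long)i);
	}
	return (-1);
}
```

In practice one would read boot.memdump and the first few dozen bytes of
gptzfsboot.bin into two buffers and pass them to the helper; a short pattern may
match in more than one place, so a longer prefix gives a more trustworthy
result.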
Backtracking the call chain in the source code, I immediately noticed the bp
condition in vdev_read_phys and realized what the problem was.

Hope this makes for useful reading.
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
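For readers without the source at hand, the effect of passing bp instead of NULL
can be modeled as follows.  This is a simplified, hypothetical model and not the
boot code itself: struct toy_blkptr, effective_read_size() and the 128K example
size are assumptions made for illustration.  It assumes, as the "bp condition"
remark suggests, that the read helper takes the size from the block pointer when
one is supplied and from the caller's 'bytes' argument otherwise.

```c
#include <stddef.h>

#define GANG_HEADER_SIZE	512	/* what the gang header read should use */

/* Hypothetical stand-in for a block pointer that carries a physical size. */
struct toy_blkptr {
	size_t psize;
};

/*
 * Model of the size selection: a non-NULL bp overrides the requested
 * byte count, a NULL bp leaves the caller's value untouched.
 */
static size_t
effective_read_size(const struct toy_blkptr *bp, size_t bytes)
{
	if (bp != NULL)
		return (bp->psize);
	return (bytes);
}
```

With a block pointer describing a 128K block, effective_read_size(&bp,
GANG_HEADER_SIZE) yields 131072, far more than a 512-byte stack buffer can hold,
while effective_read_size(NULL, GANG_HEADER_SIZE) yields the intended 512.  That
is the difference the one-line change at the top of this message makes.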
Re: bin/144214: zfsboot fails on gang block after upgrade to zfs v14
Andriy Gapon wrote:
> I think I nailed this problem now.
> What was additionally needed was the following change:
>         if (!vdev || !vdev->v_read)
>                 return (EIO);
> -       if (vdev->v_read(vdev, bp, &zio_gb, offset, SPA_GANGBLOCKSIZE))
> +       if (vdev->v_read(vdev, NULL, &zio_gb, offset, SPA_GANGBLOCKSIZE))
>                 return (EIO);
>
> Full patch is here:
> http://people.freebsd.org/~avg/boot-zfs-gang.diff
>
> Apparently I am not as smart as Roman :) because I couldn't find the bug by
> just staring at this rather small function (for a couple of hours), so I had
> to reproduce the problem to catch it.  Hence I am copying hackers@ to share a
> couple of tricks that were new to me.  Perhaps they could help someone else
> some other day.

Excellent, I'm glad that this is finally tested and seems to be working.  When I
initially added the code, I wasn't able to test it, and it turned out that the
issue I was trying to resolve wasn't actually gang block related anyway.

robert.

> [...]
Re: bin/144214: zfsboot fails on gang block after upgrade to zfs v14
On 27 May 2010 09:35, Andriy Gapon wrote:
> I think I nailed this problem now.
> [...]
> Backtracking the call chain in source code I immediately noticed the bp
> condition in vdev_read_phys and realized what the problem was.
>
> Hope this would be a useful reading.

Excellent work - thanks for looking into this. I still think it's easier to
debug this code in userland using a shim that redirects the zfsboot I/O calls to
simple read system calls...
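A userland shim of the kind suggested here could look roughly like the sketch
below.  It is an illustration under stated assumptions, not the shim referred to
in this message: shim_read() and its argument order are hypothetical, and the
only system interface used is POSIX pread(2).  The idea is to back the boot
code's physical read callback with an ordinary file or disk image, so that the
ZFS reading logic can run under a normal debugger.

```c
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read exactly 'size' bytes at 'offset' from the image open on 'fd'.
 * Returns 0 on success, or EIO on error or short read, following the
 * "0 or EIO" convention visible in the patch earlier in this thread.
 */
static int
shim_read(int fd, void *buf, off_t offset, size_t size)
{
	char *p = buf;
	size_t done = 0;

	while (done < size) {
		ssize_t n = pread(fd, p + done, size - done,
		    offset + (off_t)done);
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, retry */
		if (n <= 0)
			return (EIO);	/* error or end of image */
		done += (size_t)n;
	}
	return (0);
}
```

A test harness would open the pool's backing file read-only, wrap the descriptor
in whatever private data the read callback expects, and call the boot code's
lookup functions directly, so that a stack overrun shows up in an ordinary gdb
session or memory checker instead of a hung virtual machine.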
Re: bin/144214: zfsboot fails on gang block after upgrade to zfs v14
on 27/05/2010 17:40 Doug Rabson said the following:
> Excellent work - thanks for looking into this. I still think it's easier
> to debug this code in userland using a shim that redirects the zfsboot
> I/O calls to simple read system calls...

Absolutely!  That should be much easier.
Do you have such a shim that you could share?
I'd be much obliged for it.  And not only I, I think.
Thanks!
-- 
Andriy Gapon
Re: Custom USB layout & sysinstall (Starting FIXIT)
Still no answer?

Hey, there is also a thread:
http://forums.freebsd.org/showthread.php?t=14059
Re: Custom USB layout & sysinstall (Starting FIXIT)
On Thu, May 27, 2010 at 3:53 PM, none none wrote:
> Still no answer?
>
> Hey, there is also a thread:
> http://forums.freebsd.org/showthread.php?t=14059

Hate to say it, but you're doing something unsupported, so unless you walk
through the process by yourself to figure out where things are going
wrong, I'm not sure others have the time to help you in this endeavor.
sysinstall(8) assumes a custom environment and setup; trying to
unravel it would be painful, so I don't suggest doing that. If you're
going to roll your own solution you might as well roll the entire
thing from scratch.
Thanks,
-Garrett
Re: Custom USB layout & sysinstall (Starting FIXIT)
On 5/15/10 8:01 AM, none none wrote:
> On Sat, May 15, 2010 at 12:14 AM, Ken Smith wrote:
>> On 5/14/10 1:16 PM, none none wrote:
>>> I've read it, all.
>>> What he is proposing, is about building our own image flavor.
>>> (make-memstick.sh)
>>> Exactly, that act, is an issue here, as it confuses sysinstall's USB
>>> detection.
>>
>> This part of what you say confuses me.  I use make-memstick.sh to build
>> the .img files people are downloading and using to do installs with.
>> So if you are using it correctly any machine that can use the .img
>> files I build and we distribute should be able to use what you
>> produce.
>
> Ah, I was unclear. When I've put "make-memstick.sh", in bracket, I was
> referring to similarity of steps.
> Not to the usage, of actual make-memstick.sh script.
>
> There are 2 types of customizations:
> A) Content (All in UFS)
> B) Layout (MBR, slices, boot code, bsdlabel,...)
>
> make-memstick.sh script is limited only to customization of A), so I
> am not using it.
> And shell command which I utilize are far more complex.
>
> I do A) and B) customizations, where B) is a culprit, that confuses
> sysinstall.
>
> Focus on this:
> Official FreeBSD memstick.img once 'dd'-ed appears as da0a
> My edition appears as da0s2a ( because of me doing B) )
>
> Once I turn on my machine, at boot time I select USB as a boot device.
> Then: BIOS -> MBR of da0 -> slice 2 -> boot loader -> sysinstall
>
> Now, while in sysinstall, I decide to go in Fixit mode.
> When I select a USB device, I get an error msg:
> "No USB devices found!"
>
> Other parts of sysinstall, DO list ad4 (my HDD) and da0 (my USB stick)
> correctly.

With respect to your "Still no answer" message I'm not sure what
you're expecting for an answer.  You answer yourself above.
The customizations you're doing that you refer to as "B" do indeed
confuse sysinstall's disk recognition semantics.  As part of
your customizations you'll need to adjust sysinstall's disk
recognition semantics to understand the layout you are setting
up.  I'm not quite sure what else you are expecting.

I can't think of some easy fix that would get you past the problem you
are experiencing without some hacking done to sysinstall.  I'm
also not sure if that sort of hacking would be suitable for
the general case (what works now).

-- 
                                                Ken Smith
- From there to here, from here to      |       kensm...@buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |
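The layout difference at the heart of this exchange (da0a for the official
image, da0s2a for the customized stick) can be made concrete with a small
parser.  This is only an illustrative sketch, not sysinstall's detection code:
struct dev_name and parse_dev_name() are hypothetical, and the accepted pattern
<letters><unit>[s<slice>][<partition a-h>] is an assumption covering just the
names mentioned above.

```c
#include <ctype.h>
#include <stddef.h>

struct dev_name {
	int has_slice;	/* 1 if an "sN" component is present */
	int slice;	/* slice number, valid if has_slice is set */
	char part;	/* partition letter, or '\0' if none */
};

/*
 * Split names such as "da0a" or "da0s2a" into slice and partition.
 * Returns 0 on success, -1 if the name does not fit the pattern.
 */
static int
parse_dev_name(const char *name, struct dev_name *out)
{
	const char *p = name;

	out->has_slice = 0;
	out->slice = 0;
	out->part = '\0';

	if (!isalpha((unsigned char)*p))
		return (-1);
	while (isalpha((unsigned char)*p))	/* driver name, e.g. "da" */
		p++;
	if (!isdigit((unsigned char)*p))
		return (-1);
	while (isdigit((unsigned char)*p))	/* unit number, e.g. "0" */
		p++;
	if (*p == 's' && isdigit((unsigned char)p[1])) {
		out->has_slice = 1;
		for (p++; isdigit((unsigned char)*p); p++)
			out->slice = out->slice * 10 + (*p - '0');
	}
	if (*p >= 'a' && *p <= 'h') {		/* partition letter */
		out->part = *p;
		p++;
	}
	return (*p == '\0' ? 0 : -1);
}
```

For "da0a" the helper reports no slice and partition 'a'; for "da0s2a" it
reports slice 2 and partition 'a'.  Code that only expects the first shape on a
USB stick would have to be extended to accept the second.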
Re: Custom USB layout & sysinstall (Starting FIXIT)
On Fri, May 28, 2010 at 1:02 AM, Garrett Cooper wrote:
> On Thu, May 27, 2010 at 3:53 PM, none none wrote:
>> Still no answer?
>>
>> Hey, there is also a thread:
>> http://forums.freebsd.org/showthread.php?t=14059
>
> Hate to say it, but you're doing something unsupported, so unless you walk
> through the process by yourself to figure out where things are going
> wrong, I'm not sure others have the time to help you in this endeavor.

I did find and post the code snippet that is responsible for detecting devices.
But I don't have the C knowledge needed to do any coding.
I know PHP and JS, ..., so I have been observing patterns and echoing
text/strings to the terminal.

> sysinstall(8) assumes a custom environment and setup; trying to
> unravel it would be painful, so I don't suggest doing that.

sysinstall(8) expects a precise environment.
Anything deviating from its hardcoded path is ignored.

> If you're going to roll your own solution you might as well roll the entire
> thing from scratch.
>
> Thanks,
> -Garrett

My solution is to get rid of sysinstall.
Ideally, sysinstall would be skipped upon boot and Fixit# started immediately.
I have a blueprint in my head, the beginning of which I am hammering out here,
but I lack the knowledge to code it.

> With respect to your "Still no answer" message I'm not sure what
> you're expecting for an answer.  You answer yourself above.

I expected a patch, a .diff, which I would apply to
/usr/src/usr.sbin/sysinstall/...
and then recompile.

> The customizations you're doing that you refer to as "B" do indeed
> confuse sysinstall's disk recognition semantics.  As part of
> your customizations you'll need to adjust sysinstall's disk
> recognition semantics to understand the layout you are setting
> up.  I'm not quite sure what else you are expecting.

Someone to do that for me and hand it over to me on a silver plate.
I am not a FreeBSD dev. Hell, I've just joined this list and this is my first
"topic".

> I can't think of some easy fix that would get you past the problem you
> are experiencing without some hacking done to sysinstall.  I'm
> also not sure if that sort of hacking would be suitable for
> the general case (what works now).
>
> --
> Ken Smith

I think it would be, as it would just look for devices in an extended way,
thus covering the old ones and satisfying my needs/aims.

Domagoj S.
Re: Custom USB layout & sysinstall (Starting FIXIT)
On 28.05.2010 4:27, Domagoj S. wrote:
> I did find and post the code snippet that is responsible for detecting devices.
> But I don't have the C knowledge needed to do any coding.
> I know PHP and JS, ..., so I have been observing patterns and echoing
> text/strings to the terminal.

This code isn't responsible for detecting devices.
Look at the "deviceGetAll" function.

-- 
WBR, Andrey V. Elsukov
setfib does not work with IPv6
9.0-CURRENT #0: Sun Apr 11 23:26:21 EEST 2010 amd64

net.my_fibnum: 0
net.add_addr_allfibs: 1
net.fibs: 3

# netstat -rn | grep default
default            XXX.XXX.XXX.254       UGS    0 37594940   tun1
default            2001:5c0:1400:b::27e8 UGS    gif0
# setfib 1 netstat -rn | grep default
default            2001:5c0:1400:b::27e8 UGS    gif0
# setfib 2 netstat -rn | grep default
default            2001:5c0:1400:b::27e8 UGS    gif0

# setfib 2 route -n add -inet6 default 2001:470:27:140::1
route: writing to routing socket: File exists
add net default: gateway 2001:470:27:140::1: route already in table
# setfib 2 route -n add -inet6 default 2001:470:27:140::1
route: writing to routing socket: File exists
add net default: gateway 2001:470:27:140::1: route already in table

Change routes without setfib:

# route -n change -inet6 default 2001:470:27:140::1
change net default: gateway 2001:470:27:140::1
# route -n change -inet6 default 2001:5c0:1400:b::27e8
change net default: gateway 2001:5c0:1400:b::27e8

Open PR?
Re: setfib does not work with IPv6
Sorry... In the second command, "add" should read "change":

# setfib 2 route -n add -inet6 default 2001:470:27:140::1
route: writing to routing socket: File exists
add net default: gateway 2001:470:27:140::1: route already in table
# setfib 2 route -n change -inet6 default 2001:470:27:140::1
route: writing to routing socket: Address family not supported by protocol family
change net default: gateway 2001:470:27:140::1: Address family not supported by protocol family