Re: How to debug a double fault? (Re: Could MSGBUF_SIZE be made a loader tunable?)
Andriy Gapon wrote:
> on 15/12/2010 12:37 per...@pluto.rain.com said the following:
> > Fatal double fault:
> > eip = 0xc07feb98
> > esp = 0xc101e000
> > ebp = 0xc101e004
> > cpuid = 0; apic id = 00
> > panic: double fault
> > cpuid = 0
> >
> > How do I go about tracking this down?
>
> Do you have the standard debugging options in your kernel?

No, it is 8.1-RELEASE GENERIC with only the name changed and the
(first attempt) msgbufsize patches applied. I was trying to
minimize changes to GENERIC, so as to minimize the opportunity
to screw something up, and I had this silly idea that something
this simple might "just work."

It does occur to me to wonder whether any debugger would be
functional this early, before even the first line of the signon
message has been displayed. Is it possible, given the loader
messages, to come up with a base address which could be used to
compare the eip value with the kernel symbol table? Granted this
won't provide a traceback, but even knowing in which function it
crashed would be a start.

> BTW, are you sure that you correctly placed initialization of
> msgbufsize ?

I am not at all sure of that, and am not sufficiently familiar with
the sequence of events early in initialization to know how to find
out -- although I suppose the observed crash might not be altogether
surprising if the kernel message buffer got allocated with a zero
size :(

Apart from the name, msgbufsize is set up in exactly the same
way and place -- in init_param1() -- as maxswzone and maxbcache.
Perhaps that is not early enough; any idea what would be a better
example?
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: How to debug a double fault? (Re: Could MSGBUF_SIZE be made a loader tunable?)
on 16/12/2010 11:34 per...@pluto.rain.com said the following:
> Andriy Gapon wrote:
>> on 15/12/2010 12:37 per...@pluto.rain.com said the following:
>>> Fatal double fault:
>>> eip = 0xc07feb98
>>> esp = 0xc101e000
>>> ebp = 0xc101e004
>>> cpuid = 0; apic id = 00
>>> panic: double fault
>>> cpuid = 0
>>>
>>> How do I go about tracking this down?
>>
>> Do you have the standard debugging options in your kernel?
>
> No, it is 8.1-RELEASE GENERIC with only the name changed and the
> (first attempt) msgbufsize patches applied. I was trying to
> minimize changes to GENERIC, so as to minimize the opportunity
> to screw something up, and I had this silly idea that something
> this simple might "just work."
>
> It does occur to me to wonder whether any debugger would be
> functional this early, before even the first line of the signon
> message has been displayed. Is it possible, given the loader
> messages, to come up with a base address which could be used to
> compare the eip value with the kernel symbol table? Granted this
> won't provide a traceback, but even knowing in which function it
> crashed would be a start.

You can research this approach, but I would just add KDB+DDB and get
a stack trace without breaking a sweat.

>> BTW, are you sure that you correctly placed initialization of
>> msgbufsize ?
>
> I am not at all sure of that, and am not sufficiently familiar with
> the sequence of events early in initialization to know how to find
> out -- although I suppose the observed crash might not be altogether
> surprising if the kernel message buffer got allocated with a zero
> size :(
>
> Apart from the name, msgbufsize is set up in exactly the same
> way and place -- in init_param1() -- as maxswzone and maxbcache.
> Perhaps that is not early enough; any idea what would be a better
> example?

I don't see any connection between msgbufsize and maxswzone, so I also
don't know if that place is early enough.
Just try to initialize the variable where it's defined and use
TUNABLE_LONG.
-- 
Andriy Gapon
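[Editorial sketch] The question above about comparing the eip value with the kernel symbol table can be illustrated with a small offline lookup. This is only a sketch under stated assumptions: it assumes the kernel is linked at its runtime virtual addresses, so the eip from the panic can be compared directly with addresses printed by something like `nm -n /boot/kernel/kernel`, and it assumes the usual three-column "address type name" output. The symbol names and addresses in the sample are hypothetical placeholders, not taken from a real 8.1 kernel.

```python
import bisect


def parse_nm(nm_text):
    """Parse `nm -n`-style lines ("c07feb00 T name") into a sorted list
    of (address, name) pairs, keeping only text (code) symbols."""
    symbols = []
    for line in nm_text.splitlines():
        parts = line.split()
        # Undefined symbols have no address column, so they split into
        # two fields and are skipped here.
        if len(parts) != 3:
            continue
        addr, kind, name = parts
        if kind in ("T", "t"):
            symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def lookup(symbols, pc):
    """Return (name, offset) for the last symbol starting at or below pc,
    or None if pc lies below every known symbol."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, pc) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, pc - addr


# Hypothetical sample data; a real run would read the output of nm.
SAMPLE_NM = """\
c07fe900 T hypothetical_func_a
c07feb00 T hypothetical_func_b
c07fec40 T hypothetical_func_c
         U undefined_symbol
"""

if __name__ == "__main__":
    name, offset = lookup(parse_nm(SAMPLE_NM), 0xC07FEB98)
    print("%s+0x%x" % (name, offset))
```

This only names the function containing the faulting address. It gives no traceback, which is why building with KDB and DDB remains the less laborious route suggested in the reply.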
New ZFSv28 patchset for 8-STABLE
Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

Link to mfsBSD ISO files for testing (i386 and amd64):
http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-amd64.iso
http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-i386.iso

The root password for the ISO files: "mfsroot"
The ISO files work on real systems and in VirtualBox.
They contain a full install of FreeBSD 8.2-PRERELEASE with ZFS v28;
simply use the provided "zfsinstall" script.

The patch is against FreeBSD 8-STABLE as of 2010-12-15.

When applying the patch be sure to use correct options for patch(1)
and make sure the file sys/cddl/compat/opensolaris/sys/sysmacros.h gets
deleted:

# cd /usr/src
# fetch http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
# xz -d stable-8-zfsv28-20101215.patch.xz
# patch -E -p0 < stable-8-zfsv28-20101215.patch
# rm sys/cddl/compat/opensolaris/sys/sysmacros.h

From Pawel's announcement:

Some of the changes since the last patchset (zfs_20100831.patch):

- Boot support for ZFS v28 (only RAIDZ3 is not yet supported).
- Various fixes for the existing ZFS boot code.
- Support for sendfile(2) (by avg@).
- Userland<->kernel compatibility with v13-v15 (by mm@).
- ACL fixes (by trasz@).
- Various bug fixes.

Please test, test, test. Chances are this is the last patchset before
v28 going to HEAD (finally) and after a reasonable testing period into
8-STABLE.
Especially test new changes, like boot support and sendfile(2) support.
Also be sure to verify that you can import your existing ZFS pools
(v13-v15) when running v28, and that you can boot from your existing
pools.

Please test the (v13-v15) compatibility layer as well:
old userland + new kernel / old kernel + new userland

More information about ZFS on my blog:
http://blog.vx.sk
Samba upgrade HowTo requested
Dear Samba friends,

Last weekend I decided to upgrade the samba server. We were running Samba
3.3 something and FreeBSD portupgrade was complaining that this version
should be removed and presumably replaced by the newest version. I
removed the package via portupgrade and installed the 3.5.6 version. The
upgrade went quite smoothly in general, but I encountered some
difficulties with the printer drivers. Before the upgrade we were able
to print on 4 printers. After the upgrade only 1.5 printers were working:
1 printer worked as expected, 1 printer printed only garbage and 2
printers were not working at all.

I only managed to solve the problems by de-installing and re-installing
the printer drivers on the samba server. So somehow the databases in
/var/db/samba/*.tdb have been messed up. I do not know what went wrong in
detail, nor do I know how to prevent this kind of issue in the next
upgrade.

What is the procedure to upgrade samba to the newest version? How should
one proceed and what are the pitfalls? How should we deal with the
printer definitions and printer drivers? What should we in general do
with the database files, apart from backing them up?

And specifically for FreeBSD users: How should we deal with an upgrade of
samba via portupgrade?

-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,

Willy

*
W.K. Offermans
Home: +31 45 544 49 44
Mobile: +31 681 15 87 68
e-mail: wi...@offermans.rompen.nl
Re: [Samba] Samba upgrade HowTo requested
Hello Peter,

On Thu, Dec 16, 2010 at 05:42:10PM +0300, Peter Trifonov wrote:
> Hi Willy,
>
> > Last weekend I decided to upgrade the samba server. We were running
> > Samba 3.3 something and FreeBSD portupgrade was complaining that this
> > version should be removed and presumably replaced by the newest
> > version. I removed the package via portupgrade and installed the
> > 3.5.6 version. The
> Are you running winbindd on this server? If yes, does it work properly?
> In my case it failed to communicate group IDs to the system, so I had to
> roll back to v. 3.4.9.
>
> > And specifically for FreeBSD users: How should we deal with an
> > upgrade of samba via portupgrade?
> I have upgraded it many times before, and in most cases it was just make
> deinstall & make reinstall.
>
> With best regards,
> P. Trifonov

Concerning your first question:
No, we are not running winbindd, so I cannot tell you whether it works.

To your second remark:
Well, it may have worked in your case, but it certainly did not in mine.
I do not know what happened to the drivers or the driver database, but
something was really messed up.

I would like to clarify this and raise it to a higher level. I would
like to figure out what the correct procedure is and how we can inform
users about it.

-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,

Willy

*
W.K. Offermans
Home: +31 45 544 49 44
Mobile: +31 681 15 87 68
e-mail: wi...@offermans.rompen.nl
Re: ntpd fails on boot
My high-tech solution to NTPDATE (et al.) running before the link
was up was to edit /etc/rc.d/NETWORKING and append these two lines
at the bottom of the file:

==
/bin/echo "Waiting 10s for network link to wake up."
/bin/sleep 10
==

This has solved the startup problem in every case where it had
previously occurred.
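[Editorial sketch] The fix above is a fixed 10-second delay. A different technique, not what the poster used, is to poll for readiness with an upper bound, so that boot continues as soon as the link is usable and never waits longer than the limit. The helper below is a generic illustration only; `wait_until` and the `probe` callable are hypothetical names, and what the probe actually checks (link state, a route, a DNS lookup) is left to the caller.

```python
import time


def wait_until(probe, timeout=10.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns a true value or until
    `timeout` seconds have passed. Returns True on success, False on
    timeout. The probe is always tried at least once."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Stand-in probe that succeeds on the third attempt; a real probe
    # would test whatever "network is up" means on the machine at hand.
    attempts = {"n": 0}

    def fake_probe():
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_until(fake_probe, timeout=10.0, interval=0.1))
```

The fixed sleep is simpler and has evidently been sufficient in practice; the polling form mainly helps when the time the link needs varies a lot between boots.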
Re: vm.swap_reserved toooooo large?
George Mamalakis wrote:
> My dmesg shows:
>
> pid 1732 (npviewer.bin), uid 1001: exited on signal 11 (core dumped)
> pid 2227 (npviewer.bin), uid 1001: exited on signal 11 (core dumped)
> swap zone exhausted, increase kern.maxswzone
> pid 1544 (console-kit-daemon), uid 0, was killed: out of swap space
> swap zone exhausted, increase kern.maxswzone
> pid 2864 (memory), uid 1001, was killed: out of swap space
> swap zone exhausted, increase kern.maxswzone
> pid 1676 (gconf-helper), uid 1001, was killed: out of swap space
>
> where one can see that pid 1544 was killed before 2864, which is the
> process that caused all this mess. Yes, I know that I should use limits
> so as not to allow such things to happen, but on the other hand, if a
> malicious user causes such a situation he/she may gain access to
> information through core-dumps on root processes, AND cause DoS attacks.

No. First, when the kernel kills processes because it runs
out of swap space, it uses SIGKILL which does _not_ cause
a core dump to be written. Second, core dumps are always
created with permissions 0600, i.e. they are only readable
by the owner of the process.

Of course, any user who can run a machine out of memory
can cause a DoS attack by doing this. That's the reason
why resource limits exist.

> If it were for me, I would sort all processes based on their memory
> consumption, and start by killing those that have the highest value
> (top-bottom) that are NOT owned by root (just a thought, without
> thinking about it too much), so as to prevent such situations from
> happening.

It is far from trivial to find a generic algorithm that kills
the "right" process in such a situation. For example,
an attacker could start a lot of small processes that
allocate memory. That's the reason why an admin should
always configure resource limits appropriately. The kernel's
killing feature should only be regarded as the very last
emergency brake, which basically exists only to prevent
a reboot.

If you're interested, you can find the selection algorithm
for processes to kill in the vm_pageout_oom() function in
src/sys/vm/vm_pageout.c. Basically, it selects the process
that consumes the most physical memory (RAM + swap), not
counting the virtual size of the process. Also, some
processes are excluded, such as system processes and
protected processes (cron and sshd, for example).

Best regards
Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht
München, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf
Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"If Java had true garbage collection, most programs
would delete themselves upon execution."
-- Robert Sewell
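[Editorial sketch] The selection rule described above can be modelled in a few lines. This is a simplified illustration of the description in the message, not the actual code of vm_pageout_oom(); the `Proc` record, its field names and all page counts below are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    name: str
    uid: int
    resident_pages: int   # pages currently in RAM (hypothetical figure)
    swap_pages: int       # pages currently in swap (hypothetical figure)
    is_system: bool = False
    is_protected: bool = False


def pick_victim(procs):
    """Pick the process using the most RAM + swap, skipping system and
    protected processes. Virtual size and uid play no part."""
    candidates = [p for p in procs
                  if not (p.is_system or p.is_protected)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.resident_pages + p.swap_pages)


if __name__ == "__main__":
    # Process names are borrowed from the thread; the sizes are made up.
    procs = [
        Proc(1544, "console-kit-daemon", 0, 3000, 500),
        Proc(2864, "memory", 1001, 200000, 90000),
        Proc(900, "sshd", 0, 1500, 0, is_protected=True),
    ]
    victim = pick_victim(procs)
    print(victim.pid, victim.name)
```

Because the rule as described never looks at the uid, a root-owned process that is neither a system process nor protected is as eligible as any other. That is consistent with the observation in the quoted message, although the sketch does not attempt to explain the exact order of kills seen there.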
Re: vm.swap_reserved toooooo large?
On 16/12/2010 18:56, Oliver Fromme wrote:
> George Mamalakis wrote:
> > My dmesg shows:
> >
> > pid 1732 (npviewer.bin), uid 1001: exited on signal 11 (core dumped)
> > pid 2227 (npviewer.bin), uid 1001: exited on signal 11 (core dumped)
> > swap zone exhausted, increase kern.maxswzone
> > pid 1544 (console-kit-daemon), uid 0, was killed: out of swap space
> > swap zone exhausted, increase kern.maxswzone
> > pid 2864 (memory), uid 1001, was killed: out of swap space
> > swap zone exhausted, increase kern.maxswzone
> > pid 1676 (gconf-helper), uid 1001, was killed: out of swap space
> >
> > where one can see that pid 1544 was killed before 2864, which is the
> > process that caused all this mess. Yes, I know that I should use
> > limits so as not to allow such things to happen, but on the other
> > hand, if a malicious user causes such a situation he/she may gain
> > access to information through core-dumps on root processes, AND
> > cause DoS attacks.
>
> No. First, when the kernel kills processes because it runs
> out of swap space, it uses SIGKILL which does _not_ cause
> a core dump to be written. Second, core dumps are always
> created with permissions 0600, i.e. they are only readable
> by the owner of the process.
>
> Of course, any user who can run a machine out of memory
> can cause a DoS attack by doing this. That's the reason
> why resource limits exist.
>
> > If it were for me, I would sort all processes based on their memory
> > consumption, and start by killing those that have the highest value
> > (top-bottom) that are NOT owned by root (just a thought, without
> > thinking about it too much), so as to prevent such situations from
> > happening.
>
> It is far from trivial to find a generic algorithm that kills
> the "right" process in such a situation. For example,
> an attacker could start a lot of small processes that
> allocate memory. That's the reason why an admin should
> always configure resource limits appropriately. The kernel's
> killing feature should only be regarded as the very last
> emergency brake, which basically exists only to prevent
> a reboot.
>
> If you're interested, you can find the selection algorithm
> for processes to kill in the vm_pageout_oom() function in
> src/sys/vm/vm_pageout.c. Basically, it selects the process
> that consumes the most physical memory (RAM + swap), not
> counting the virtual size of the process. Also, some
> processes are excluded, such as system processes and
> protected processes (cron and sshd, for example).
>
> Best regards
> Oliver

Oliver,

thanks for your comments. I know it is difficult to choose which process
to kill and how to be "fair" during such a killing procedure.
Nevertheless, I would have expected all non-root processes to be killed
first, and root's processes to be killed last. I understand your
comments completely, but I was just so surprised when I realized how
easy it was for me to kill root processes on my system.

Thanks again and best regards!

mamalos

-- 
George Mamalakis

IT Officer
Electrical and Computer Engineer (Aristotle Un. of Thessaloniki),
MSc (Imperial College of London)

Department of Electrical and Computer Engineering
Faculty of Engineering
Aristotle University of Thessaloniki

phone number : +30 (2310) 994379
Re: aesni(?) corrupts data on 8.2-BETA1
On 12/12/2010 03:43, Kostik Belousov wrote:
> On Sat, Dec 11, 2010 at 07:37:51PM -0500, Mike Tancsa wrote:
>> On 12/11/2010 6:22 PM, Kostik Belousov wrote:
>>> On Sat, Dec 11, 2010 at 06:08:08PM -0500, Mike Tancsa wrote:
>>>> On 12/11/2010 11:01 AM, Kostik Belousov wrote:
>>>>>
>>>>> I have no access to AESNI hardware. For start, you may use
>>>>> src/tools/tools/crypto/cryptotest
>>>>> to somewhat verify the sanity of the driver.
>>>>
>>>> It doesn't happen every time, but one out of 5 or so
>>> First, which arch is it, amd64 or i386 ?
>>>
>>> Also, please revert r216162 and do the same tests.
>>
>> Hi,
>> It's AMD64, but i386 seems to be impacted too. I am not sure how to
>> revert to a specific commit, but for now I csup'd with a date tag of
>>
>> *date=2010.12.02.23.00.00
>>
>> which is a day before
>> http://lists.freebsd.org/pipermail/svn-src-stable-8/2010-December/004338.html
>>
>>
>> And that seems to fix it!
>>
>> I have been running
>> cryptotest -c -z -t 10
>> in a loop for the past 10min and not one error.
>
> Please try this patch on the latest HEAD or RELENG_8.
>
> diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c
> index 482b5da..1b493b4 100644
> --- a/sys/amd64/amd64/fpu.c
> +++ b/sys/amd64/amd64/fpu.c
> @@ -426,7 +426,9 @@ fpudna(void)
>  		fxrstor(&fpu_initialstate);
>  		if (pcb->pcb_initial_fpucw != __INITIAL_FPUCW__)
>  			fldcw(pcb->pcb_initial_fpucw);
> -		fpuuserinited(curthread);
> +		pcb->pcb_flags |= PCB_FPUINITDONE;
> +		if (PCB_USER_FPU(pcb))
> +			pcb->pcb_flags |= PCB_USERFPUINITDONE;
>  	} else
>  		fxrstor(pcb->pcb_save);
>  	critical_exit();
> diff --git a/sys/i386/isa/npx.c b/sys/i386/isa/npx.c
> index 9ec5d25..f314e44 100644
> --- a/sys/i386/isa/npx.c
> +++ b/sys/i386/isa/npx.c
> @@ -684,7 +684,9 @@ npxdna(void)
>  		fpurstor(&npx_initialstate);
>  		if (pcb->pcb_initial_npxcw != __INITIAL_NPXCW__)
>  			fldcw(pcb->pcb_initial_npxcw);
> -		npxuserinited(curthread);
> +		pcb->pcb_flags |= PCB_NPXINITDONE;
> +		if (PCB_USER_FPU(pcb))
> +			pcb->pcb_flags |= PCB_NPXUSERINITDONE;
>  	} else {
>  		/*
>  		 * The following fpurstor() may cause an IRQ13 when the

Regarding this patch (r216455) and r216162: I have had to back both of
them out of my local tree to avoid panics on a ZFS & UFS2 i386 system.

With the following panic strings:
Dumptime: Thu Dec  9 08:37:40 2010
Panic String: double fault
Dumptime: Thu Dec  9 08:41:57 2010
Panic String: page fault
Dumptime: Fri Dec 10 00:23:35 2010
Panic String: free: address 0x85ceb000(0x85ceb000) has not been allocated.
Dumptime: Fri Dec 10 14:37:33 2010
Panic String: page fault
Dumptime: Sat Dec 11 04:10:01 2010
Panic String: vm_fault: fault on nofault entry, addr: 8289c000
Dumptime: Sun Dec 12 23:45:01 2010
Panic String: page fault
Dumptime: Tue Dec 14 01:32:09 2010
Panic String: page fault
Dumptime: Tue Dec 14 16:46:33 2010
Panic String: general protection fault
Dumptime: Thu Dec 16 10:03:15 2010
Panic String: vm_fault: fault on nofault entry, addr: b3811000

Seems to be caused by r216162 or directly related to it. If further
information is needed let me know. I'll be around here for the next few
hours.

-- 
jhell,v
Re: aesni(?) corrupts data on 8.2-BETA1
On 12/16/2010 20:10, jhell wrote:
> Regarding this patch (r216455) and r216162: I have had to back both of
> them out of my local tree to avoid panics on a ZFS & UFS2 i386 system.
>
> With the following panic strings:
> Dumptime: Thu Dec  9 08:37:40 2010
> Panic String: double fault
> Dumptime: Thu Dec  9 08:41:57 2010
> Panic String: page fault
> Dumptime: Fri Dec 10 00:23:35 2010
> Panic String: free: address 0x85ceb000(0x85ceb000) has not been allocated.
> Dumptime: Fri Dec 10 14:37:33 2010
> Panic String: page fault
> Dumptime: Sat Dec 11 04:10:01 2010
> Panic String: vm_fault: fault on nofault entry, addr: 8289c000
> Dumptime: Sun Dec 12 23:45:01 2010
> Panic String: page fault
> Dumptime: Tue Dec 14 01:32:09 2010
> Panic String: page fault
> Dumptime: Tue Dec 14 16:46:33 2010
> Panic String: general protection fault
> Dumptime: Thu Dec 16 10:03:15 2010
> Panic String: vm_fault: fault on nofault entry, addr: b3811000
>
>
> Seems to be caused by r216162 or directly related to it. If further
> information is needed let me know. I'll be around here for the next few
> hours.
>

PS: When the system crashes with the above panic strings,
/boot/zfs/zpool.cache also ends up corrupt, so I have to boot from a
good kernel and force-import the pool.

Scrubs with the two revisions applied also end up with checksum errors
all over the place. Without them everything returns to normal.

-- 
jhell,v