Re: How to debug a double fault? (Re: Could MSGBUF_SIZE be made a loader tunable?)

2010-12-16 Thread perryh
Andriy Gapon  wrote:
> on 15/12/2010 12:37 per...@pluto.rain.com said the following:
> > Fatal double fault:
> > eip = 0xc07feb98
> > esp = 0xc101e000
> > ebp = 0xc101e004
> > cpuid = 0; apic id = 00
> > panic: double fault
> > cpuid = 0
> > 
> > How do I go about tracking this down?
>
> Do you have the standard debugging options in your kernel?

No, it is 8.1-RELEASE GENERIC with only the name changed and the
(first attempt) msgbufsize patches applied.  I was trying to
minimize changes to GENERIC, so as to minimize the opportunity
to screw something up, and I had this silly idea that something
this simple might "just work."

It does occur to me to wonder whether any debugger would be
functional this early, before even the first line of the signon
message has been displayed.  Is it possible, given the loader
messages, to come up with a base address which could be used to
compare the eip value with the kernel symbol table?  Granted this
won't provide a traceback, but even knowing in which function it
crashed would be a start.
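For the lookup half of this question, here is a minimal C sketch. It assumes three things. First, the faulting eip lies in the kernel image itself. Second, the kernel is not relocated when it is loaded, so addresses printed by `nm -n` on the booted kernel file can be compared with eip directly; code in a loaded module would also need that module's load address. Third, the symbol table has already been parsed into an array sorted by address. `struct ksym` and `ksym_lookup()` are hypothetical helpers, not a FreeBSD API. The function containing eip is taken to be the one with the largest start address that is not above eip.

```c
#include <stddef.h>

/* Hypothetical: one parsed `nm -n` entry (start address and symbol name). */
struct ksym {
    unsigned long addr;
    const char *name;
};

/*
 * Return the index of the symbol with the largest start address that is
 * not above `pc`, i.e. the function that most likely contains `pc`, or -1
 * if `pc` lies below the first symbol.  `tab` must be sorted by address,
 * as `nm -n` prints it; the search is a plain binary search.
 */
long ksym_lookup(const struct ksym *tab, size_t n, unsigned long pc)
{
    size_t lo = 0, hi = n;              /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= pc)
            lo = mid + 1;               /* answer is mid or later */
        else
            hi = mid;                   /* answer is before mid */
    }
    /* lo is now the first index whose address is above pc */
    return lo == 0 ? -1 : (long)(lo - 1);
}
```

With a filled-in table, `tab[i].name` together with `pc - tab[i].addr` gives the usual function+offset form. As with any nearest-symbol lookup, an address in code that has no symbol of its own is attributed to the preceding symbol.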

> BTW, are you sure that you correctly placed initialization of
> msgbufsize ?

I am not at all sure of that, and am not sufficiently familiar with
the sequence of events early in initialization to know how to find
out -- although I suppose the observed crash might not be altogether
surprising if the kernel message buffer got allocated with a zero
size :(

Apart from the name, msgbufsize is set up in exactly the same
way and place -- in init_param1() -- as maxswzone and maxbcache.
Perhaps that is not early enough; any idea what would be a better
example?
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: How to debug a double fault? (Re: Could MSGBUF_SIZE be made a loader tunable?)

2010-12-16 Thread Andriy Gapon
on 16/12/2010 11:34 per...@pluto.rain.com said the following:
> Andriy Gapon  wrote:
>> on 15/12/2010 12:37 per...@pluto.rain.com said the following:
>>> Fatal double fault:
>>> eip = 0xc07feb98
>>> esp = 0xc101e000
>>> ebp = 0xc101e004
>>> cpuid = 0; apic id = 00
>>> panic: double fault
>>> cpuid = 0
>>>
>>> How do I go about tracking this down?
>>
>> Do you have the standard debugging options in your kernel?
> 
> No, it is 8.1-RELEASE GENERIC with only the name changed and the
> (first attempt) msgbufsize patches applied.  I was trying to
> minimize changes to GENERIC, so as to minimize the opportunity
> to screw something up, and I had this silly idea that something
> this simple might "just work."
> 
> It does occur to me to wonder whether any debugger would be
> functional this early, before even the first line of the signon
> message has been displayed.  Is it possible, given the loader
> messages, to come up with a base address which could be used to
> compare the eip value with the kernel symbol table?  Granted this
> won't provide a traceback, but even knowing in which function it
> crashed would be a start.

You can research this approach, but I would just add the KDB and DDB options
to the kernel config and get a stack trace without breaking a sweat.

>> BTW, are you sure that you correctly placed initialization of
>> msgbufsize ?
> 
> I am not at all sure of that, and am not sufficiently familiar with
> the sequence of events early in initialization to know how to find
> out -- although I suppose the observed crash might not be altogether
> surprising if the kernel message buffer got allocated with a zero
> size :(
> 
> Apart from the name, msgbufsize is set up in exactly the same
> way and place -- in init_param1() -- as maxswzone and maxbcache.
> Perhaps that is not early enough; any idea what would be a better
> example?

I don't see any connection between msgbufsize and maxswzone, so I also don't
know if that place is early enough.
Just try to initialize the variable where it's defined and use TUNABLE_LONG.

-- 
Andriy Gapon


New ZFSv28 patchset for 8-STABLE

2010-12-16 Thread Martin Matuska
Hi everyone,

following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I am
providing a ZFSv28 testing patch for 8-STABLE.

Link to the patch:
http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

Link to mfsBSD ISO files for testing (i386 and amd64):
http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-amd64.iso
http://mfsbsd.vx.sk/iso/zfs-v28/8.2-beta-zfsv28-i386.iso

The root password for the ISO files: "mfsroot"
The ISO files work on real systems and in VirtualBox.
They contain a full install of FreeBSD 8.2-PRERELEASE with ZFS v28;
to install, simply use the provided "zfsinstall" script.

The patch is against FreeBSD 8-STABLE as of 2010-12-15.

When applying the patch be sure to use correct options for patch(1)
and make sure the file sys/cddl/compat/opensolaris/sys/sysmacros.h gets
deleted:

# cd /usr/src
# fetch http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz
# xz -d stable-8-zfsv28-20101215.patch.xz
# patch -E -p0 < stable-8-zfsv28-20101215.patch
# rm sys/cddl/compat/opensolaris/sys/sysmacros.h

From Pawel's announcement:

Some of the changes since the last patchset (zfs_20100831.patch):

- Boot support for ZFS v28 (only RAIDZ3 is not yet supported).
- Various fixes for the existing ZFS boot code.
- Support for sendfile(2) (by avg@).
- Userland<->kernel compatibility with v13-v15 (by mm@).
- ACL fixes (by trasz@).
- Various bug fixes.

Please test, test, test. Chances are this is the last patchset before
v28 goes to HEAD (finally) and, after a reasonable testing period, into
8-STABLE.
Especially test the new changes, like boot support and sendfile(2) support.
Also be sure to verify that you can import existing ZFS pools (v13-v15)
when running v28, and that you can boot from your existing pools.

Please test the (v13-v15) compatibility layer as well:
Old userland + new kernel / old kernel + new userland

More information about ZFS on my blog:
http://blog.vx.sk




Samba upgrade HowTo requested

2010-12-16 Thread Willy Offermans
Dear Samba friends,

Last weekend I decided to upgrade the Samba server. We were running Samba
3.3-something, and FreeBSD portupgrade was complaining that this version
should be removed and, presumably, replaced by the newest version. I removed
the package via portupgrade and installed version 3.5.6. In general the
upgrade went quite smoothly, but I ran into some difficulties with
the printer drivers.

Before the upgrade we were able to print on 4 printers. After the upgrade
only 1.5 printers were working: 1 printer worked as expected, 1 printer
printed only garbage, and 2 printers did not work at all. I only managed
to solve the problems by uninstalling and reinstalling the printer
drivers on the Samba server, so somehow the databases in
/var/db/samba/*.tdb had been messed up. I do not know in detail what went
wrong, nor do I know how to prevent this kind of issue in the
next upgrade.

What is the procedure for upgrading Samba to the newest version? How should
one proceed, and what are the pitfalls? How should we deal with the printer
definitions and printer drivers? In general, what should we do with the
database files, besides backing them up?

And specifically for FreeBSD users: How should we deal with an upgrade of
samba via portupgrade?

-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,

Willy

*
W.K. Offermans
Home:   +31 45 544 49 44
Mobile: +31 681 15 87 68
e-mail: wi...@offermans.rompen.nl


Re: [Samba] Samba upgrade HowTo requested

2010-12-16 Thread Willy Offermans
Hello Peter,

On Thu, Dec 16, 2010 at 05:42:10PM +0300, Peter Trifonov wrote:
> Hi Willy,
> 
> > Last weekend I decided to upgrade the samba server. We were running
> > Samba
> > 3.3 something and FreeBSD portupgrade was complaining that this version
> > should be removed and assumingly replaced by the newest version. I
> > removed the package via portupgrade and installed the 3.5.6 version. The
> Are you running  winbindd on this server? If yes, does it work properly?
> In my case it failed to communicate group IDs to the system, so I had to
> rollback to v. 3.4.9.
> 
> > And specifically for FreeBSD users: How should we deal with an upgrade of
> > samba via portupgrade?
> I have upgraded it many times before, and in most cases it was just make
> deinstall & make reinstall.  
> 
> 
> With best regards,
> P. Trifonov

Concerning your first question:

No, we are not running winbindd, so I cannot tell you if it might work.

To your second remark:

Well, it might have worked in your case, but certainly not in
mine. I do not know what happened to the drivers or the driver database, but
something was really messed up. I would like to clarify this and put it on a
higher level: I would like to figure out which procedure to follow and how
we can inform users about it.


-- 
Met vriendelijke groeten,
With kind regards,
Mit freundlichen Gruessen,

Willy



Re: ntpd fails on boot

2010-12-16 Thread Michael Voorhis
My high-tech solution to ntpdate (et al.) running before the link was up
was to edit /etc/rc.d/NETWORKING and append these two lines to the
bottom of the file:

==
/bin/echo "Waiting 10s for network link to wake up."
/bin/sleep 10
==

This has solved the startup problem in every case where it had
previously occurred.




Re: vm.swap_reserved toooooo large?

2010-12-16 Thread Oliver Fromme
George Mamalakis  wrote:
 > My dmesg shows:
 > 
 > pid 1732 (npviewer.bin), uid 1001: exited on signal 11 (core dumped)
 > pid 2227 (npviewer.bin), uid 1001: exited on signal 11 (core dumped)
 > swap zone exhausted, increase kern.maxswzone
 > pid 1544 (console-kit-daemon), uid 0, was killed: out of swap space
 > swap zone exhausted, increase kern.maxswzone
 > pid 2864 (memory), uid 1001, was killed: out of swap space
 > swap zone exhausted, increase kern.maxswzone
 > pid 1676 (gconf-helper), uid 1001, was killed: out of swap space
 > 
 > where one can see that pid 1544 was killed before 2864, which is the 
 > process that caused all this mess. Yes, I know that I should use limits 
 > so as not to allow such things to happen, but on the other hand, if a 
 > malicious user causes such a situation he/she may gain access to 
 > information through core-dumps on root processes, AND cause DoS attacks. 

No.  First, when the kernel kills processes because it runs
out of swap space, it uses SIGKILL which does _not_ cause
a core dump to be written.  Second, core dumps are always
created with permissions 0600, i.e. they are only readable
by the owner of the process.

Of course, any user who can run a machine out of memory can
cause a DoS attack by doing this.  That's the reason why
resource limits exist.

 > If it were for me, I would sort all processes based on their memory 
 > consumption, and start by killing those that have the highest value 
 > (top-bottom) that are NOT owned by root (just a thought, without 
 > thinking about it too much), so as to prevent such situations from 
 > happening.

It is far from trivial to find a generic algorithm that
kills the "right" process in such a situation.  For example,
an attacker could start a lot of small processes that
allocate memory.  That's the reason why an admin should
always configure resource limits appropriately.  The kernel's
killing feature should only be regarded as the very last
emergency brake, which basically exists only to prevent a
reboot.

If you're interested, you can find the selection algorithm
for processes to kill in the vm_pageout_oom() function in
src/sys/vm/vm_pageout.c.  Basically, it selects the process
that consumes the most physical memory (RAM + swap), not
counting the virtual size of the process.  Also, some
processes are excluded, such as system processes and
protected processes (cron and sshd, for example).
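The selection rule described above can be sketched in C as follows. This is a simplified model, not the code in vm_pageout_oom(). `struct procinfo` and `oom_pick_victim()` are hypothetical. The real function walks the kernel's process list under locks and may apply checks that this sketch omits.

```c
#include <stddef.h>

/* Hypothetical per-process summary; not the kernel's struct proc. */
struct procinfo {
    int pid;
    unsigned long resident_pages;   /* pages currently in RAM */
    unsigned long swap_pages;       /* pages currently in swap */
    int is_system;                  /* system process: never a candidate */
    int is_protected;               /* protected process (e.g. cron, sshd) */
};

/*
 * Pick the victim as described: the eligible process with the largest
 * RAM + swap footprint (virtual size is ignored).  System and protected
 * processes are skipped.  Returns the index of the victim, or -1 if no
 * process is eligible.
 */
long oom_pick_victim(const struct procinfo *p, size_t n)
{
    long best = -1;
    unsigned long best_size = 0;

    for (size_t i = 0; i < n; i++) {
        if (p[i].is_system || p[i].is_protected)
            continue;
        unsigned long size = p[i].resident_pages + p[i].swap_pages;
        if (best == -1 || size > best_size) {
            best = (long)i;
            best_size = size;
        }
    }
    return best;
}
```

Because only the single largest eligible process is chosen, an attack spread across many small processes can leave a larger, unrelated process as the one that gets killed.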

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Munich Registry Court, HRA 74606; management:
secnetix Verwaltungsgesellsch. mbH, commercial register: Munich Registry
Court, HRB 125758; managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

"If Java had true garbage collection, most programs
would delete themselves upon execution."
-- Robert Sewell


Re: vm.swap_reserved toooooo large?

2010-12-16 Thread George Mamalakis

On 16/12/2010 18:56, Oliver Fromme wrote:
> [...]

Oliver, thanx for your comments. I know it is difficult to choose which 
process to kill and how to be "fair" during such a killing procedure. 
Nevertheless, I would assume that all non-root processes would have 
higher priority to get killed, and that root's processes would get 
killed last. I understand your comments completely, but I was just so 
surprised when I realized how easy it was for me to kill root processes 
on my system.
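The ordering George proposes can be made concrete as a qsort(3) comparator. This is a minimal sketch of his suggestion only, not how FreeBSD selects OOM victims. `struct cand` and `kill_order_cmp()` are hypothetical, and the footprint is assumed to be RAM + swap, as in the rule Oliver describes.

```c
#include <stdlib.h>

/* Hypothetical per-process record used only for this illustration. */
struct cand {
    int pid;
    unsigned int uid;
    unsigned long mem_pages;    /* assumed RAM + swap footprint */
};

/*
 * qsort() comparator for the proposed kill order: processes not owned by
 * root (uid 0) sort first; within each group, the larger footprint sorts
 * first.  After sorting, element 0 would be the first to be killed.
 */
int kill_order_cmp(const void *a, const void *b)
{
    const struct cand *x = a, *y = b;
    int x_root = (x->uid == 0), y_root = (y->uid == 0);

    if (x_root != y_root)
        return x_root - y_root;         /* non-root (0) before root (1) */
    if (x->mem_pages != y->mem_pages)
        return x->mem_pages > y->mem_pages ? -1 : 1;   /* larger first */
    return 0;
}
```

Under this ordering, the root-owned console-kit-daemon (pid 1544 in the dmesg above) would be picked only after every non-root process, whatever their sizes.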


Thanx again and best regards!

mamalos

--
George Mamalakis

IT Officer
Electrical and Computer Engineer (Aristotle Un. of Thessaloniki),
MSc (Imperial College of London)

Department of Electrical and Computer Engineering
Faculty of Engineering
Aristotle University of Thessaloniki

phone number : +30 (2310) 994379



Re: aesni(?) corrupts data on 8.2-BETA1

2010-12-16 Thread jhell
On 12/12/2010 03:43, Kostik Belousov wrote:
> On Sat, Dec 11, 2010 at 07:37:51PM -0500, Mike Tancsa wrote:
>> On 12/11/2010 6:22 PM, Kostik Belousov wrote:
>>> On Sat, Dec 11, 2010 at 06:08:08PM -0500, Mike Tancsa wrote:
>>>> On 12/11/2010 11:01 AM, Kostik Belousov wrote:
>>>>>
>>>>> I have no access to AESNI hardware. For a start, you may use
>>>>> src/tools/tools/crypto/cryptotest
>>>>> to somewhat verify the sanity of the driver.
>>>>
>>>> It doesn't happen every time, but one out of 5 or so.
>>>
>>> First, which arch is it, amd64 or i386 ?
>>>
>>> Also, please revert r216162 and do the same tests.
>>
>> Hi,
>> It's AMD64, but i386 seems to be impacted too. I am not sure how to
>> revert to a specific commit, but for now I csup'd with a date tag of
>>
>> *date=2010.12.02.23.00.00
>>
>> which is a day before
>> http://lists.freebsd.org/pipermail/svn-src-stable-8/2010-December/004338.html
>>
>>
>> And that seems to fix it!
>>
>> I have been running
>> cryptotest -c -z -t 10
>> in a loop for the past 10min and not one error.
> 
> Please try this patch on the latest HEAD or RELENG_8.
> 
> diff --git a/sys/amd64/amd64/fpu.c b/sys/amd64/amd64/fpu.c
> index 482b5da..1b493b4 100644
> --- a/sys/amd64/amd64/fpu.c
> +++ b/sys/amd64/amd64/fpu.c
> @@ -426,7 +426,9 @@ fpudna(void)
>  		fxrstor(&fpu_initialstate);
>  		if (pcb->pcb_initial_fpucw != __INITIAL_FPUCW__)
>  			fldcw(pcb->pcb_initial_fpucw);
> -		fpuuserinited(curthread);
> +		pcb->pcb_flags |= PCB_FPUINITDONE;
> +		if (PCB_USER_FPU(pcb))
> +			pcb->pcb_flags |= PCB_USERFPUINITDONE;
>  	} else
>  		fxrstor(pcb->pcb_save);
>  	critical_exit();
> diff --git a/sys/i386/isa/npx.c b/sys/i386/isa/npx.c
> index 9ec5d25..f314e44 100644
> --- a/sys/i386/isa/npx.c
> +++ b/sys/i386/isa/npx.c
> @@ -684,7 +684,9 @@ npxdna(void)
>  		fpurstor(&npx_initialstate);
>  		if (pcb->pcb_initial_npxcw != __INITIAL_NPXCW__)
>  			fldcw(pcb->pcb_initial_npxcw);
> -		npxuserinited(curthread);
> +		pcb->pcb_flags |= PCB_NPXINITDONE;
> +		if (PCB_USER_FPU(pcb))
> +			pcb->pcb_flags |= PCB_NPXUSERINITDONE;
>  	} else {
>  		/*
>  		 * The following fpurstor() may cause an IRQ13 when the

Regarding this patch (r216455) and r216162: I have had to back both of
them out of my local tree to avoid panics on a ZFS & UFS2 i386 system.

With the following panic strings:
  Dumptime: Thu Dec  9 08:37:40 2010
  Panic String: double fault
  Dumptime: Thu Dec  9 08:41:57 2010
  Panic String: page fault
  Dumptime: Fri Dec 10 00:23:35 2010
  Panic String: free: address 0x85ceb000(0x85ceb000) has not been allocated.
  Dumptime: Fri Dec 10 14:37:33 2010
  Panic String: page fault
  Dumptime: Sat Dec 11 04:10:01 2010
  Panic String: vm_fault: fault on nofault entry, addr: 8289c000
  Dumptime: Sun Dec 12 23:45:01 2010
  Panic String: page fault
  Dumptime: Tue Dec 14 01:32:09 2010
  Panic String: page fault
  Dumptime: Tue Dec 14 16:46:33 2010
  Panic String: general protection fault
  Dumptime: Thu Dec 16 10:03:15 2010
  Panic String: vm_fault: fault on nofault entry, addr: b3811000


It seems to be caused by r216162 or directly related to it. If further
information is needed, let me know. I'll be around for the next few
hours.

-- 

 jhell,v


Re: aesni(?) corrupts data on 8.2-BETA1

2010-12-16 Thread jhell
On 12/16/2010 20:10, jhell wrote:
> Regarding this patch (r216455) and r216162: I have had to back both of
> them out of my local tree to avoid panics on a ZFS & UFS2 i386 system.
> 
> With the following panic strings:
>   Dumptime: Thu Dec  9 08:37:40 2010
>   Panic String: double fault
>   Dumptime: Thu Dec  9 08:41:57 2010
>   Panic String: page fault
>   Dumptime: Fri Dec 10 00:23:35 2010
>   Panic String: free: address 0x85ceb000(0x85ceb000) has not been allocated.
>   Dumptime: Fri Dec 10 14:37:33 2010
>   Panic String: page fault
>   Dumptime: Sat Dec 11 04:10:01 2010
>   Panic String: vm_fault: fault on nofault entry, addr: 8289c000
>   Dumptime: Sun Dec 12 23:45:01 2010
>   Panic String: page fault
>   Dumptime: Tue Dec 14 01:32:09 2010
>   Panic String: page fault
>   Dumptime: Tue Dec 14 16:46:33 2010
>   Panic String: general protection fault
>   Dumptime: Thu Dec 16 10:03:15 2010
>   Panic String: vm_fault: fault on nofault entry, addr: b3811000
> 
> 
> It seems to be caused by r216162 or directly related to it. If further
> information is needed, let me know. I'll be around for the next few
> hours.
> 

PS: Also, when the system crashes with the above panic strings,
/boot/zfs/zpool.cache ends up corrupt, so I have to boot from a good
kernel and force-import the pool. Scrubs run with the two revisions
in place also end up with checksum errors all over the place. Without
them, everything returns to normal.

-- 

 jhell,v