from:"Eirik Øverby"

Random 'Connection reset' issues between jails on same host

2012-01-15 Thread Eirik Øverby

Hi all,

We're trying to implement our puppet infrastructure, and have discovered 
something strange about TCP connections between jails on the same host. As our 
jails haven't generally been doing a lot of connections between each other, 
this issue hasn't popped up before. 

We have two 100% equal host systems, on FreeBSD 8.2-RELEASE-p4. These are 
8-core Intel systems, with 16GB RAM each. I have just upgraded one of the two 
systems to 9.0-RELEASE, and it shows the same problem.

When the puppetmaster jail is running on the same host as the jail running 
the puppet agent, connections from the puppet agent randomly fail with 
'Connection reset by peer'. This happens at random stages of the 
configuration sync. If either of the jails is moved to another system (jail 
stop, zfs snapshot, zfs send/recv, jail start) on the same physical network, 
the problem disappears. It is not a hardware issue, as this happens no matter 
which of the two hosts we use: whenever both puppetmaster and puppet agent 
reside on the same physical box, the errors show up.

There used to be a somewhat similar problem with FTP between jails on the same 
host, but this was taken care of some time after 8.0-RELEASE IIRC. That problem 
manifested itself in a combination of random connection failures (had to try 
2-3 times to establish a connection) and very slow transfer rates (at most 
150kbyte/s between jails on the same host, but >50mbyte/s between jails on 
different hosts on the same network).


Has anyone seen this before? Is there anything I have missed, sysctls I should 
set/adjust?

The /etc/rc.conf settings for the jails are very simple - the following 
differing from the default:
jail_sysvipc_allow="YES"
jail_mount_enable="YES"
jail_devfs_enable="YES"

/etc/sysctl.conf contains the following jail-related:
security.jail.enforce_statfs=0
security.jail.mount_allowed=1
security.jail.allow_raw_sockets=1


Thanks,
/Eirik
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Random 'Connection reset' issues between jails on same host

2012-01-15 Thread Eirik Øverby

On Jan 15, 2012, at 18:44, Eirik Øverby wrote:

> Hi all,
> 
> We're trying to implement our puppet infrastructure, and have discovered 
> something strange about TCP connections between jails on the same host. As 
> our jails haven't generally been doing a lot of connections between each 
> other, this issue hasn't popped up before. 
> 
> We have two 100% equal host systems, on FreeBSD 8.2-RELEASE-p4. These are 
> 8-core Intel systems, with 16GB RAM each. I have just upgraded one of the two 
> systems to 9.0-RELEASE, and it shows the same problem.
> 
> When the puppetmaster jail is running on the same host as the jail running 
> puppet agent, connections from the puppet agent randomly fail with 
> 'Connection reset by peer'. This happens at random stages of configuration 
> sync. Now if either of the jails is moved to another system (jail stop, zfs 
> snapshot, zfs send/recv, jail start) on the same physical network, there are 
> no such problems. It is not a hardware issue, as this happens no matter which 
> of the two hosts we use. If both puppetmaster and puppet agent reside on the 
> same physical box, the errors will show up.

Replying to myself here:

Assigning a cpuset with a single CPU to the jail with puppetmaster seems to 
cure the symptom. I've made a few thousand connections now with no failures 
so far. This is repeatable on both 8 and 9. It is obviously only a 
workaround, but it may give some hint as to where the problem is.
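
For anyone who wants to repeat the "few thousand connects" check, here is a minimal sketch in Python. It is not the script used for the numbers above; the function name, the payload and the defaults are made up for illustration. It simply opens TCP connections in a loop and counts how many end in 'Connection reset by peer'.

```python
import socket


def count_connection_failures(host, port, attempts=1000, timeout=5.0,
                              payload=b"ping\n"):
    """Open `attempts` TCP connections to (host, port), one after another.

    Returns (resets, other_errors): how many attempts ended in
    'Connection reset by peer' and how many failed in any other way
    (refused, timed out, ...). The payload is an arbitrary placeholder,
    not part of any real protocol.
    """
    resets = 0
    other_errors = 0
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload)
                sock.recv(1)  # wait until the peer answers or closes
        except ConnectionResetError:
            resets += 1
        except OSError:
            other_errors += 1
    return resets, other_errors
```

Run from inside the agent jail as, say, count_connection_failures("puppetmaster.example", 8140) (the hostname is hypothetical, and 8140 is assumed to be the port the puppetmaster listens on), a non-zero first value would match the symptom described above. Comparing the counts with and without the single-CPU cpuset should show whether the workaround holds on a given box.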

/Eirik


Random 'Connection reset' issues between jails on same host

2012-01-15 Thread Eirik Øverby

Hi all,

We're trying to implement our puppet infrastructure, and have discovered 
something strange about TCP connections between jails on the same host. As our 
jails haven't generally been doing a lot of connections between each other, 
this issue hasn't popped up before. 

We have two 100% equal host systems, on FreeBSD 8.2-RELEASE-p4. These are 
8-core Intel systems, with 16GB RAM each.

When the puppetmaster jail is running on the same host as the jail running 
the puppet agent, connections from the puppet agent randomly fail with 
'Connection reset by peer'. This happens at random stages of the 
configuration sync. If either of the jails is moved to another system (jail 
stop, zfs snapshot, zfs send/recv, jail start) on the same physical network, 
the problem disappears. It is not a hardware issue, as this happens no matter 
which of the two hosts we use: whenever both puppetmaster and puppet agent 
reside on the same physical box, the errors show up.

There used to be a somewhat similar problem with FTP between jails on the same 
host, but this was taken care of some time after 8.0-RELEASE IIRC. That problem 
manifested itself in a combination of random connection failures (had to try 
2-3 times to establish a connection) and very slow transfer rates (at most 
150kbyte/s between jails on the same host, but >50mbyte/s between jails on 
different hosts on the same network).

I am going to try to repeat this on 9.0-RELEASE - but in the meantime, has 
anyone seen this before? Is there anything I have missed, sysctls I should 
set/adjust?

The /etc/rc.conf settings for the jails are very simple - the following 
differing from the default:
jail_sysvipc_allow="YES"
jail_mount_enable="YES"
jail_devfs_enable="YES"

/etc/sysctl.conf contains the following jail-related:
security.jail.enforce_statfs=0
security.jail.mount_allowed=1
security.jail.allow_raw_sockets=1


Thanks,
/Eirik

Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Eirik Øverby

On 27. feb. 2010, at 20.38, Jeremy Chadwick wrote:

> On Sat, Feb 27, 2010 at 08:21:05PM +0100, Gerrit Kühn wrote:
>> On Sat, 27 Feb 2010 15:15:52 +0100 Willem Jan Withagen 
>> wrote about Re: mbuf leakage with nfs/zfs?:
>> 
>> WJW> > 81492/2613/84105 mbufs in use (current/cache/total)
>> WJW> > 80467/2235/82702/128000 mbuf clusters in use
>> WJW> > (current/cache/total/max) 80458/822 mbuf+clusters out of packet
>> WJW> > secondary zone in use (current/cache)
>> 
>> WJW> Over the night I only had rsync and FreeBSD nfs traffic.
>> WJW> 
>> WJW> 45337/2828/48165 mbufs in use (current/cache/total)
>> WJW> 44708/1902/46610/262144 mbuf clusters in use (current/cache/total/max)
>> WJW> 44040/888 mbuf+clusters out of packet secondary zone in use
>> WJW> (current/cache)
>> 
>> After about 24h I now have
>> 
>> 128320/2630/130950 mbufs in use (current/cache/total)
>> 127294/1200/128494/512000 mbuf clusters in use (current/cache/total/max)
>> 127294/834 mbuf+clusters out of packet secondary zone in use (current/cache)
> 
> Follow-up regarding my server statistics shown here:
> 
> http://lists.freebsd.org/pipermail/freebsd-stable/2010-February/055458.html
> 
> I just pulled the statistics on the same servers for comparison (then
> vs. now).
> 
> RELENG_7 amd64 2010/01/09 -- primary HTTP, pri DNS, SSH server + ZFS
> 
>   515/1930/2445 mbufs in use (current/cache/total)
>   512/540/1052/25600 mbuf clusters in use (current/cache/total/max)
>   1152K/6394K/7547K bytes allocated to network (current/cache/total)
> 
> RELENG_7 amd64 2010/01/11 -- secondary DNS, MySQL, dev box + ZFS
> 
>   514/1151/1665 mbufs in use (current/cache/total)
>   512/504/1016/25600 mbuf clusters in use (current/cache/total/max)
>   1152K/2203K/3356K bytes allocated to network (current/cache/total)
> 
> RELENG_7 i386 2008/04/19 -- secondary HTTP, SSH server, heavy memory I/O
> 
>   515/820/1335 mbufs in use (current/cache/total)
>   513/631/1144/25600 mbuf clusters in use (current/cache/total/max)
>   1154K/2615K/3769K bytes allocated to network (current/cache/total)
> 
> RELENG_8 amd64 2010/02/02 -- central backups + NFS+ZFS-based filer
> 
>   1572/3423/4995 mbufs in use (current/cache/total)
>   1539/3089/4628/25600 mbuf clusters in use (current/cache/total/max)
>   3471K/7449K/10920K bytes allocated to network (current/cache/total)
> 
> So, not much difference.
> 
> I should point out that the NFS+ZFS-based filer doesn't actually do its
> backups using NFS; it uses rsnapshot (rsync) over SSH.  There is intense
> network I/O during backup time though, depending on how much data there
> is to back up.  The NFS mounts (on the clients) are only used to provide
> a way for people to get access to their nightly backups in a convenient
> way; it isn't used very heavily.
> 
> I can do something NFS-intensive on any of the above clients if people
> want me to kind of testing.  Possibly an rsync with a source of the NFS
> mount and a destination of the local disk would be a good test?  Let me
> know if anyone's interested in me testing that.

I've had a discussion with some folks on this for a while. I can easily 
reproduce this situation by mounting a FreeBSD ZFS filesystem via NFS-UDP from 
an OpenBSD machine. Telling the OpenBSD machine to use TCP instead of UDP makes 
the problem go away.

Other FreeBSD systems mounting the same share, using either UDP or TCP, do 
not cause the problem to show up.

A patch was suggested by Rick Macklem, but that did not solve the issue:
http://lists.freebsd.org/pipermail/freebsd-current/2009-December/014181.html
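
To compare numbers like the ones quoted above over time, a small parser helps. The following is only a sketch in Python, assuming the input is saved `netstat -m` output in the "a/b/c label (field/field/field)" form shown in this thread; the function names are made up for illustration. Lines whose values carry a unit suffix (such as the "K bytes allocated to network" line) are deliberately skipped.

```python
import re

# Matches e.g. "128320/2630/130950 mbufs in use (current/cache/total)"
_LINE = re.compile(
    r"^\s*(?P<values>\d+(?:/\d+)*) (?P<label>.+?) "
    r"\((?P<fields>[a-z]+(?:/[a-z]+)*)\)\s*$"
)


def parse_netstat_m(text):
    """Return {label: {field: int}} for every purely numeric counter line."""
    stats = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # not a counter line, or values carry a unit suffix
        values = [int(v) for v in match.group("values").split("/")]
        fields = match.group("fields").split("/")
        if len(values) != len(fields):
            continue  # wrapped or truncated line, ignore it
        stats[match.group("label")] = dict(zip(fields, values))
    return stats


def growth(before, after, label="mbuf clusters in use", field="current"):
    """Difference in one counter between two parsed samples."""
    return after[label][field] - before[label][field]
```

Taking one sample before and one after a run of NFS-over-UDP traffic and calling growth() on the two should make it obvious whether "current" keeps climbing toward "max" while the cache stays flat, which is the pattern reported here.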

/Eirik



> -- 
> | Jeremy Chadwick   j...@parodius.com |
> | Parodius Networking   http://www.parodius.com/ |
> | UNIX Systems Administrator  Mountain View, CA, USA |
> | Making life hard for others since 1977.  PGP: 4BD6C0CB |
> 

Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Eirik Øverby

On 27. feb. 2010, at 22.38, Gerrit Kühn wrote:

> On Sat, 27 Feb 2010 21:32:39 +0100 Eirik Øverby  wrote
> about Re: mbuf leakage with nfs/zfs?:
> 
> E> I've had a discussion with some folks on this for a while. I can easily
> E> reproduce this situation by mounting a FreeBSD ZFS filesystem via
> E> NFS-UDP from an OpenBSD machine. Telling the OpenBSD machine to use TCP
> E> instead of UDP makes the problem go away.
> 
> So we see this problem with udp clients from OpenBSD and Linux.

I have not had the opportunity to test with Linux or anything else. Could try 
from Windows, but not sure I want to get my hands THAT dirty.


> E> Other FreeBSD systems mounting the same share, either using UDP or TCP,
> E> does not cause the problem to show up.
> 
> As Daniel reported he saw the problem with FBSD 8-stable: Which version
> was the FBSD-client that worked for you with udp?

7.1, 7.2, 8.0-RCsomething and 8.0-RELEASE - no problems with either.


> E> A patch was suggested by Rick Macklem, but that did not solve the issue:
> E> 
> http://lists.freebsd.org/pipermail/freebsd-current/2009-December/014181.html
> 
> Yeah, I also found and tried this on Friday - unfortunately without any
> success, the leakage is still there.
> 
> 
> cu
>  Gerrit
> 


Re: 7.0 RC1/SPARC64 panic in boot

2008-01-22 Thread Eirik Øverby


Hi list,

by disabling the isp driver (set hint.isp.0.disabled=1), the system
comes up. This of course denies us access to the external disk array
hosted by the internal QLogic controller, but it pinpoints the problem.

We tried setting hint.isp.0.prefer_iomap=1, which made no difference
(though from reading the code, I don't see that it is ever used).


Can anyone help us out here?

Thanks,
/Eirik

On Jan 21, 2008, at 11:23 AM, Anders Gulden Olstad wrote:



SUN Ultra 2 (2x400Mhz USII, 1500MB RAM)

Got the following panic during boot

 panic: trap: fast data access mmu miss
 cpuid = 0

This happened after upgrade from 6.2 -> 7.0 RC1. Tried to boot
from the CDROM as well, with same result


======================================================================

Console log:

{0} ok boot cdrom
Boot device: /sbus/SUNW,[EMAIL PROTECTED],880/[EMAIL PROTECTED],0:f  File 
and args:


FreeBSD/sparc64 boot block

  Boot path:   /[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL 
PROTECTED],0:f
  Boot loader: /boot/loader
Consoles: Open Firmware console

Booting with sun4u support.
Boot path set to /[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL 
PROTECTED],0:a

FreeBSD/sparc64 bootstrap loader, Revision 1.0
([EMAIL PROTECTED], Mon Dec 24 10:09:43 UTC 2007)
bootpath="/[EMAIL PROTECTED],0/SUNW,[EMAIL PROTECTED],880/[EMAIL 
PROTECTED],0:a"
Loading /boot/defaults/loader.conf
/boot/kernel/kernel data=0x6eee48+0x72c68  
syms=[0x8+0x76878+0x8+0x6663e]

\
Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...
nothing to autoload yet.
jumping to kernel entry at 0xc007.
stray vector interrupt 2033
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,  
1994
   The Regents of the University of California. All rights  
reserved.

FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RC1 #0: Tue Dec 25 02:17:08 UTC 2007
   [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
real memory  = 1610612736 (1536 MB)
avail memory = 1550393344 (1478 MB)
cpu0: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
cpu1: Sun Microsystems UltraSparc-II Processor (400.00 MHz CPU)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
registered firmware set 
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413, REGOPS_FUNC)
nexus0: 
sbus0:  mem 0x1fe-0x1fe7fff irq 2036,2037,2038,2021,2026,2039 on nexus0
sbus0: clock 25.000 MHz
sbus dvma: DVMA map: 0xfc00 to 0x
sbus0: [GIANT-LOCKED]
sbus0: [ITHREAD]
sbus0: [GIANT-LOCKED]
sbus0: [ITHREAD]
initializing counter-timer
Timecounter "counter-timer" frequency 100 Hz quality 100
auxio0:  mem 0x190 on sbus0
sbus0:  mem 0xc00-0xc0001ff irq 2020 type unknown (no driver attached)
sbus0:  mem 0-0x7,0x138-0x13f type unknown (no driver attached)
sbus0:  mem 0x140-0x147 irq 2025 type block (no driver attached)
eeprom0:  mem 0x120-0x1201fff on sbus0
eeprom0: model mk48t59
scc0:  mem 0x110-0x113 irq 2024 on sbus0
scc0: [FILTER]
uart0:  on scc0
uart0: [FILTER]
uart0: console (9600,n,8,1)
uart1:  on scc0
uart1: [FILTER]
scc1:  mem 0x100-0x103 irq 2024 on sbus0
scc1: [FILTER]
uart2:  on scc1
uart2: [FILTER]
uart2: keyboard (1200,n,8,1)
uart2: keyboard not present
uart3:  on scc1
uart3: [FILTER]
sbus0:  mem 0x130-0x137 type unknown (no driver attached)
sbus0:  mem 0x1304000-0x1304002 type unknown (no driver attached)
esp0:  mem 0x880-0x88f,0x881-0x881003f irq 2016 on sbus0
esp0: [ITHREAD]
esp0: FAS366/HME, 40MHz, SCSI ID 7
hme0:  mem 0x8c0-0x8c00107,0x8c02000-0x8c03fff,0x8c04000-0x8c05fff,0x8c06000-0x8c07fff,0x8c07000-0x8c0701f irq 2017 on sbus0
miibus0:  on hme0
nsphy0:  PHY 1 on miibus0
nsphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
hme0: Ethernet address: 08:00:20:91:d2:79
hme0: [ITHREAD]
sbus0:  mem 0xc80-0xc80001b irq 2018 type unknown (no driver attached)
isp0 mem 0x1-0x1044f irq 2003 on sbus0
isp0: [ITHREAD]
panic: trap: fast data access mmu miss
cpuid = 0
Uptime: 1s
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
Resetting ...

Re: 7.0 RC1/SPARC64 panic in boot

2008-01-22 Thread Eirik Øverby


On Jan 22, 2008, at 7:23 PM, Marius Strobl wrote:

> On Tue, Jan 22, 2008 at 07:16:16AM +0100, Eirik Øverby wrote:
>> Hi list,
>>
>> by disabling the isp driver (set hint.isp.0.disabled=1), the system
>> comes up. This of course denies us access to the external disk array
>> hosted by the internal QLogic controller, but pinpoints the problem.
>>
>> We tried setting hint.isp.0.prefer_iomap=1, which made no difference
>> (though by reading the code, I don't see that it ever used this).
>>
>> Can anyone help us out here?
>
> Scott, could this be due to a missing MFC of isp_sbus.c rev. 1.36?

If that is the case, I'd be most happy to hear it. I'll also be more
than happy to test, and can do so on relatively short notice (at least
for another few hours).

We have, for the record, gone through some basic troubleshooting:
replaced the memory (as this error can also show up under Solaris and is
usually an indicator of bad memory), replaced the SCSI controller with
another one (still isp driven), and tested various device hints.
Suffice to say we have wasted our time so far ;)

Holding breath...

/Eirik

> Marius




Re: 7.0 RC1/SPARC64 panic in boot

2008-01-22 Thread Eirik Øverby


Will apply the patch and reboot in an hour or two.
The isp interface is only used for an external array, so we disable it  
and boot from internal drives on esp.

Thanks!
/Eirik

On Jan 23, 2008, at 7:32 AM, Scott Long wrote:


> Are you able to compile a new kernel without having to install first?
> If so, apply the attached patch and let me know if it works.
>
> Scott
Index: isp_sbus.c
===
RCS file: /usr1/ncvs/src/sys/dev/isp/isp_sbus.c,v
retrieving revision 1.35
retrieving revision 1.36
diff -u -r1.35 -r1.36
--- isp_sbus.c  11 May 2007 13:47:28 -  1.35
+++ isp_sbus.c  5 Nov 2007 11:22:18 -   1.36
@@ -29,7 +29,7 @@
  */

 #include 
-__FBSDID("$FreeBSD: src/sys/dev/isp/isp_sbus.c,v 1.35 2007/05/11 13:47:28 mjacob Exp $");
+__FBSDID("$FreeBSD: src/sys/dev/isp/isp_sbus.c,v 1.36 2007/11/05 11:22:18 scottl Exp $");

 #include 
 #include 
@@ -327,21 +327,26 @@
 	/*
 	 * Make sure we're in reset state.
 	 */
+	ISP_LOCK(isp);
 	isp_reset(isp);
 	if (isp->isp_state != ISP_RESETSTATE) {
 		isp_uninit(isp);
+		ISP_UNLOCK(isp);
 		goto bad;
 	}
 	isp_init(isp);
 	if (isp->isp_role != ISP_ROLE_NONE && isp->isp_state != ISP_INITSTATE) {
 		isp_uninit(isp);
+		ISP_UNLOCK(isp);
 		goto bad;
 	}
 	isp_attach(isp);
 	if (isp->isp_role != ISP_ROLE_NONE && isp->isp_state != ISP_RUNSTATE) {
 		isp_uninit(isp);
+		ISP_UNLOCK(isp);
 		goto bad;
 	}
+	ISP_UNLOCK(isp);
 	return (0);

 bad:



Re: 7.0 RC1/SPARC64 panic in boot [SOLVED]

2008-01-23 Thread Eirik Øverby


On Jan 23, 2008, at 7:32 AM, Scott Long wrote:


> Are you able to compile a new kernel without having to install first?
> If so, apply the attached patch and let me know if it works.


Works very well, thanks a bunch!
Will this make it into 7-RELEASE?

/Eirik





Highpoint drivers on 7.0

2008-01-25 Thread Eirik Øverby


Hi all,

did anyone try the Highpoint RocketRAID drivers (hptmv6.ko) on 7-RC1 or
later? I'm considering upgrading one of my servers here, but I need to
know if my RAID controller will work after the reinstall.


A shame HPT doesn't release the driver to the community...

Thanks,
/Eirik

Re: Highpoint drivers on 7.0

2008-01-27 Thread Eirik Øverby


On Jan 25, 2008, at 11:32 PM, Steven Hartland wrote:

> I would advise contacting them. Their support was helpful when I last
> contacted them, and for the card that was involved they did release the
> code for the driver, which enabled us to fix the issues.

Actually, the new(?) hptrr driver seems to handle my 2220 just fine!
Too bad it's still giant-locked.


/Eirik






Re: /usr/bin/objformat is missing

2008-01-29 Thread Eirik Øverby

On Jan 29, 2008, at 4:49 PM, Chris H. wrote:

Quoting pluknet <[EMAIL PROTECTED]>:

On 29/01/2008, Chris H. <[EMAIL PROTECTED]> wrote:

Quoting Peter Jeremy <[EMAIL PROTECTED]>:

> On Mon, Jan 28, 2008 at 02:41:56PM -0800, Chris H. wrote:
>> In case you're wondering, objformat /is/ required - at least for
>> www/apache13-ssl.
>

touching objformat is not a good way. Try this instead, last time it
helped me (taken from memory):

--- Makefile.orig   2008-01-29 13:38:43.0 +0300
+++ Makefile2008-01-29 13:41:19.0 +0300
@@ -5,7 +5,7 @@
#  and apache-ssl port by Mark Murray <[EMAIL PROTECTED] 
>.

#  Oh, and with a little bit of help from Ben :)
#
-# $FreeBSD: ports/www/apache13-ssl/Makefile,v 1.121 2007/06/17
16:59:26 anders Exp $
+# $FreeBSD$

PORTNAME=  apache+ssl
PORTVERSION=   ${APACHE_VERSION}.${APACHE_SSL_VERSION}
@@ -48,7 +48,7 @@

APACHE_HARD_SERVER_LIMIT?= 512

-CFLAGS+=   -I${OPENSSLINC}/openssl
+CFLAGS+=   -I${OPENSSLINC}/openssl -Wl,

I noticed this arg in another thread regarding this issue:
--export-dynamic

Thank you for posting this. Although I had success building and
running the apache13-ssl port after applying my objformat /hackery/.
I'm now running into troubles adding all of the php5 extensions I
need to use. I had no difficulties with php5 itself. But after a
certain point in the list, apache exits on signal 11 (core dumped).
Ermm... this was exactly the same trouble I started with, with the
exception that it was on signal 10.

I have had problems with PHP modules in the past; often they can end  
up crashing when loaded in the wrong order, for instance. I also had  
major trouble getting the imagick module to work at all lately.

Try re-ordering things in your extensions.ini, maybe commenting out  
all modules and re-enabling one at a time.
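
The one-at-a-time approach can be scripted. Below is a rough sketch in Python, assuming extensions.ini holds one "extension=..." entry per line and that a leading ";" comments a line out; the function name is made up for illustration. It only generates the candidate configurations; writing each one out and restarting Apache is left to the reader.

```python
def incremental_configs(ini_lines):
    """Yield (newly_enabled_line, config_text) pairs.

    The k-th config keeps the first k `extension=` lines active and
    comments out the remaining ones with ';'. All other lines are
    passed through unchanged.
    """
    ext_idx = [i for i, line in enumerate(ini_lines)
               if line.strip().startswith("extension=")]
    for k in range(1, len(ext_idx) + 1):
        enabled = set(ext_idx[:k])
        disabled = set(ext_idx) - enabled
        out = [(";" + line) if i in disabled else line
               for i, line in enumerate(ini_lines)]
        yield ini_lines[ext_idx[k - 1]].strip(), "\n".join(out) + "\n"
```

The first generated config after which apache starts dying on signal 11 points at the module enabled in that step, or at a bad interaction with one enabled before it. Reordering the input lines and running the sequence again is a cheap way to test the load-order theory.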

/Eirik

So, with any luck (fingers crossed), I'll get past this limitation
with your patch and /yet/ another make deinstall apache13-ssl &&
all-added-mod_whatevers && all-php5-extensions && php5. make install
everything-all-over-again. :/

Looks like the bugfest mark announced earlier isn't over just yet. :)

Thanks again for taking the time to respond and share your patch.

--Chris H

CONFIGURE_ARGS+=   \
  --prefix=${PREFIX} \
  --server-uid=www \

Kernel panic on 7-PRERELEASE

2008-01-29 Thread Eirik Øverby


Hi,

Like on 6.x, I'm seeing frequent kernel panics when using my bge NICs.  
If I plug the cable into the fxp NIC all is fine. Dual opteron, Tyan  
K8S Pro (2882) board. I cannot see any pattern as to what is causing  
the panics, however I have obtained kernel dumps on a freshly built  
kernel (with -g), unfortunately without WITNESS or INVARIANTS. I've  
attached a screenshot of the KVM console after the crash.


Given that I have a kernel dump, what do I do to extract useful  
information, if at all possible?


Thanks!

/Eirik



Re: 4.8 -> 4.11 in-place upgrade ?

2008-01-30 Thread Eirik Øverby


On Jan 30, 2008, at 3:36 PM, Jeremy Chadwick wrote:


On Wed, Jan 30, 2008 at 09:01:40AM -0500, Robin Blanchard wrote:

I just inherited a remote 4.8 box...Having not used RELENG_4 in eons,
just wanted to check if it's safe to "live upgrade" (make
installworld/kernel ; mergemaster) directly to 4.11 (world/kernel
already built; waiting to install). /usr/src/UPDATING doesn't seem to
indicate this is out of the question.


It would be best for you to just schedule a time to update the box
entirely to RELENG_7, or at least RELENG_6.  This may take less time and
induce less pain than any oddities which might appear from a 4.8->4.11
upgrade.


FWIW; I did a 4.7->4.11 upgrade not too long ago, and didn't bump into  
any issues. YMMV of course. Whether or not it's worth it is another  
question entirely, and one you'll have to figure out for yourself :) I  
moved everything I could to 6-RCsomething when that was teh h0tness,  
and haven't looked back since (did some poking into 5-land as well,  
went back to 4.x badly burned).


/Eirik




--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |







WITNESS weirdness

2008-01-30 Thread Eirik Øverby


Hi,

not sure if this is a problem, but:

# sysctl -a | grep witness
debug.witness.child_cnt: 161
debug.witness.child_free_cnt: 3935
debug.witness.sleep_cnt: 235
debug.witness.spin_cnt: 0
debug.witness.free_cnt: 789
debug.witness.skipspin: 1
debug.witness.trace: 1
debug.witness.kdb: 1
debug.witness.watch: 1

# sysctl debug.witness.watch=0
debug.witness.watch: 1 -> 0

# sysctl debug.witness.watch=1
debug.witness.watch: 0
sysctl: debug.witness.watch: Invalid argument

Am I supposed to be able to turn witness off at runtime, but not back on
again?


/Eirik

Re: WITNESS weirdness

2008-01-30 Thread Eirik Øverby


So I need to reboot. Brilliant :) And I thought I was being clever...

I'm using WITNESS to help figure out why bge keeps crapping out on me,
and with WITNESS enabled it has been stable, but oh-so-slow :P


/Eirik

On Jan 30, 2008, at 9:10 PM, Kris Kennaway wrote:


Eirik Øverby wrote:


# sysctl debug.witness.watch=1
debug.witness.watch: 0
sysctl: debug.witness.watch: Invalid argument
Am I supposed to be able to turn witness off runtime, but not back  
on again?


Yes, that is working as designed.  Witness needs to run continuously  
to track state.


Kris






7.0, amd64: Wrong files installed into jails?

2008-02-11 Thread Eirik Øverby


Hi,

I've created some jails on FreeBSD 7-RC* now, and I realized there
must be some kind of problem when I tried to install and run
diablo-jdk 1.5 from the freebsdfoundation packages. It complains about


/libexec/ld-elf.so.1: /usr/local/lib/compat/pkg/libz.so.3: unsupported file layout


and file(1) returns

/usr/local/lib/compat/pkg/libz.so.3: ELF 32-bit LSB shared object, Intel 80386, version 1 (FreeBSD), stripped
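
The same check can be made without file(1). What follows is only a minimal sketch, assuming the standard ELF header layout: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects, byte 5 (EI_DATA) gives the byte order, and the 16-bit e_machine field at offset 18 is 3 for Intel 80386 and 62 for x86-64. The function names are hypothetical and not part of any FreeBSD tool.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EI_CLASS_NAMES = {1: "32-bit", 2: "64-bit"}
E_MACHINE_NAMES = {3: "Intel 80386", 62: "x86-64"}


def elf_summary(header: bytes) -> str:
    """Describe word size and machine type from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    word_size = EI_CLASS_NAMES.get(header[4], "unknown class")
    # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    machine_name = E_MACHINE_NAMES.get(machine, "machine %d" % machine)
    return "ELF %s, %s" % (word_size, machine_name)


def elf_summary_of_file(path: str) -> str:
    """Convenience wrapper (hypothetical helper): read the header from a file."""
    with open(path, "rb") as f:
        return elf_summary(f.read(20))
```

Running elf_summary_of_file() on the library inside the jail and on /lib/libz.so.3 on the host would show whether the two really differ in word size.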



On the host, which has been upgraded from 6.2 to 7.0-RC1 using cvsup++ 
+, Java runs just fine, and finds its libraries in /lib (for  
instance). Presumably because they are still left there:


ls -la /lib/libz.*
-r--r--r--  1 root  wheel   79824 Jun 16  2005 /lib/libz.so.2
-r--r--r--  1 root  wheel   81448 Apr 28  2007 /lib/libz.so.3
-r--r--r--  1 root  wheel   83648 Jan 28 09:02 /lib/libz.so.4

There are no compat6x-packages installed anywhere, and even installing  
the compat6x-amd64 package in the jail does not change anything.


Does installworld to a "clean" target install the i386 binaries  
instead of the amd64 binaries to the /usr/local/lib/compat/ tree??



With best regards,
/Eirik

Re: UFS snapshot weirdness

2008-02-12 Thread Eirik Øverby


On Feb 12, 2008, at 1:41 PM, Daniel O'Connor wrote:


On Tue, 12 Feb 2008, Eirik Øverby wrote:

I am at a total loss here. Is it re-using the first snapshot I ever
made of this filesystem, even though I've removed it? Didn't I
understand how to create/remove snapshots? Is this a bug?


Sure the old md isn't hanging around by mistake or some such?


Yes, I am absolutely sure of this.

I considered using the snapshot tool, however I need to reduce  
dependencies to an absolute minimum (as one target environment is very  
strict on allowing additional software installs)..


I use the snapshots to get a consistent file-backup with history. This  
one puzzles me to no end.


/Eirik


I have had people recover many files using the snapshot tool in ports
(plus a small symlink maker for samba access) and haven't noticed
issues like this.

On the other hand I find it can take a long time to make a snapshot
(during which time no FS access is allowed).

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
 -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C



UFS snapshot weirdness

2008-02-12 Thread Eirik Øverby


Hi all,

I've been making a wrapper script for the backup tool 'duplicity',  
allowing me to create config files for each resource, wherein I define  
whether a snapshot should be made prior to backing up the resource or  
not.


Now I find that my snapshots never change 

The script creates a snapshot, creates an md device for it, mounts it,
runs the backup against the mounted snapshot, unmounts it, removes the
md device, and rm -f's the snapshot file.
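
For readers who want to see that sequence spelled out, here is a dry-run sketch. It is not the actual wrapper script. It assumes the two-argument form of mksnap_ffs(8) used on FreeBSD releases of that era, a read-only vnode-backed md device, and a caller-supplied backup command; the function name, paths and unit number are placeholders.

```python
def snapshot_backup_commands(filesystem, snapshot_file, md_unit, mountpoint, backup_cmd):
    """Return the commands, in order, for one snapshot-backed backup run.

    Nothing is executed here; the caller decides how to run the list.
    """
    md_device = "/dev/md%d" % md_unit
    setup = [
        # Assumed syntax: mksnap_ffs <mountpoint> <snapshot file>.
        ["mksnap_ffs", filesystem, snapshot_file],
        ["mdconfig", "-a", "-t", "vnode", "-o", "readonly",
         "-f", snapshot_file, "-u", str(md_unit)],
        ["mount", "-r", md_device, mountpoint],
    ]
    teardown = [
        ["umount", mountpoint],
        ["mdconfig", "-d", "-u", str(md_unit)],
        ["rm", "-f", snapshot_file],
    ]
    return setup + [list(backup_cmd)] + teardown


if __name__ == "__main__":
    for cmd in snapshot_backup_commands(
            "/usr", "/usr/.snap/backup", 4, "/mnt/snap",
            ["duplicity", "/mnt/snap", "file:///backup/usr"]):
        print(" ".join(cmd))
```

Printing the list before running anything makes it easy to confirm that the teardown steps mirror the setup steps in reverse order.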


The problem is .. Whenever I look into the mounted snapshot, a given  
directory looks like so:


drwxr-xr-x   3 root  wheel   512 Jan 29 15:25 .
drwxr-xr-x  18 root  wheel   512 Jan 29 13:49 ..
-rw-------   1 root  wheel  1281 Jan 31 17:12 .bash_history
-rw-r--r--   2 root  wheel   786 Jan 29 13:00 .cshrc
-rw-r--r--   1 root  wheel   143 Jan 29 13:00 .k5login
-rw-r--r--   1 root  wheel   293 Jan 29 13:00 .login
-rw-r--r--   2 root  wheel   253 Jan 29 13:00 .profile
drwxr-xr-x   2 root  wheel   512 Jan 29 13:00 .ssh

However, when looking into the same directory outside the snapshot, it  
looks like so:


-rw-------   1 root  wheel       2961 Feb 12 00:39 .bash_history
-rw-r--r--   2 root  wheel        786 Jan 29 13:00 .cshrc
-rw-r--r--   1 root  wheel        143 Jan 29 13:00 .k5login
drwx------   2 root  wheel        512 Feb 11 16:23 .links
-rw-r--r--   1 root  wheel        293 Jan 29 13:00 .login
-rw-r--r--   2 root  wheel        253 Jan 29 13:00 .profile
drwxr-xr-x   2 root  wheel        512 Jan 29 13:00 .ssh
-rw-r--r--   1 root  wheel     948424 Feb 11 13:14 bsd-jdk16-patches-3.tar.bz2
-rw-r--r--   1 root  wheel   46938731 Feb 11 16:23 diablo-jdk-freebsd6.amd64.1.5.0.07.01.tbz
-rw-r--r--   1 root  wheel    2116124 Feb 11 13:11 jdk-6u3-fcs-bin-b05-jrl-24_sep_2007.jar
-rw-r--r--   1 root  wheel    8608204 Feb 11 13:11 jdk-6u3-fcs-mozilla_headers-b05-unix-24_sep_2007.jar
-rw-r--r--   1 root  wheel  116791442 Feb 11 13:15 jdk-6u3-fcs-src-b05-jrl-24_sep_2007.jar


The snapshot was made just now, long after those additional files were
placed in the directory.


I am at a total loss here. Is it re-using the first snapshot I ever  
made of this filesystem, even though I've removed it? Didn't I  
understand how to create/remove snapshots? Is this a bug?


Any input is appreciated.

Thanks,
/Eirik

Re: UFS snapshot weirdness

2008-02-19 Thread Eirik Øverby


On Feb 13, 2008, at 9:21 AM, Daniel O'Connor wrote:


On Wed, 13 Feb 2008, Eirik Øverby wrote:

Yes, I am absolutely sure of this.

I considered using the snapshot tool, however I need to reduce
dependencies to an absolute minimum (as one target environment is
very strict on allowing additional software installs)..

I use the snapshots to get a consistent file-backup with history.
This one puzzles me to no end.


Hmm, that is very odd..
Maybe the FS is stuffed somehow :(


I read somewhere else about NFS issues on 7-RC* where snapshots have  
been used. In particular - and this is something I'm seeing too -  
changing the exports file or reloading mountd gives the following in  
messages log:


Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /tmp: Invalid argument
Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /usr: Cross-device link
Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /var: Cross-device link
Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /export/home: Cross-device link
Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /opt: Cross-device link


Can this be related? I'm starting to worry here - what will be the  
long-term consequences if snapshots are stuck around in this  
"invisible" state?


/Eirik





--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
 -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C



Re: UFS snapshot weirdness

2008-02-24 Thread Eirik Øverby


On Feb 23, 2008, at 4:46 PM, Guido Falsi wrote:


Eirik Øverby wrote:

I read somewhere else about NFS issues on 7-RC* where snapshots
have been used. In particular - and this is something I'm seeing
too - changing the exports file or reloading mountd gives the
following in messages log:

Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /tmp: Invalid argument
Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /usr: Cross-device link
Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /var: Cross-device link
Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /export/home: Cross-device link
Feb 19 18:58:09 anduin mountd[38867]: can't delete exports for /opt: Cross-device link

Can this be related? I'm starting to worry here - what will be the
long-term consequences if snapshots are stuck around in this
"invisible" state?


Ok there is definitely something VERY fishy going on here.
I have just removed a lot of data from one of the partitions where I
HAD snapshots (they have all been gone for days now). Free space
initially goes up a lot, as expected, then drops to around what it was
before the deletion took place. There IS a snapshot being maintained
somewhere, even though I have deleted it (using rm -f). What can I do,
short of rebooting or remounting the filesystem??


This behavior is also seen on 6.2-RELEASE by the way; entirely  
different hardware (32bit vs 64bit, scsi vs ide, etc.)


/Eirik






I have been experiencing these too. But it looks more like a bug in
mountd, since it shows up only if snapshots are created with mount.
If snapshots are created with mksnap_ffs this does not seem to show
up.


I still have to make more in depth experiments, but before  
experimenting by myself I'd like to have some more informed  
directions on what to experiment.


--
Guido Falsi <[EMAIL PROTECTED]>





Re: Upgrading to 7.0 - stupid requirements

2008-03-23 Thread Eirik Øverby


On Mar 23, 2008, at 08:28, Matthew Seaman wrote:


Freddie Cash wrote:


All that's really needed is a more formalised process for handling
upgrading config files, with as much as possible managed via the ports
framework itself.  Something that dictates the name of the config
file, and that compares the config file from the port against the
installed config file (or against an md5 of the port config file) and
only replaces it if it is unchanged.  Something that is part of the
make system.


Most ports that install configuration files actually do this already.
It's generally why you'll find that a sample configuration file is
considered part of the port, but the actual live configuration file
is not.  The port will only feel free to meddle with the config file if
it is still identical to the sample file.
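
The rule described above can be sketched in a few lines. This is only an illustration of the idea, not the ports framework's actual implementation: the function names and paths are made up, and SHA-256 is used here where the quoted post suggests md5, since any digest serves for an equality check.

```python
import hashlib
import os
import shutil


def file_digest(path):
    """Return a hex digest of the file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def upgrade_config(live_path, sample_path, new_sample_path):
    """Install a new sample file; touch the live config only if it is pristine.

    The live file counts as pristine when it is missing or still identical
    to the previously installed sample. Returns True if the live file was
    replaced, False if local changes were left alone.
    """
    pristine = (not os.path.exists(live_path)
                or file_digest(live_path) == file_digest(sample_path))
    if pristine:
        shutil.copyfile(new_sample_path, live_path)
    # The sample always tracks the port, whether or not the live file changed.
    shutil.copyfile(new_sample_path, sample_path)
    return pristine
```

A port that overwrites the live file unconditionally is skipping the pristine check, which is exactly the behaviour complained about in the reply below.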


There are a few exceptions to this rule: The courier authdaemon ports,  
for instance, are notorious for overwriting my carefully-crafted  
configuration files when upgrading. I loathe those ports (or apps -  
not sure who's to blame) for that reason alone. In fact, it not only  
installs a config.dist file (which is fine), but it ALSO overwrites  
the current config. A cardinal sin, if there ever were any..


Now I must say I'm with the people who think that one should follow  
the one-port-one-configfile approach; however for a somewhat different  
reason: The closer a port sticks with the "default" configuration  
files, or samples if you will, of the software in question, the less  
FreeBSD-specific knowledge needs to be built to manage the port. If  
debian splits up the config into a forest of includefiles and  
symlinks, that might be good for a particular purpose, but it's  
something I'd prefer to do myself if the need is there. I've done  
similar things on some occasions, but that is, and IMO should be,
"homebrew".


Also, making ports adhere to a much stricter configuration regime  
would make the uptake of new ports slow down considerably. I believe  
(though I have no numbers to back this up, so it is of course pure  
speculation) that the large number of ports available is at least  
partly due to the fact that making an initial port is relatively easy  
and straightforward.


Just my 2 cents.

/Eirik

Re: Hardware - Sun workstation Ultra 20 and others

2008-04-12 Thread Eirik Øverby



On Apr 11, 2008, at 23:07, Peter Jeremy wrote:


On Fri, Apr 11, 2008 at 12:57:53PM +0200, Ivan Voras wrote:

Does anyone have experience with running Sun's Opteron-based
workstation, Ultra 20, 25, 40? Both with FreeBSD and other systems
(Linux)? Are they stable, all the drivers are present, etc?


I've not used any of these but:
1) The Ultra 25 is UltraSPARC IIIi based.  This CPU is not supported
  by FreeBSD as Sun will not release necessary documentation.


I thought this was resolved a while back? In any case, OpenBSD has had  
USiii support for some time now. I could get my hands on some USiii  
(and possibly IV) hardware to make available if.. ;)


/Eirik


2) Sun states they support both RHEL and SuSE ES on both the U20 and
  U40 so I would expect they are stable and all hardware supported,
  at least on those Linuxes.

--
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.



carpX: incorrect hash with IP aliases

2009-03-03 Thread Eirik Øverby


Hi,

whenever I configure an extra IP on one of my CARP interfaces, traffic  
on that particular subnet slows to a crawl (the primary IP of the  
interface is the gateway IP), and I get lots of

 carp4: incorrect hash
in dmesg.

I see this issue referenced also in
  http://lists.freebsd.org/pipermail/freebsd-net/2008-March/017160.html
and there are suggestions this is a known issue - however I still see  
it in FreeBSD 7.1 (pfSense 1.2.3-prerelease). I cannot find a PR on  
this, but my searching skills may be inadequate..


Am I doing something wrong? I tried assigning the alias with both /32  
and /24 netmasks.


/Eirik

Re: carpX: incorrect hash with IP aliases

2009-03-03 Thread Eirik Øverby


On Mar 3, 2009, at 19:23, Scott Ullrich wrote:


On Tue, Mar 3, 2009 at 12:52 PM, Max Laier  wrote:
[snip]
Make sure that you are configuring the same aliases with the same
netmasks on all members of the carp group - preferably before bringing
the interface up for the first time (though it should properly
recalculate the hashes as you add aliases).  As you seem to be using
pfsense you might want to check with them to make sure they have the
fix in their build - though I recall it was a joined effort back then.


1.2.3 is based on 7.1 so this patch should be in the base system now.


Excellent. And I just found that my second cluster member was not on  
1.2.3 ... I'm updating both to the latest snapshot and will be trying  
again. Thanks.


/Eirik



Scott





Re: carpX: incorrect hash with IP aliases

2009-03-04 Thread Eirik Øverby


On Mar 3, 2009, at 19:23, Scott Ullrich wrote:


On Tue, Mar 3, 2009 at 12:52 PM, Max Laier  wrote:
[snip]
Make sure that you are configuring the same aliases with the same
netmasks on all members of the carp group - preferably before bringing
the interface up for the first time (though it should properly
recalculate the hashes as you add aliases).  As you seem to be using
pfsense you might want to check with them to make sure they have the
fix in their build - though I recall it was a joined effort back then.


1.2.3 is based on 7.1 so this patch should be in the base system now.


Just tested, and this seems to work.
Now I just need to figure out how to make sure both carp nodes have  
the IPs added/removed at ~exactly the same time..
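
To illustrate why the timing matters, here is a simplified model of the check, not the kernel's actual code or byte layout. It assumes, as the advice quoted above implies, that each node computes a keyed hash over the virtual host ID and the full list of addresses configured on the carp interface, so a node holding an extra alias computes a different value and rejects its peer's advertisements. The function name and the exact message format are invented for the example.

```python
import hashlib
import hmac
import ipaddress


def advertisement_digest(password: bytes, vhid: int, addresses) -> str:
    """Keyed digest over the vhid and the sorted set of configured addresses."""
    packed = sorted(ipaddress.ip_address(a).packed for a in addresses)
    message = bytes([vhid]) + b"".join(packed)
    return hmac.new(password, message, hashlib.sha1).hexdigest()


if __name__ == "__main__":
    key = b"example-password"
    master = advertisement_digest(key, 4, ["192.0.2.1", "192.0.2.10"])
    backup = advertisement_digest(key, 4, ["192.0.2.1"])
    # The backup is missing the alias, so the digests disagree.
    print("hashes match:", master == backup)
```

In this model the order in which aliases were added does not matter, only the set of addresses, so the window of "incorrect hash" messages lasts exactly as long as the two nodes disagree about that set.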


/Eirik



Scott





ugen and Gemplus SC reader

2009-03-20 Thread Eirik Øverby


Hi,

whenever I try to use my Gemplus USB smartcard readers with
openct/opensc, I get the following in dmesg:


ugenioctl: USB_SET_SHORT_XFER, no pipe

The readers work fine on MacOS X and (reportedly) Linux, and the  
driver included in openct should support it. I can't find any PC/SC  
driver bundles for it though, only the serial readers.


Is this a problem in ugen?

With best regards,
/Eirik

Re: Performance issues in 5.3-RELEASE.

2004-11-17 Thread Eirik Øverby

On Wed, 2004-11-17 at 10:40 +0100, Krzysztof Kowalik wrote:
> Hello,
> 
>Recently I took some time to upgrade my home 4.9 system to
> 5.3-RELEASE (fortunately, taking full system dump before, so I can
> easily get back). In fact just after upgrading I ran into the weird
> issue during installation for firefox port.
> 
> When firefox-1.0-source.tar.bz2 is getting untared, the system starts to
> be *slow*: any music starts to be jittered and the cursor in X stalls
> from time to time for ~1 second.

I have reported this on *several* occasions over the last year or so, as
have others. However, apart from some time early this year/late last
year, it doesn't seem to have received due attention.

This is a serious problem for anyone using 5.3 on a desktop - which I
have very much legitimate reasons for doing - though I haven't really
noticed problems on servers due to this. It seems to be linked to the
system CPU load (not user CPU load), which would be logical.

Everyone seemed to think it was ULE related (and this showed up long
before PREEMPTION became an issue), but it is not...

Here's to hoping someone looks into it - I certainly ain't capable of
doing so ;)

/Eirik
PS: Apart from that I have to agree with what others have stated: 5.3 is
a truly wonderful release...


> And I never had this issue before with 4.x serie.
> 
> I tried to boot with an without ACPI, with GENERIC kernel, with my "own"
> kernel configuration (GENERIC with removed unused SCSI/RAID/NIC drivers)
> both with and without PREEMPTION[1]. Without any visible change in system's
> behaviour.
> 
> %uname -a
> FreeBSD bzzzt.borys.lan 5.3-RELEASE FreeBSD 5.3-RELEASE #2: Wed Nov 17
> 00:19:56 CET 2004 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/BZZZT  i386
> 
> # atacontrol list
> ATA channel 0:
> Master:  ad0  ATA/ATAPI revision 7
> Slave:   ad1  ATA/ATAPI revision 6
> ATA channel 1:
> Master: acd0  ATA/ATAPI revision 5
> Slave:  acd1  ATA/ATAPI revision 0
> ATA channel 2:
> Master:  ad4  ATA/ATAPI revision 5
> Slave:   ad5  ATA/ATAPI revision 5
> ATA channel 3:
> Master:  no device present
> Slave:   no device present
> 
> # atacontrol mode 0
> Master = UDMA100 
> Slave  = UDMA100
> # atacontrol mode 1
> Master = UDMA33 
> Slave  = UDMA33
> # atacontrol mode 2
> Master = UDMA100 
> Slave  = UDMA100
> 
> dmesgs from ACPI boot on "custom" kernel attached.
> 
> Is there anything I missed and therefore I should try/tune or any
> other informations that are needed and I missed them?
> 
> [1] yes, SCHED_4BSD
> 
> Regards,
> Krzysztof Kowalik




graid3 - requirements or manpage wrong?

2004-11-24 Thread Eirik Øverby

Hi,
to the best of my ability I have been investigating the 'real'
requirements of a raid-3 array, and cannot see how the following text
from graid3(8) can possibly be correct - and if it is, then the
implementation must be wrong or incomplete (emphasis added):

label  Create a RAID3 device.  The last given component will contain
       parity data, all the rest - regular data.  ***Number of
       components has to be equal to 3, 5, 9, 17, etc. (2^n + 1).***

I might be wrong, but I cannot see how a raid-3 array should require 
(2^n + 1) drives - I am fairly certain I have seen raid-3 arrays 
consisting of four drives, for example. This is also what I had hoped to 
accomplish.

Anyone care to shed a light on this? I'd prefer to use graid3 (or 5, if 
there was one) instead of gvinum..

Thanks,
/Eirik

Re: graid3 - requirements or manpage wrong?

2004-11-24 Thread Eirik Øverby

On 24. Nov 2004, at 18:11, Pawel Jakub Dawidek wrote:
On Wed, Nov 24, 2004 at 10:54:07AM +0100, Eirik Øverby wrote:
+> to the best of my ability I have been investigating the 'real'
+> requirements of a raid-3 array, and cannot see that the following text
+> from graid3(8) cannot possibly be correct - and if it is, then the
+> implementation must be wrong or incomplete (emphasis added):
+>
+> label      Create a RAID3 device.  The last given component will contain
+>            parity data, all the rest - regular data.  ***Number of
+>            components has to be equal to 3, 5, 9, 17, etc. (2^n + 1).***
+>
+> I might be wrong, but I cannot see how a raid-3 array should require
+> (2^n + 1) drives - I am fairly certain I have seen raid-3 arrays
+> consisting of four drives, for example. This is also what I had hoped to
+> accomplish.

This requirement is because we want sectorsize to be power of 2
(UFS needs it).
In RAID3 we want to send every I/O request to all components at once,
that's why we need sector size to be N*512, where N is a power of 2 value
AND because graid3 uses one parity component we need N+1 providers.

OK I see, makes sense. So it's not really a raid3 issue, but an
implementation issue.
The only problem then is - gvinum being in a completely unusable state
(for raid5 anyway), what are my alternatives? I have four 160gb IDE
drives, and I want capacity+redundancy. Performance is a non-issue,
really. What do I do - in software?

/Eirik

-- 
Pawel Jakub Dawidek                       http://www.FreeBSD.org
[EMAIL PROTECTED]                           http://garage.freebsd.pl
FreeBSD committer                         Am I Evil? Yes, I Am!
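
The constraint explained in the reply above can be worked through numerically. The sketch below assumes 512-byte component sectors, as in the explanation, and takes the manual page's list (3, 5, 9, 17, ...) as the set of valid counts; the function names are hypothetical and not part of graid3.

```python
COMPONENT_SECTOR_BYTES = 512


def is_valid_graid3_component_count(count: int) -> bool:
    """True if count is 2^n + 1 with n >= 1 (3, 5, 9, 17, ...)."""
    data_components = count - 1  # one component holds parity
    return data_components >= 2 and (data_components & (data_components - 1)) == 0


def graid3_sector_size(count: int) -> int:
    """Sector size of the array: every request is split across all data components."""
    if not is_valid_graid3_component_count(count):
        raise ValueError("%d components is not 2^n + 1" % count)
    return (count - 1) * COMPONENT_SECTOR_BYTES


if __name__ == "__main__":
    for n in range(3, 10):
        if is_valid_graid3_component_count(n):
            print(n, "components ->", graid3_sector_size(n), "byte sectors")
        else:
            print(n, "components -> sector size would not be a power of 2")
```

With four drives the three data components would give a 1536-byte sector, which is not a power of 2, so a four-component array is rejected under this rule.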


asr on amd64

2005-01-14 Thread Eirik Øverby

Hi!
Daring as I am, here's another attempt at having someone look into the
asr driver and why it doesn't work on amd64.
I have such a Zero-Channel RAID card lying around collecting dust,
although it was supposed to be installed in a server here a long time ago.
I know Scott Long looked into it long ago, and he seems to have been the
last one to touch the driver. He indicated a few months ago that he had
little time; perhaps things look brighter now that 5.3 is out?
I'll cross-post to -current in a few days if I don't hear anything..
Thanks,
/Eirik


Panic: spin lock smp rendezvous ... held too long

2007-03-08 Thread Eirik Øverby


Hi all,

I just installed 6.2-RELEASE on a Supermicro 6013P-8 server, a dual  
P4-Xeon 2.4ghz with 4GB ECC memory and an asr driven SCSI RAID  
controller.


It has been working OK (although I suspect the asr driver, being
giant-locked, is very inefficient) for a little while, but as I was
extracting a bunch of tarballs it panicked like so:


spin lock smp rendezvous held by 0xc9d54600 for > 5 seconds
panic: spin lock held too long
cpuid = 0

I don't have a dump device (though I'm setting that up for the next  
reboot). However, I have tried turning off HT, to see if that might  
help.


Does this look familiar to anyone? Or do I need to produce more data  
if it happens again?


Thanks,
/Eirik

Re: Panic: spin lock smp rendezvous ... held too long

2007-03-09 Thread Eirik Øverby



On Mar 9, 2007, at 03:41, Kris Kennaway wrote:


On Fri, Mar 09, 2007 at 12:44:03AM +0100, Eirik Øverby wrote:

Hi all,

I just installed 6.2-RELEASE on a Supermicro 6013P-8 server, a dual
P4-Xeon 2.4ghz with 4GB ECC memory and an asr driven SCSI RAID
controller.

It has been working OK (although I suspect the asr driven, being
giant-locked, is very inefficient) for a little while, but as I was
extracting a bunch of tarballs it paniced like so:

spin lock smp rendezvous held by 0xc9d54600 for > 5 seconds
panic: spin lock held too long
cpuid = 0

I don't have a dump device (though I'm setting that up for the next
reboot). However, I have tried turning off HT, to see if that might
help.

Does this look familiar to anyone? Or do I need to produce more data
if it happens again?


It can mean that something deadlocked.  Turning on WITNESS may help to
debug this, although it has a large performance impact.


I can't turn on WITNESS here "just like that", as I'll need some time  
to find a replacement server for some critical applications. However,  
the strange thing is that this server has been running solid as a  
rock (not one single crash) for 2 years with FreeBSD 4.x on it, so I  
am fairly sure there is no hardware issue.


It crashed today, and I have obtained a dump. I am running
6.2-RELEASE with the stock SMP kernel, and haven't recompiled yet, so I
can't seem to find a kernel.debug, but I'm building one now with the
6.2-RELEASE sources, as supplied on the CD. I'm assuming this will
give me a useable kernel.debug.


Anything in particular I should look for if/when I'm able to peek  
into the dump with kgdb?


thanks,
/Eirik



Kris





Re: Panic: spin lock smp rendezvous ... held too long

2007-03-09 Thread Eirik Øverby


On Mar 9, 2007, at 03:41, Kris Kennaway wrote:


On Fri, Mar 09, 2007 at 12:44:03AM +0100, Eirik Øverby wrote:

Hi all,

I just installed 6.2-RELEASE on a Supermicro 6013P-8 server, a dual
P4-Xeon 2.4 GHz with 4GB ECC memory and an asr-driven SCSI RAID
controller.

It has been working OK (although I suspect the asr driver, being
giant-locked, is very inefficient) for a little while, but as I was
extracting a bunch of tarballs it panicked like so:

spin lock smp rendezvous held by 0xc9d54600 for > 5 seconds
panic: spin lock held too long
cpuid = 0

I don't have a dump device (though I'm setting that up for the next
reboot). However, I have tried turning off HT, to see if that might
help.

Does this look familiar to anyone? Or do I need to produce more data
if it happens again?


It can mean that something deadlocked.  Turning on WITNESS may help to
debug this, although it has a large performance impact.


Just opened the vmcore file, and this is what I see, with a bt at the  
end:


Unread portion of the kernel message buffer:
dev = da0s1f, block = 3802920, fs = /usr
panic: ffs_blkfree: freeing free block
cpuid = 1
Uptime: 23h59m46s
Dumping 3967 MB (3 chunks)
  chunk 0: 1MB (158 pages) ... ok
  chunk 1: 3966MB (1015280 pages) 3950 3934 3918 3902 3886 3870 3854  
3838 3822 3806 3790 3774 3758 3742 3726 3710 3694 3678 3662 3646 3630  
3614 3598 3582


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 06
fault virtual address   = 0x18c
fault code  = supervisor read, page not present
instruction pointer = 0x20:0xc04542f4
stack pointer   = 0x28:0xe98a3c88
frame pointer   = 0x28:0xe98a3c90
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 18 (swi2: cambio)
trap number = 12
panic: page fault
cpuid = 1
3566 3550 3534 3518 3502 3486 3470 3454 3438 3422 3406 3390 3374 3358  
3342 3326 3310 3294 3278 3262 3246 3230 3214 3198 3182 3166 3150 3134  
3118 3102 3086 3070 3054 3038 3022 3006 2990 2974 2958 2942 2926 2910  
2894 2878 2862 2846 2830 2814 2798 2782 2766 2750 2734 2718 2702 2686  
2670 2654 2638 2622 2606 2590 2574 2558 2542 2526 2510 2494 2478 2462  
2446 2430 2414 2398 2382 2366 2350 2334 2318 2302 2286 2270 2254 2238  
 2206 2190 2174 2158 2142 2126 2110 2094 2078 2062 2046 2030 2014  
1998 1982 1966 1950 1934 1918 1902 1886 1870 1854 1838 1822 1806 1790  
1774 1758 1742 1726 1710 1694 1678 1662 1646 1630 1614 1598 1582 1566  
1550 1534 1518 1502 1486 1470 1454 1438 1422 1406 1390 1374 1358 1342  
1326 1310 1294 1278 1262 1246 1230 1214 1198 1182 1166 1150 1134 1118  
1102 1086 1070 1054 1038 1022 1006 990 974 958 942 926 910 894 878  
862 846 830 814 798 782 766 750 734 718 702 686 670 654 638 622 606  
590 574 558 542 526 510 494 478 462 446 430 414 398 382 366 350 334  
318 302 286 270 254 238 222 206 190 174 158 142 126 110 94 78 62 46  
30 14 ... ok

  chunk 2: 1MB (128 pages)

#0  doadump () at pcpu.h:165
165 __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc067550a in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc0675831 in panic (fmt=0xc0911dc2 "ffs_blkfree: freeing free block")
    at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc07b375e in ffs_blkfree (ump=0xc9607c00, fs=0xc93c5800, devvp=0xc9625110, bno=3802920, size=16384, inum=39400)
    at /usr/src/sys/ufs/ffs/ffs_alloc.c:1869
#4  0xc07c38c6 in indir_trunc (freeblks=0xcb2bfa00, dbn=15210848, level=0, lbn=12, countp=0xe98b8c6c)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:2894
#5  0xc07c32f6 in handle_workitem_freeblocks (freeblks=0xcb2bfa00, flags=0)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:2744
#6  0xc07c01b1 in process_worklist_item (mp=0xc95d87c8, flags=0)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:967
#7  0xc07bfeb2 in softdep_process_worklist (mp=0xc95d87c8, full=0)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:851
#8  0xc07bfc08 in softdep_flush () at /usr/src/sys/ufs/ffs/ffs_softdep.c:762
#9  0xc065ec4d in fork_exit (callout=0xc07bfa6c , arg=0x0, frame=0xe98b8d38)
    at /usr/src/sys/kern/kern_fork.c:821
#10 0xc0879dac in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
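
As a quick sanity check on the dump header above, the sizes can be reproduced from the page counts. This is only a sketch: it assumes 4 KiB pages (i386), and the rounding rules (per-chunk figures rounded up, the total rounded down) are inferred from the printed numbers, not taken from the kernel source.

```python
# Sanity check of the sizes printed in the dump header above.
# Assumption: 4 KiB pages (i386). The rounding rules are inferred
# from the printed numbers, not taken from the kernel source.
PAGE_SIZE = 4096
MIB = 1024 * 1024


def chunk_mb(pages: int) -> int:
    """Size of one chunk in MB, rounded up."""
    return -(-pages * PAGE_SIZE // MIB)


def total_mb(chunks: list[int]) -> int:
    """Size of the whole dump in MB, rounded down."""
    return sum(chunks) * PAGE_SIZE // MIB


chunks = [158, 1015280, 128]  # page counts from the three "chunk" lines
print([chunk_mb(p) for p in chunks])  # [1, 3966, 1]
print(total_mb(chunks))               # 3967
```

The figures match the "1MB", "3966MB", "1MB" and "Dumping 3967 MB" lines, so the dump header itself looks consistent even though a second panic was printed in the middle of it.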







Weird messages output

2007-03-27 Thread Eirik Øverby


Hi all,

running 6.1-RELEASE on several HP DL385 servers (identically
configured), one of them has recently spat the following out in the
/var/log/messages file:


..
Mar 10 03:51:24 apphost02 ntpd[445]: kernel time sync enabled 2001
Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff
Mar 10 05:02:01 apphost02 kernel: k
Mar 10 05:02:01 apphost02 kernel: NMIN MIIe SIASA  202r,0 ,E IESIAS A  
ffnf

Mar 10 05:02:01 apphost02 kernel: f
Mar 10 05:02:01 apphost02 kernel:
Mar 10 05:02:01 apphost02 kernel: el trap 19 with interrupts disabled
Mar 10 05:02:01 apphost02 kernel: NMI ISA 20, EISA ff
Mar 10 06:08:01 apphost02 ntpd[445]: kernel time sync enabled 6001
..

NMI = non-maskable interrupt, if I remember correctly. However, I  
have no idea what this means or why it appeared. The status light on  
the front of the server has lit up red, as opposed to the usual  
green. All services on the host are running and behaving normally  
from what I can tell.
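
The garbled lines in the excerpt can be checked against a simple hypothesis: that two copies of the NMI message and part of the "kernel trap" message were written to the console at the same time. The sketch below only compares character counts taken from the log text above; that the overlap came from several CPUs printing at once is an assumption and is not confirmed anywhere in this thread.

```python
from collections import Counter

# Text copied from the log excerpt above (syslog prefixes removed).
garbled = "NMIN MIIe SIASA  202r,0 ,E IESIAS A ffnf"
stray = "f"  # the lone character logged on the following line

# Hypothesis (an assumption, not confirmed in the thread): two copies of
# the NMI message and the missing "ern" of "kernel" were written at once.
nmi_line = "NMI ISA 20, EISA ff"
expected = nmi_line * 2 + "ern"


def letters(text: str) -> Counter:
    """Count characters, ignoring whitespace."""
    return Counter(text.replace(" ", ""))


print(letters(garbled + stray) == letters(expected))  # True
```

The remaining fragments fit the same reading: the lone "k", the interleaved "ern" and the line starting with "el trap 19" together spell "kernel trap 19 with interrupts disabled".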


Any input, anyone?

Thanks,
/Eirik


Re: Weird messages output

2007-03-27 Thread Eirik Øverby



On 27. mar. 2007, at 15.33, Gavin Atkinson wrote:


On Tue, 2007-03-27 at 15:00 +0200, Eirik Øverby wrote:

Hi all,

running 6.1-RELEASE on several HP DL385 servers (identically
configured), one of them has recently spat the following out in the
/var/log/messages file:

..
Mar 10 03:51:24 apphost02 ntpd[445]: kernel time sync enabled 2001
Mar 10 05:02:01 apphost02 kernel: NMI ISA 30, EISA ff
Mar 10 05:02:01 apphost02 kernel: k
Mar 10 05:02:01 apphost02 kernel: NMIN MIIe SIASA  202r,0 ,E IESIAS A
ffnf
Mar 10 05:02:01 apphost02 kernel: f
Mar 10 05:02:01 apphost02 kernel:
Mar 10 05:02:01 apphost02 kernel: el trap 19 with interrupts disabled
Mar 10 05:02:01 apphost02 kernel: NMI ISA 20, EISA ff
Mar 10 06:08:01 apphost02 ntpd[445]: kernel time sync enabled 6001
..

NMI = non-maskable interrupt, if I remember correctly. However, I
have no idea what this means or why it appeared. The status light on
the front of the server has lit up red, as opposed to the usual
green. All services on the host are running and behaving normally
from what I can tell.


I suspect you'll find your (ECC) memory has problems.


You are absolutely correct. Further investigation using the ProLiant  
management tools for FreeBSD revealed serious RAM trouble. Two banks  
were degraded, so we have now had the modules replaced on-site.


Thanks for the tip!
Do you happen to know if there are any "generic" tools/daemons
available to decipher such NMIs? Perhaps something that could send
SNMP traps?


/Eirik



Gavin




Panic: sleeping thread

2006-06-17 Thread Eirik Øverby


Hi,

ever since 6.1-RELEASE (possibly earlier, not sure) I've been seeing
frequent panics on a previously stable (6.0-STABLE) dual Opteron
server. When I say "previously stable" I mean weeks and months of
uptime, and no known unintended reboots.
Now I'm seeing panics on a semi-regular basis, up to 2-3 times per
week. The panic goes (dmesg, bt and ps at the end of this message):


Sleeping thread (tid 100082, pid 84236) owns a non-sleepable lock
panic: sleeping thread
cpuid = 0
KDB: enter: panic
[thread pid 84235 tid 100474 ]
Stopped at  kdb_enter+0x2f: nop

...where pid 84236 is a sh instance. I cannot reproduce this on  
demand, but I usually only have to wait a few days (it'll happen, at  
the latest, whenever I think it'll survive another evening and go out  
for a beer...). Calling boot() or reset from db> just causes the box  
to hang, I have to power cycle it at this point.


I do not have a dump device (no swap partitions large enough, known  
problem, more hardware coming), but I hope the attached information  
helps.


With best regards,
/Eirik


dmesg:

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.1-STABLE #2: Wed May 31 20:13:06 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/ANDUIN
WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
WARNING: MPSAFE network stack disabled, expect reduced performance.
ACPI APIC Table: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 242 (1595.14-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0xf5a  Stepping = 10
   
Features=0x78bfbffMCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>

  AMD Features=0xe0500800
real memory  = 2147418112 (2047 MB)
avail memory = 2061500416 (1966 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-27 on motherboard
ioapic2  irqs 28-31 on motherboard
kbd1 at kbdmux0
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x5008-0x500b on acpi0
cpu0:  on acpi0
acpi_throttle0:  on cpu0
cpu1:  on acpi0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pcib1:  at device 6.0 on pci0
pci3:  on pcib1
ohci0:  mem 0xfeafc000-0xfeafcfff irq 19 at device 0.0 on pci3
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0:  on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1:  mem 0xfeafd000-0xfeafdfff irq 19 at device 0.1 on pci3
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1:  on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
atapci0:  port 0xb400-0xb407,0xb000-0xb003,0xac00-0xac07,0xa800-0xa803,0xa400-0xa40f mem 0xfeafec00-0xfeafefff irq 19 at device 5.0 on pci3
ata2:  on atapci0
ata3:  on atapci0
ata4:  on atapci0
ata5:  on atapci0
pci3:  at device 6.0 (no driver attached)
fxp0:  port 0xbc00-0xbc3f mem 0xfeafb000-0xfeafbfff,0xfeaa-0xfeab irq 18 at device 8.0 on pci3
miibus0:  on fxp0
inphy0:  on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:e0:81:2a:11:64
fxp0: [GIANT-LOCKED]
isab0:  at device 7.0 on pci0
isa0:  on isab0
atapci1:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0:  on atapci1
ata1:  on atapci1
pci0:  at device 7.2 (no driver attached)
pci0:  at device 7.3 (no driver attached)
pcib2:  at device 10.0 on pci0
pci2:  on pcib2
ahd0:  port 0x9000-0x90ff,0x9c00-0x9cff mem 0xfc8fc000-0xfc8fdfff irq 24 at device 6.0 on pci2
ahd0: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
ahd1:  port 0x9800-0x98ff,0x9400-0x94ff mem 0xfc8fe000-0xfc8f irq 25 at device 6.1 on pci2
ahd1: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
bge0:  mem 0xfc8b-0xfc8b,0xfc8a-0xfc8a irq 24 at device 9.0 on pci2
miibus1:  on bge0
brgphy0:  on miibus1
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:e0:81:2a:59:8c
bge0: [GIANT-LOCKED]
bge1:  mem 0xfc8e-0xfc8e,0xfc8d-0xfc8d irq 25 at device 9.1 on pci2
miibus2:  on bge1
brgphy1:  on miibus2
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:e0:81:2a:59:8d
bge1: [GIANT-LOCKED]
pci0:  at device 10.1 (no driver attached)
pcib3:  at device 11.0 on pci0
pci1:  on pcib3
pci0:  at device 11.1 (no driver attached)
acpi_button0:  on acpi0
atkbdc0:  port 0x60,0x64 irq 1 on

gmirror oddities

2005-05-03 Thread Eirik Øverby

Hi!

I've been using gmirror for a while to safeguard my system disks. I have
taken the slice-based mirror approach, where I use, say, ad0s1 and ad2s1 as
providers.
On one of my servers, this seems to be impossible. I create the mirror using
ad2s1 first (to keep my system running while I do some of the work), and
then I re-initialize ad0s1 (making it exactly the size of ad2s1) before
using gmirror insert to add it to the mirror.
However, at this point - when doing a gmirror list - it turns out that it
never added ad0s1 as a provider, but ad0 itself! As a result, I now have a
load of partitions (ad0a, ad0b, ad0d, ad0e, ad0f) directly on the disk instead
of having the same structure as I have on ad2s1. It's just like ad2s1, just
without the "s1" part.

I've tried "dd if=/dev/zero of=/dev/ad0 bs=65536" a couple of times, in case
some old provider metadata was stored there. I also have exactly the same
setup in another server, the only difference being that it behaves as
expected.

Am I doing something blatantly wrong here? This IS supposed to work, right?
I've even found a very nice description of how to do it at
http://people.freebsd.org/~rse/mirror/
confirming that what I'm doing is right.

I'm on 5.4-PRERELEASE, but this problem has been there since 5.3-p2 or
something, which was when I first tried this.

Anyone?

Thanks,
/Eirik



Current status of nullfs and/or unionfs?

2005-05-05 Thread Eirik Øverby

Hi all,

I'm struggling with some hosting environments where I am managing a large
number of jails (>100) spread over about a dozen servers. I am starting to
see disk space as a real problem, especially given that each physical box
needs to be autonomous - i.e. I can't rely on any external storage, and I am
limited to 1U and 2U servers.

The solution, or at least parts of it, would be to have certain parts of the
jail filesystems mounted in via nullfs (acceptable solution) or unionfs
(ideal solution). However, ever since FreeBSD 4.10 this has been a major
problem, as both filesystems started exhibiting major stability and data
integrity issues.

Before I start playing with this again, I'd like to know if any work has
been done on either of these in 5.x. Specifically, I'm currently running
5.3-p6 or newer on all the systems, and as of yesterday I've been using
5.4-prerelease (cvsup) on a couple of test systems.

What can I expect to see when trying nullfs and/or unionfs today? Has
anything changed? Do I have even a remote chance of making it work - and if
it doesn't work, what are my chances of anyone having time or energy to look
into it? I'm an admin only, no coder, otherwise I'd be happy to look into it
myself.

Thanks,
/Eirik



Re: Current status of nullfs and/or unionfs?

2005-05-06 Thread Eirik Øverby

On 06-05-05 09:25, "Danny Braniss" <[EMAIL PROTECTED]> wrote:

> 
>> Interesting approach. Is this with 4.x or 5.x? How do you union-mount /etc
>> (mount command/fstab entry)?
>> 
> 
> been doing it since 4.x (i think x < 9)

Any idea how unionfs will behave if stacked (more mounts on top of each
other)? I was playing with the thought of having a "template" jail directory
which I unionmount into my jails, then perhaps use your trick to union-mount
a md device into certain points in the jail. Got a gut feeling about that?

/Eirik

> in initdiskless (5.x) we have:
> 
> if [ -e /conf/union ]; then
>     kldload unionfs
>     mount_md 4096 /conf/etc
>     chmod 755 /conf/etc
>     mount_unionfs /conf/etc /etc
>     ls -R /etc > /dev/null
>     touch /etc/.sentinel
>     md_created_etc=created
> fi
> 
> danny
> 
> 
> 


Re: Current status of nullfs and/or unionfs?

2005-05-06 Thread Eirik Øverby

On 06-05-05 13:14, "Danny Braniss" <[EMAIL PROTECTED]> wrote:

>> On 06-05-05 09:25, "Danny Braniss" <[EMAIL PROTECTED]> wrote:
>> 
>>> 
>>>> Interesting approach. Is this with 4.x or 5.x? How do you union-mount /etc
>>>> (mount command/fstab entry)?
>>>>
>>> 
>>> been doing it since 4.x (i think x < 9)
>> 
>> Any idea how unionfs will behave if stacked (more mounts on top of each
>> other)? I was playing with the thought of having a "template" jail directory
>> which I unionmount into my jails, then perhaps use your trick to union-mount
>> a md device into certain points in the jail. Got a gut feeling about that?
> 
> i have the feeling that that will get into trouble :-), but im no expert
> here. If what you mean is:
> 
> mount_unionfs /md-0 /jail-0
> and then 
> mount_unionfs /md-1 /jail-0/xyz
> 
> which is not strictly 'stacked', might work and should be easy to try out, but
> IMHO, breaks the KISS principle :-)

I was more thinking, like,
mount_unionfs -b /jails/jail_template /jails/jail-0
mount_unionfs /md-0 /jails/jail-0/etc
for example.

I could also imagine stacking unionfs on top of nullfs, like
mount_nullfs /cdrom/jail_template /jails/jail-0
mount_unionfs /md-0 /jails/jail-0
alternatively
mount_unionfs /nfs-0 /jails/jail-0

Sounds weird, I know, but we could use it...
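
To make the intended layering concrete, here is a toy model of the stacking idea in Python. It is only an illustration under stated assumptions: layers are plain dictionaries, the names `template`, `jail0` and `md0` are hypothetical, and the reading of `-b` (the named directory goes below the mount point) is how I understand mount_unionfs(8). Whiteouts, permissions and mounting on a subdirectory are not modelled.

```python
from collections import ChainMap

# Toy model only: each layer is a dict of path -> contents. Lookups search
# the layers top-down and writes land in the top layer, which is roughly
# how a union mount behaves. Whiteouts, permissions and subtree mounts
# are ignored.


def union(*layers: dict) -> ChainMap:
    """Stack layers; the first argument is the upper (writable) layer."""
    return ChainMap(*layers)


template = {"bin/sh": "from template", "etc/rc.conf": "from template"}
jail0 = {}  # hypothetical, initially empty jail directory
md0 = {"etc/rc.conf": "from md-0"}  # hypothetical memory-disk contents

# mount_unionfs -b /jails/jail_template /jails/jail-0
# (-b puts the template below the existing jail directory)
view = union(jail0, template)
print(view["bin/sh"])  # from template

# mount_unionfs /md-0 ... (no -b: md-0 goes on top)
view = union(md0, jail0, template)
print(view["etc/rc.conf"])  # from md-0

view["etc/motd"] = "written inside the jail"
print("etc/motd" in md0, "etc/motd" in template)  # True False
```

In this model the template stays untouched while every jail sees its files, which is the disk-space saving the setup is after. Whether the real unionfs copes with three layers is exactly the open question here.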

 
> and also, im not sure if:
> mkdir /jail-0/xyz
> mount_unionfs /md-1 /jail-0/xyz
> is the same as the above.
> 
> danny
> 
> 
> 



unionfs limitations?

2005-05-06 Thread Eirik Øverby

Hi,

I just started playing with mounting ports into jails using unionfs
(mount_unionfs -b /usr/ports_jail /usr/local/jails/jail-0/usr/ports), and
many things seem to work fine.
However, when trying to install either of mysql41-server or mysql41-client,
I see the following:

[EMAIL PROTECTED] /usr/ports/databases/mysql41-server# make install
===>  Installing for mysql-server-4.1.11_1
===>   mysql-server-4.1.11_1 depends on shared library: mysqlclient.14 - found
===>   Generating temporary packing list
===>  Checking if databases/mysql41-server already installed
ln: POSIX: Operation not supported
*** Error code 1

Stop in /usr/ports/databases/mysql41-server.

Did I miss out on something, or is this not going to work? Do I need to
think in other ways?
I have stress-tested this setup pretty well over the last 24 hours, with as
many as 20 mountpoints using the same ports tree, with constant package
building in each of them. This was impossible last time I played with
unionfs, so it must have stabilized somewhat ;)

Anyone?

/Eirik



5.4-panic

2005-05-28 Thread Eirik Øverby


Hi folks,

I have sinned, I have forgotten to configure a dump device. I do have  
a debug kernel compiled though (I think), so maybe someone can help  
me figure out what's happening here. Nothing in particular going on,  
server has been up for a few weeks. Dual opteron machine, running  
FreeBSD-amd64.


Info below (uname -a, panic info and dmesg).

/Eirik

Version info:
FreeBSD anduin.net 5.4-STABLE FreeBSD 5.4-STABLE #0: Tue May  3 11:19:51 CEST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ANDUIN  amd64


PANIC INFO:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x88
fault code  = supervisor read, page not present
instruction pointer = 0x8:0x803cd9e9
stack pointer   = 0x10:0xa54f5a20
frame pointer   = 0x10:0xa54f5a50
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 62 (pagedaemon)
[thread pid 62 tid 100049 ]
Stopped at  thread_fini+0x89:   subl    0x88(%ebx),%eax
db> where
Tracing pid 62 tid 100049 td 0xff003dab0280
thread_fini() at thread_fini+0x89
zone_drain() at zone_drain+0x1e5
zone_foreach() at zone_foreach+0x4d
uma_reclaim() at uma_reclaim+0x21
vm_pageout() at vm_pageout+0x5fc
fork_exit() at fork_exit+0x8f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xa54f5d00, rbp = 0 ---
db> ps
  pid   proc uid  ppid  pgrp  flag   stat  wmesgwchan  cmd
52843 ff0005248ba00 52838 52838 0004000 [RUNQ] perl
52842 ff00387e9ba0   91 52840 52840 0004000 [CPU 1] python
52840 ff0003162ba0   91 52837 52840 0004000 [SLPQ wait  
0xff0003162ba0][SLP] sh
52838 ff000fa50 52835 52838 0004000 [SLPQ wait  
0xff000fa5][SLP] sh
52837 ff003bea40000   636   636 000 [SLPQ piperd  
0xff001e0a7b40][SLP] cron
52835 ff00227fb0000   636   636 000 [SLPQ piperd  
0xff0011be2000][SLP] cron
52824 ff00394758b8 1000   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52822 ff002fdd7000 1000   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52820 ff0037475ba0 1000   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52818 ff001a5bb8b8 1000   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52815 ff000f20fba0 1000   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52813 ff0034d442e8 1000   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52811 ff0023f9eba0 1000   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52808 ff001a140ba00  1291 52808 0004100 [SLPQ sbwait  
0xff0016cb39b8][SLP] ftpd
52806 ff0039475ba00  1291 52806 0004100 [SLPQ sbwait  
0xff00081abe08][SLP] ftpd
52805 ff0035c5e2e80  1291 52805 0004100 [SLPQ sbwait  
0xff000395c118][SLP] ftpd
52764 ff000328b2e8 1051   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52730 ff0036e4c8b80  1248  1248 000 [SLPQ accept  
0xff002c2108fe][SLP] perl5.8.6
52589 ff002b39c5d0 1027 52588 52589 0004002 [SLPQ ttyin  
0xff0024801410][SLP] bash
52588 ff001a1402e8 1027 52585 52585 100 [SLPQ select  
0x8082b2d0][SLP] sshd
52585 ff002b227ba00   609 52585 100 [SLPQ sbwait  
0xff002075abe0][SLP] sshd
52548 ff002c6be5d00  1248  1248 000 [SLPQ accept  
0xff002c2108fe][SLP] perl5.8.6
52387 ff00374758b8 1024   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
52299 ff00219515d00  1248  1248 000 [SLPQ accept  
0xff002c2108fe][SLP] perl5.8.6
52275 ff001471d8b80  1248  1248 000 [SLPQ accept  
0xff002c2108fe][SLP] perl5.8.6
46768 ff00117825d0 1001 46765 46768 0004002 [SLPQ ttyin  
0xffa8b810][SLP] bash
46765 ff001b674000 1001 46749 46749 100 [SLPQ select  
0x8082b2d0][SLP] sshd
46749 ff0037ab32e80   609 46749 100 [SLPQ sbwait  
0xff003694f790][SLP] sshd
46699 ff0003dc32e8 6681 46695 46699 0004002 [SLPQ select  
0x8082b2d0][SLP] pine
46695 ff001a44a2e8 6681 46694 46695 0004002 [SLPQ wait  
0xff001a44a2e8][SLP] bash
46694 ff0005248000 6681 46689 46689 100 [SLPQ select  
0x8082b2d0][SLP] sshd
46689 ff002eec15d00   609 46689 100 [SLPQ sbwait  
0xff001ecca118][SLP] sshd
45600 ff001529d2e8 1001   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
45043 ff0034d44000   80   697   697 100 [SLPQ accept  
0xff002cfef05e][SLP] httpd
43651 ff0019e05000 6682   744   744 0004000 [SLPQ select  
0x8082b2d0][SLP] imapd
42697 ffdb1000   80   697   697 100 [SLPQ accept  
0xff002cfef05e][SLP] httpd
42696 ff00086d78b8   80   697   697 100 [SLPQ accept  
0xff002cfef05e][SLP] htt

NFS-related hang in 5.4?

2005-06-19 Thread Eirik Øverby


Hi,

when doing large file transfers (backing up jails using tar+gzip to a  
neighboring server), NFS has a tendency to lock up on me. This  
usually happens after quite a while - like a few hours or so. Also,  
before the hang, performance is generally bad.


KDB trace:

db> trace
Tracing pid 56 tid 100064 td 0xc1a18600
kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
lapic_handle_intr(34) at lapic_handle_intr+0x3a
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
_mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
udp_input(c2d4,14,c1a99000,1,0) at udp_input+0x257
ip_input(c2d4,0,0,0,0) at ip_input+0x590
transmit_event(c1c64100,2094,0,c1d58a80,7f4220) at transmit_event+0x107
ready_event_wfq(c1c64100,2094,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---

I cannot seem to kill process 56 (nfsiod), so I have to reset the box.

Anyone got a clue? What can I do to ease debugging here? Next time it  
happens I can probably make a dump, at least I will have a debug  
kernel running then.


/Eirik

Re: NFS-related hang in 5.4?

2005-06-19 Thread Eirik Øverby



On 19. jun. 2005, at 20.06, Robert Watson wrote:



On Sun, 19 Jun 2005, Eirik Øverby wrote:


when doing large file transfers (backing up jails using tar+gzip  
to a neighboring server), NFS has a tendency to lock up on me.  
This usually happens after quite a while - like a few hours or so.  
Also, before the hang, performance is generally bad.




Hmm.  Looks like a bug in dummynet.  ipfw should not be directly
re-injecting UDP traffic back into the input path from an outbound
path, or it risks re-entering, generating lock order problems, etc.
It should be getting dropped into the netisr queue to be processed
from the netisr context.


This problem would exist across all 5.4 installations, both i386 and
amd64? Would it depend on heavy load, or could it theoretically
happen at any time when there's traffic? All three of my FreeBSD 5
servers (dual Opteron, dual P3 1 GHz, dual P3 700 MHz) are experiencing
random hangs a few weeks apart; my impression is that they are all
stable when running in single-CPU mode. All use dummynet in a
comparable manner. Ideas?


Is it possible to configure dummynet out of your configuration, and  
see if the problem goes away?


I'm running a test right now, will let you know in the morning.



Robert N M Watson




KDB trace:

db> trace
Tracing pid 56 tid 100064 td 0xc1a18600
kdb_enter(c096bad3,4,480758,c08dcbf9,f5) at kdb_enter+0x30
siointr1(c1a8e000,c1a18600,c1a148d4,c1a12700,c1a12700) at siointr1+0xe7
siointr(c1a8e000,0,0,4,c1a18600) at siointr+0x78
intr_execute_handlers(c19bd090,d54807bc,d5480818,c08d05a3,34) at intr_execute_handlers+0x88
lapic_handle_intr(34) at lapic_handle_intr+0x3a
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc06b8490, esp = 0xd5480800, ebp = 0xd5480818 ---
_mtx_lock_sleep(c0a1cd2c,c1a18600,0,0,0) at _mtx_lock_sleep+0xb0
udp_input(c2d4,14,c1a99000,1,0) at udp_input+0x257
ip_input(c2d4,0,0,0,0) at ip_input+0x590
transmit_event(c1c64100,2094,0,c1d58a80,7f4220) at transmit_event+0x107
ready_event_wfq(c1c64100,2094,0,c1d58a80,c06d860a) at ready_event_wfq+0x511
dummynet_io(c2bd2e00,64,1,d54809c8,c2bd2e00) at dummynet_io+0x519
ipfw_check_out(0,d5480a24,c1a99000,2,c1d1821c) at ipfw_check_out+0xf1
pfil_run_hooks(c0a1c160,d5480a9c,c1a99000,2,c1d1821c) at pfil_run_hooks+0x138
ip_output(c2bd2e00,0,0,0,0) at ip_output+0x593
udp_output(c1d1821c,c2bd2e00,0,0,c1a18600) at udp_output+0x597
udp_send(c2242654,0,c1e12100,0,0) at udp_send+0x30
sosend(c2242654,0,0,c1e12100,0) at sosend+0x6f1
nfs_send(c2242654,c1d57860,c1e12100,c2313900,1c) at nfs_send+0xc9
nfs_request(c22cf108,c1e12a00,7,0,c20bb300) at nfs_request+0x342
nfs_writerpc(c22cf108,d5480ca4,c20bb300,d5480c94,d5480c98) at nfs_writerpc+0x2a0
nfs_doio(cbf75e08,c20bb300,0,c094f9b4,0) at nfs_doio+0x508
nfssvc_iod(c0a21828,d5480d38,0,0,0) at nfssvc_iod+0x1db
fork_exit(c07c5150,c0a21828,d5480d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xd5480d6c, ebp = 0 ---

I cannot seem to kill process 56 (nfsiod), so I have to reset the  
box.


Anyone got a clue? What can I do to ease debugging here? Next time  
it happens I can probably make a dump, at least I will have a  
debug kernel running then.


/Eirik





Re: NFS-related hang in 5.4?

2005-06-20 Thread Eirik Øverby


On 20. jun. 2005, at 10.38, Robert Watson wrote:




On Mon, 20 Jun 2005, Eirik Øverby wrote:



Hmm.  Looks like a bug in dummynet.  ipfw should not be directly
re-injecting UDP traffic back into the input path from an
outbound path, or it risks re-entering, generating lock order
problems, etc. It should be getting dropped into the netisr queue
to be processed from the netisr context.





This problem would exist across all 5.4 installations, both i386  
and amd64? Would it depend on heavy load, or could it  
theoretically happen at any time when there's traffic? All three  
of my fbsd5 servers (dual opteron, dual p3-1ghz, dual p3-700mhz)  
are experiencing random hangs with ~a few weeks between,  
impression is that if running single-cpu mode they are all stable.  
All using dummynet in a comparable manner. Ideas?





Yes.  Basically, the network stack avoids recursion in processing
for "complicated" packets by deferring processing an offending
packet to a thread called the 'netisr'.  Whenever the stack reaches
a possible recursion point on a packet, it's supposed to queue the
packet for processing 'later' in a per-protocol queue, unwind, and
then when the netisr runs, pick up and continue processing.  In the
stack trace you provide, dummynet appears to immediately invoke the
in-bound network path from the out-bound network path, walking back
into the network stack from the outbound path.  This is generally
forbidden, for a variety of reasons:


- We do allow the in-bound path to call the out-bound path, so that
  protocols like TCP, and services like NFS can turn around packets
  without a context switch.  If further recursion is permitted, the
  stack may overflow.

- Both paths may hold network stack locks over calls in either
  direction -- specifically, we allow protocol locks to be held over
  calls into the socket layer, as the protocol layer drives operation;
  if a recursive call is made, deadlocks can occur due to violating the
  lock order.  This is what is happening in your case.

Pretty much all network code is entirely architecture-independent,  
so bugs typically span architectures, although race conditions can  
sometimes be hard to reproduce if they require precise timing and  
multiple processors.
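
The deferral rule described above can be illustrated with a toy model. This is only a sketch in Python, not FreeBSD code: the class and method names (ToyStack, input, output, run_netisr) are invented for the illustration, and it models only call nesting, not the lock-order problem. It compares an out-bound path that calls straight back into the in-bound path with one that queues the packet and lets a netisr-like loop pick it up later.

```python
from collections import deque


class ToyStack:
    """Toy model of direct re-entry versus netisr-style deferral."""

    def __init__(self, defer):
        self.defer = defer      # True: queue re-injected packets instead of recursing
        self.queue = deque()    # stands in for the per-protocol netisr queue
        self.depth = 0          # current nesting of input/output calls
        self.max_depth = 0      # deepest nesting observed so far

    def _enter(self):
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def input(self, hops):
        """In-bound path; it may call the out-bound path directly."""
        self._enter()
        if hops > 0:
            self.output(hops - 1)
        self.depth -= 1

    def output(self, hops):
        """Out-bound path; it should not call the in-bound path directly."""
        self._enter()
        if hops > 0:
            if self.defer:
                self.queue.append(hops - 1)   # unwind now, process later
            else:
                self.input(hops - 1)          # the forbidden re-entry
        self.depth -= 1

    def run_netisr(self):
        """Drain deferred packets, each starting from a shallow context."""
        while self.queue:
            self.input(self.queue.popleft())


def max_nesting(defer, hops=10):
    """Return the deepest call nesting seen while bouncing one packet."""
    stack = ToyStack(defer)
    stack.input(hops)
    stack.run_netisr()
    return stack.max_depth
```

With direct re-entry the nesting grows with the number of bounces; with deferral it stays at two levels (one in-bound call plus one out-bound call) however often the packet is turned around.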




So I'm lucky to have seen this one... Great ;)


Is it possible to configure dummynet out of your configuration,  
and see if the problem goes away?





I'm running a test right now, will let you know in the morning.




Thanks.



I know enough not to call this a "confirmation", but disabling  
dummynet did indeed allow me to finish the backup. I never made it  
past 15GBs before, now the full 19GB tar.gz file is done, and the  
boxes are both still running. The funny thing is - I only disabled  
dummynet on one of the boxes now - the source of the backup, the box  
that pushes data. The other box has pretty much 100% the same setup,  
and is also i386. But as traffic shaping can only happen on outgoing  
packets, I suppose that makes sense.


I can try re-running the test again if you wish, in order to gain  
more statistics. It's just too bad it takes a while ;)



/Eirik




Robert N M Watson




___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: NFS-related hang in 5.4?

2005-06-20 Thread Eirik Øverby



On 20. jun. 2005, at 17.18, Marc Olzheim wrote:


On Mon, Jun 20, 2005 at 10:53:19AM +0200, Eirik Øverby wrote:


I know enough not to call this a "confirmation", but disabling
dummynet did indeed allow me to finish the backup. I never made it
past 15GBs before, now the full 19GB tar.gz file is done, and the
boxes are both still running. The funny thing is - I only disabled
dummynet on one of the boxes now - the source of the backup, the box
that pushes data. The other box has pretty much 100% the same setup,
and is also i386. But as traffic shaping can only happen on outgoing
packets, I suppose that makes sense.



Hmm, does that solve kern/79208 for you as well by any chance ?


Seems not.

Now, how do I get my box back to life? ;)

/Eirik



Marc




Network/fxp related panic in 5.4?

2005-06-24 Thread Eirik Øverby


Hi all,

I recently re-enabled SMP on one of my 5.4 servers (dual intel p3),  
and after a relatively short while (couple of days) it starts acting  
up. Today it was frozen and had jumped into kernel debugger on serial  
console. Problem is that my serial console was controlled by a  
terminal at work, and when I got home it seemed that the work  
terminal had disconnected. All I could do was a 'trace' - I don't  
have the panic screen (if any) nor do I have any other output because  
the watchdog triggered the powerswitch cycle just after I got the trace:


Tracing pid 29 tid 10 td 0xc22a
fxp_intr_body(c2404000,c2404000,40,,8) at fxp_intr_body+0xd0
fxp_intr(c2404000,0,0,0,0) at fxp_intr+0x14e
ithread_loop(c22f6500,e3384d38,0,0,0) at ithread_loop+0x1b8
fork_exit(c06a9150,c22f6500,e3384d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3384d6c, ebp = 0 ---
db>

What makes me wonder is ... When I connected the serial console, the  
db> prompt was already there. Does that mean that the work terminal  
disconnect somehow sent a telnet break, and triggered the kernel  
debugger? I.e. - this was no panic, but a stupid serial console hiccup?


Is there any way to prevent this in the future - like changing the  
control character that would trigger the kernel debugger? (I have  
BREAK_TO_DEBUGGER in my kernel config..)


Thanks,
/Eirik

Jails that won't die...

2005-06-28 Thread Eirik Øverby


Hi,

I have, since upgrading to 5.x and updating my management tools, seen  
a number of problems relating to stopping jails.


I'm maintaining several hosts with a number of full-featured jails  
(i.e. full virtual FreeBSD installations in each jail), and in  
general this works fine. However, whenever I stop a jail using 'jexec  
 kill -SIGNAL -1' or 'jexec  /bin/sh /etc/rc.shutdown' (in  
various combinations), jails have a tendency to stick around for  
minutes or hours - according to 'jls'. Often I see an entry in  
'netstat -a' indicating that there is one or more sockets in FIN_WAIT  
state, preventing the jail from coming down. Taking the virtual  
network interface (alias) down does not help. All I can do at this  
point is wait.
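
As a rough aid for spotting the lingering sockets mentioned above, the following Python sketch filters netstat output for FIN_WAIT entries bound to one jail address. It is an illustration only: it assumes numeric output in the usual BSD six-column layout (as from 'netstat -an', with addresses printed as a.b.c.d.port), and the function name, sample addresses and sample text are made up.

```python
def lingering_sockets(netstat_text, jail_ip):
    """Return (local, foreign, state) for FIN_WAIT sockets bound to jail_ip."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local foreign state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue  # header line or a non-TCP entry
        local, foreign, state = fields[3], fields[4], fields[5]
        # BSD netstat appends the port after a final dot; strip it off.
        ip = local.rsplit(".", 1)[0]
        if ip == jail_ip and state.startswith("FIN_WAIT"):
            found.append((local, foreign, state))
    return found


# Hypothetical sample, not captured from a real system.
SAMPLE = """\
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  10.0.0.5.80            192.0.2.7.51234        FIN_WAIT_2
tcp4       0      0  10.0.0.5.22            192.0.2.7.40022        ESTABLISHED
tcp4       0      0  10.0.0.6.80            192.0.2.8.51300        FIN_WAIT_1
"""
```

Calling lingering_sockets(SAMPLE, "10.0.0.5") returns only the first socket; the entry on 10.0.0.6 belongs to a different address and the ESTABLISHED one is not in a FIN_WAIT state.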


I normally use 'jls' to determine whether or not a jail can be  
restarted (i.e. it's not running), but this is pretty useless in such  
cases. And right now I have a case where 'netstat -a' shows me  
nothing pertaining to the jail, though it has no processes running. I  
have therefore force-started the jail again, which seems to work  
nicely, but now 'jls' gives me two entries for this jail, with  
different JIDs.


What am I doing wrong here?

/Eirik

Re: Jails that won't die...

2005-06-29 Thread Eirik Øverby



On 28. jun. 2005, at 16.58, Brian Fundakowski Feldman wrote:


On Tue, Jun 28, 2005 at 10:37:29AM +0200, Eirik Øverby wrote:


Hi,

I have, since upgrading to 5.x and updating my management tools, seen
a number of problems relating to stopping jails.

I'm maintaining several hosts with a number of full-featured jails
(i.e. full virtual FreeBSD installations in each jail), and in
general this works fine. However, whenever I stop a jail using 'jexec
 kill -SIGNAL -1' or 'jexec  /bin/sh /etc/rc.shutdown' (in
various combinations), jails have a tendency to stick around for
minutes or hours - according to 'jls'. Often I see an entry in
'netstat -a' indicating that there is one or more sockets in FIN_WAIT
state, preventing the jail from coming down. Taking the virtual
network interface (alias) down does not help. All I can do at this
point is wait.

I normally use 'jls' to determine whether or not a jail can be
restarted (i.e. it's not running), but this is pretty useless in such
cases. And right now I have a case where 'netstat -a' shows me
nothing pertaining to the jail, though it has no processes running. I
have therefore force-started the jail again, which seems to work
nicely, but now 'jls' gives me two entries for this jail, with
different JIDs.

What am I doing wrong here?



You could just use ps to check for jailed processes and check their
respective jails using the procfs status entry (at least according
to the ps manpage...)


My jailctl script can do both - list by jls and list by processes in  
the jail. There are NO processes running in the jail.


/Eirik




--
Brian Fundakowski Feldman
[ FreeBSD ] The Power to Serve!
[EMAIL PROTECTED]
Opinions expressed are my own.


Re: Jails that won't die...

2005-06-30 Thread Eirik Øverby



On 29. jun. 2005, at 20.58, Brian Fundakowski Feldman wrote:


On Wed, Jun 29, 2005 at 03:28:09PM +0200, Eirik Øverby wrote:



On 28. jun. 2005, at 16.58, Brian Fundakowski Feldman wrote:







You could just use ps to check for jailed processes and check their
respective jails using the procfs status entry (at least according
to the ps manpage...)



My jailctl script can do both - list by jls and list by processes in
the jail. There are NO processes running in the jail.



So it's obviously not running, and you can mark its state as such.


...which is what I do on FreeBSD 4.x, but on 5.x the 'jls' command
still claims the jail is running. I think this is unbelievably
dirty. Also, using /proc to determine if a jail is still running is a
bad idea, as mounting /proc is deprecated.


/Eirik



--
Brian Fundakowski Feldman
[ FreeBSD ] The Power to Serve!
[EMAIL PROTECTED]
Opinions expressed are my own.


Re: Jails that won't die...

2005-06-30 Thread Eirik Øverby



On 30. jun. 2005, at 22.56, Brian Fundakowski Feldman wrote:


On Thu, Jun 30, 2005 at 03:53:56PM +0200, Eirik Øverby wrote:



On 29. jun. 2005, at 20.58, Brian Fundakowski Feldman wrote:



On Wed, Jun 29, 2005 at 03:28:09PM +0200, Eirik Øverby wrote:




On 28. jun. 2005, at 16.58, Brian Fundakowski Feldman wrote:









You could just use ps to check for jailed processes and check their
respective jails using the procfs status entry (at least according
to the ps manpage...)




My jailctl script can do both - list by jls and list by processes in
the jail. There are NO processes running in the jail.




So it's obviously not running, and you can mark its state as such.



...which is what I do on FreeBSD 4.x, but on 5.x the 'jls' command
still claims the jail is running. I think this is unbelievably
dirty. Also, using /proc to determine if a jail is still running is a
bad idea, as mounting /proc is deprecated.



The deprecation is due to security concerns, not bit-rot.  You can
just mount it with root-readable-only permissions.  The jls for
current isn't incorrect, you're just expecting a different criterion
to mean "alive" than the one it is using.  It would take increased
kernel complexity to do what you want if you're not going to do it in
userland.


I am aware of that. However, I have seen instabilities with /proc as  
well, but that's another story.



Anyway, why aren't you just using a /var/run file in the "real" system
to tell whether the jail is running or not?  It's the corollary to
pid files versus doing "killall"...  Just seems like something really
trivial to implement as you like it in the userland.


Sure, this is what I fall back on when running my jailctl script
(/usr/ports/sysutils/jailctl) on 4.x. However, I NEED 'jls' to be
correct, because I use it to inject other processes (like executing
shutdown scripts inside the jails when taking them down, etc.). I
suppose I could sort the output of jls on jail id and always use
whichever instance of a jail has the highest ID, but I don't know how
these IDs work - if they are recycled, if they "wrap around" at some
point, etc.
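
The "highest ID wins" idea can be sketched as follows. This is a hypothetical Python helper, not part of jailctl: it assumes jls prints four whitespace-separated columns per jail (JID, IP address, hostname, path), the sample text is invented, and it relies on JIDs only ever increasing, which is exactly the open question above.

```python
def newest_jids(jls_text):
    """Map each jail hostname to the highest JID listed for it."""
    newest = {}
    for line in jls_text.splitlines():
        fields = line.split()
        # Assumed columns: JID, IP address, hostname, path.
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # header line or anything unexpected
        jid, _ip, hostname, _path = fields
        newest[hostname] = max(int(jid), newest.get(hostname, 0))
    return newest


# Hypothetical sample: the "web" jail shows up twice after a forced restart.
SAMPLE = """\
   JID  IP Address      Hostname                      Path
     3  10.0.0.5        web.example.org               /jails/web
     5  10.0.0.6        db.example.org                /jails/db
     7  10.0.0.5        web.example.org               /jails/web
"""
```

For the sample, newest_jids(SAMPLE) maps web.example.org to 7 and db.example.org to 5. If JIDs can be recycled or can wrap, the highest number is no longer guaranteed to be the most recent instance, and this approach would pick the wrong jail.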


In any case it would be nice to know exactly which criteria jls uses
- and perhaps a way to remove whichever criterion keeps it
thinking the jail is still running.


Thing is - sometimes jails stop just fine. Other times they don't. It  
all depends. Perhaps I should get lsof or something, see if there are  
any open files (though I think I tried once without finding any)...


/Eirik



--
Brian Fundakowski Feldman
[ FreeBSD ] The Power to Serve!
[EMAIL PROTECTED]
Opinions expressed are my own.





Re: FreeBSD -STABLE servers repeatedly crashing.

2005-07-06 Thread Eirik Øverby



On Jul 6, 2005, at 6:29 PM, Blaz Zupan wrote:


On Wed, 6 Jul 2005, Kris Kennaway wrote:


That should be OK as long as you're not cross-compiling for different
architectures.



No, we only have i386 boxes.


Hi,

thanks for doing this work. I was working on preparing a similar set
of information, but have been too overworked lately.


We have ordered and had delivered a substantial number of DL380
(intel) and DL385 (amd64) machines, that will all be running FreeBSD.
However, the recent reports about trouble on these systems have made
me wary. Perhaps this will give FreeBSD the solution it needs (I've
seen similar issues on other SMP systems), and me the sleep I need
before launch in September ;)


Thanks again. Now just hoping it's helpful to someone ;)

/Eirik




Re: FreeBSD 6.0-BETA1 Available

2005-07-15 Thread Eirik Øverby



On Jul 15, 2005, at 5:10 PM, Emanuel Strobl wrote:


On Friday, 15 July 2005 16:58 CEST, Marc G. Fournier wrote:

And, for "the stupid question of the day" ... how long before 5.x is no
longer supported?  I'm just about to deploy a new server, and was
*going* to go with 5.x, but would I be better just skipping 5.x
altogether?  Or are there such drastic changes in 6.x that doing so at
this time wouldn't be prudent?



To post my opinion to the last part of the question: I'm also deploying new
servers and I'll take RELENG_6 since there are so many improvements
(nullfs in jails etc.) and 6-current has been pretty stable for me on my


Hi,
what's changed wrt jails? And nullfs? I haven't been following the  
"news" as closely as I perhaps should, but I feel that the jail  
functionality doesn't get half as much attention in release notes as  
it should... Porting my jail-related tools to 5.x from 4.x was  
painful, but enjoyable when I was done. How does 6.x look?


/Eirik



UP workstation with all kinds of new stuff enabled (ULE PREEMPTION), so I
guess I won't see more troubles than with 5.4, I think less :)

-Harry




On Fri, 15 Jul 2005, Scott Long wrote:


Announcement


The FreeBSD Release Engineering Team is pleased to announce the
availability of FreeBSD 6.0-BETA1, which marks the beginning of the
FreeBSD 6.0 Release Cycle.

FreeBSD 6.0 will be a much less dramatic step from the FreeBSD 5
branch than the FreeBSD 5 branch was from FreeBSD 4.  Much of the work
that has gone into 6.0 development has focused on polishing and
improving the work from 5.x.  These changes include streamlining direct
device access in the kernel, providing a multi-threaded SMP-safe
UFS/VFS filesystem layer, implementing WPA and Host-AP 802.11
features, as well as countless bugfixes and device driver
improvements.  Major updates and improvements have been made to ACPI
power and thermal management, ATA, and many aspects of the network
infrastructure.  32bit application support for AMD64 is also greatly
improved, as is compatibility with certain Athlon64 motherboards.  This
release is also the first to feature experimental PowerPC support for
the Macintosh G3 and G4 platforms.

This BETA1 release is in the same basic format as the Monthly
Snapshots. For most of the architectures only the ISO images are
available though the FTP install tree is available for a couple of the
architectures.

We encourage people to help with testing so any final bugs can be
identified and worked out.  Availability of ISO images is given below.

If you have an older system you want to update using the normal
CVS/cvsup source based upgrade the branch tag to use is RELENG_6
(though that will change for the Release Candidates later).  Problem
reports can be submitted using the send-pr(1) command.

The list of open issues and things still being worked on are on the
todo list:

http://www.freebsd.org/releases/6.0R/todo.html

Since this is the first release of a new branch we only have a rough
idea for some of the dates.  The current rough schedule is available
but most dates are still listed as "TBD - To Be Determined":

http://www.freebsd.org/releases/6.0R/schedule.html

Known Issues


For the PowerPC architecture /etc/fstab isn't written out properly, so
the first boot throws you into the mountroot> prompt.  You will need
to manually enter where the root partition is and fix /etc/fstab.
Also the GEM driver is listed as 'unknown' in the network config
dialog.

For all architectures a kernel rebuild might be needed to get some
FreeBSD 5 applications to run.  Add "options COMPAT_FREEBSD5" to the
kernel configuration file if you have problems with FreeBSD 5
executables.


Availability


The BETA1 ISOs and FTP support are available on most of the FreeBSD
Mirror sites.  A list of the mirror sites is available here:


http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/mirrors-ftp.html

The MD5s are:

MD5 (6.0-BETA1-alpha-bootonly.iso) = eabda0a086e5492fe43626ce5be1d7e1
MD5 (6.0-BETA1-alpha-disc1.iso) = d7fe900bb3d5f259cc3cc565c4f303e4

MD5 (6.0-BETA1-amd64-bootonly.iso) = 9b04cb2f68300071c717f4aa4220bdac
MD5 (6.0-BETA1-amd64-disc1.iso) = cb0f21feaf8b7dd9621f82a8157f6ed8
MD5 (6.0-BETA1-amd64-disc2.iso) = 84d40bc291a9ed5cd69dfa717445eeb5

MD5 (6.0-BETA1-i386-bootonly.iso) = 38e0b202ee7d279bae002b883f7074ec
MD5 (6.0-BETA1-i386-disc1.iso) = b2baa8c18d4637ef02822a0da6717408
MD5 (6.0-BETA1-i386-disc2.iso) = 2b151a3cea8843d322c75ff76779ffcf

MD5 (6.0-BETA1-ia64-bootonly.iso) = 97800ec7d4b29927a8e66a2b53e987fb
MD5 (6.0-BETA1-ia64-disc1.iso) = 7d29cd9317997136507078971762a0d8
MD5 (6.0-BETA1-ia64-livefs.iso) = 6ff974e60a3964cf16fcec05925c14e9

MD5 (6.0-BETA1-pc98-disc1.iso) = 40a3134cce89bd5f7033d8b9181edf91

MD5 (6.0-BETA1-powerpc-bootonly.iso) = 2f64974e9bd5adcf813f5d35ff742443
MD5 (6.0-BETA1-powerpc-disc1.iso) = b2562c38414ff4866f5ed8b3a38683c8

MD5 (6.0-BETA1-sparc64-booto

Serious issue with serial console in 5.4

2005-07-18 Thread Eirik Øverby


Hi,

I reported this before, but I am very surprised that it is still the  
case:


(This is from the last time it happened; this time the box rebooted
and cleared the serial console before I had time to cut/paste it.)

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address   = 0x1c
fault code  = supervisor write, page not present
instruction pointer = 0x8:0xc0620b5f
stack pointer   = 0x10:0xdadbd988
frame pointer   = 0x10:0xdadbd994
code segment= base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 51999 (getty)
trap number = 12
panic: page fault
cpuid = 1
boot() called on cpu#0
Uptime: 66d11h24m50s


The above panic will show up occasionally when logging out from a  
serial console (i.e. ctrl-D, logout, exit, whatever). This is  
EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at  
random - and renders the serial console useless.


Robert Watson confirmed this to be an issue on the 10th of April.

Anyone??

/Eirik

Re: Q: RT32 (Request Tracker) + jail

2005-07-20 Thread Eirik Øverby



On Jul 20, 2005, at 2:22 AM, J. Nyhuis wrote:


Greetings,

I would like to have RT running in a jailed environment.  The  
challenge, it seems, will be to get sendmail running in the same  
jailed environment as RT and the other components.
For those not so familiar with the components of RT, the jail  
would include apache1.3+modperl, MySQL, sendmail, and RT.  That's a  
lot of stuff to get working in there!  (but fortunately FreeBSD  
jails seem straightforward and easy) ^_^

I expect sendmail to be the real problem of the above bunch.

Has anyone actually tried to do this with a big multi-part app  
like RT (I have not spotted anyone's documented attempts on Google)  
and would be willing to share to the list?


If I were you I would grab /usr/ports/sysutils/jailctl (ok, insert
blatant self-praise here ;), create yourself one or more jails, and
log into them as if they were normal fbsd installs. Everything you
mention should work perfectly fine; I'm running anything between 5
and 50 jails of similar types (with web, mail, database, cvs,
subversion, you name it running in them, in various combinations) on
both private and work-owned hosts, some of them performing extremely
critical tasks (think CC payment handling for millions of users).


Wouldn't worry about sendmail ;)


Does anyone else wonder if I've lost it? (Don't answer that)...


Not at all.

/Eirik


^_^

Thanks,

John H. Nyhuis
Sr. Computer Specialist
Dept. of Pediatrics
HS RR349B, Box 356320
University of Washington
Desk: (206)-685-3884
[EMAIL PROTECTED]
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable- 
[EMAIL PROTECTED]"






___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: TinyBSD Call For Testers

2005-07-20 Thread Eirik Øverby



On Jul 18, 2005, at 8:17 PM, Jean Milanez Melo wrote:


Hello gentlemen,

Last Saturday a new port was added under the sysutils/ category:
ports/sysutils/tinybsd. TinyBSD is a tool meant to provide an easy
way to build embedded systems based on FreeBSD. It works by copying
userland, checking and copying library dependencies, and building a
kernel.


We did our best to make embedded system creation an easy and
especially fast process. The main (default) configuration generates an
embedded system image of about 20MB, which is a very generic
approach, with support for a number of wired NICs and the most
popular wireless hardware (including atheros), plus divert, bridge,
dummynet, firewall, etc., and CPU_ELAN (for soekris devices). If the
"generic" system is tightened up, the final result can be as small as
an 8MB embedded system.


We are giving you this intro to ask you please to test TinyBSD out,  
the most that you can, and send every possible feedback regarding  
it. The main tinybsd goal is to make embedded systems creation a  
process which must be


1 - fast
2 - easy
3 - 100% functional

If you can test it, we would appreciate your thoughts. If you think  
any of those 3 goals can't be reached for you, or could be  
improved, also let me know.


Thanks for testing


Without having actually tried yet (time hasn't been very permitting
lately), is it conceivable to use this tool to create
slim-but-functional jails? Sans the kernel part, that is?


/Eirik



--
Regards,
Jean Milanez Melo
FreeBSD Brasil LTDA.
Fone: (31) 3281-9633
http://www.freebsdbrasil.com.br


Re: Panic when logging out from serial console

2005-07-20 Thread Eirik Øverby



On Apr 10, 2005, at 1:42 PM, Robert Watson wrote:



On Sun, 10 Apr 2005 [EMAIL PROTECTED] wrote:


warning: This report might be somewhat vague. For quite a while
now I've been plagued with the problem that logging out from a
serial console causes the box to panic. For a while I've been sure
this was isolated to one of my boxen, because it's been acting up
in other ways as well, but today it happened on two other boxes
too! And these boxes have been rock stable for the last two years.


I'm running a fairly recent variation of RELENG-5 on all the
boxes; one of them is amd64, the two others - including the one
I've pasted from - are plain old p3 machines. They are all
dual-CPU though.




I've seen precisely this panic -- in fact, I saw it yesterday on a  
RELENG_5 box, and under identical circumstances -- it looks like it  
happens if a last process in a login session on a serial console  
closes the tty, and then getty re-opens it while there's console  
output coming from syslog.  I was able to get a core dump, but  
haven't made much headway on it yet.  It looks like the tty  
structure has been released -- the refcount on the tty is 0, and  
the mutex pointers in the kqueue state have been cleared (hence the  
null pointer dereference you see).  Now, the question is why --  
I've added some debugging output to the local box I saw it on, and  
will see if I can reproduce it.


Did you ever manage to reproduce - or fix - this? I had a rather  
nasty incident recently due to this, even on a very recently updated  
5.4. I sent a message to stable@ about it a few days ago, but have  
received no response.


I personally think that this must be very very (very) bad - serial  
consoles shouldn't do this! ;)


/Eirik




Robert N M Watson




I have no clue what I can do from here; has anyone seen this
before? I can't always reproduce it, but the risk is fairly high -
around 33% I'd say.


Anyone?

Thanks for your attention, details below.

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address   = 0x1c
fault code  = supervisor write, page not present
instruction pointer = 0x8:0xc0620b5f
stack pointer   = 0x10:0xdadbd988
frame pointer   = 0x10:0xdadbd994
code segment= base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 51999 (getty)
trap number = 12
panic: page fault
cpuid = 1
boot() called on cpu#0
Uptime: 66d11h24m50s



/Eirik


This message was sent using IMP, the Internet Messaging Program.


Re: Serious issue with serial console in 5.4

2005-07-21 Thread Eirik Øverby



On Jul 21, 2005, at 7:00 AM, Kris Kennaway wrote:


On Mon, Jul 18, 2005 at 11:58:54AM +0200, Eirik Øverby wrote:


Hi,

I reported this before, but I am very surprised that it is still the
case:

(This is from the last time it happened; this time the box rebooted
and cleared the serial console before I had time to cut/paste it.)


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address   = 0x1c
fault code  = supervisor write, page not present
instruction pointer = 0x8:0xc0620b5f
stack pointer   = 0x10:0xdadbd988
frame pointer   = 0x10:0xdadbd994
code segment= base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 51999 (getty)
trap number = 12
panic: page fault
cpuid = 1
boot() called on cpu#0
Uptime: 66d11h24m50s



The above panic will show up occasionally when logging out from a
serial console (i.e. ctrl-D, logout, exit, whatever). This is
EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at
random - and renders the serial console useless.

Robert Watson confirmed this to be an issue on the 10th of April.

Anyone??



You might have to wait until 6.0-R since fixing it seems to require
infrastructure changes that cannot easily be backported to 5.x.


With all due respect - if this is (and I'm assuming it is, because it  
happens on all the servers I'm serial-controlling) an omnipresent  
problem on 5.x, I daresay it should warrant some more attention.  
Having unsafe serial terminal support that can bring down your system  
like that defies much of the point of having serial terminal support  
in the first place.


However, since I seem to be the only one who has noticed this,  
perhaps I'm the last person on earth to routinely use serial terminal  
switches instead of KVM switches to do my admin work?


/Eirik



Kris




Re: Serious issue with serial console in 5.4

2005-07-21 Thread Eirik Øverby



On Jul 21, 2005, at 12:16 PM, Robert Watson wrote:



On Thu, 21 Jul 2005, Eirik Øverby wrote:



The above panic will show up occasionally when logging out from a
serial console (i.e. ctrl-D, logout, exit, whatever). This is
EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at
random - and renders the serial console useless.
Robert Watson confirmed this to be an issue on the 10th of April.

You might have to wait until 6.0-R since fixing it seems to require
infrastructure changes that cannot easily be backported to 5.x.




With all due respect - if this is (and I'm assuming it is, because  
it happens on all the servers I'm serial-controlling) an  
omnipresent problem on 5.x, I daresay it should warrant some more  
attention. Having unsafe serial terminal support that can bring  
down your system like that defies much of the point of having  
serial terminal support in the first place.


However, since I seem to be the only one who has noticed this,  
perhaps I'm the last person on earth to routinely use serial  
terminal switches instead of KVM switches to do my admin work?




The concern about the 5.x backport is that it will break parts of  
the device driver ABI, and is a significant change that involves a  
lot of risk.


Regarding the general prevalence of the problem -- I've seen a  
small number of people reporting it's a big problem.  Since I know  
of a great many people running with serial consoles (other than a  
workstation, I never run FreeBSD boxes any other way), this leads  
me to believe it's something that shows up in fairly specific  
conditions -- perhaps relating to precise timing of a race  
condition.  This means that if we introduce a generally  
destabilizing change, it may impact more people than the problem as  
it exists (a nasty trade-off).


I've only seen the issue when logging out of a serial console  
session, and had previously hypothesized that it had to do with the  
simultaneous timing of a console message from syslog and the  
opening/closing of the console's tty due to logging out and getty  
restarting, resulting in a reference count improperly hitting zero.


I did indeed make some changes to my syslog configuration after  
getting the serials online. Your theory might not be entirely off.
Let me know if I should post my syslog.conf file or anything else  
here or elsewhere...


Thanks,
/Eirik


I thought Doug White had come up with a work-around patch that  
prevented the reference count from being allowed to hit 0 for the  
console by artificially elevating it, which would prevent the  
panic, so either (a) the work around wasn't committed, or (b) it  
didn't work.


I can attempt to take another look at this problem in a week or so,  
but have a number of things I need to finish up for FreeBSD 6.0  
before then that will be occupying my time.


Robert N M Watson



Re: Serious issue with serial console in 5.4

2005-07-21 Thread Eirik Øverby



On Jul 21, 2005, at 1:04 PM, Robert Watson wrote:



On Thu, 21 Jul 2005, Eirik Øverby wrote:


I've only seen the issue when logging out of a serial console  
session, and had previously hypothesized that it had to do with  
the simultaneous timing of a console message from syslog and the  
opening/closing of the console's tty due to logging out and getty  
restarting, resulting in a reference count improperly hitting zero.




I did indeed make some changes to my syslog configuration after  
getting the serials online. Your theory might not be entirely off.  
Let me know if I should post my syslog.conf file or anything else  
here or elsewhere...




Since you appear to be able to reliably reproduce the problem  
(whereas I was able to reproduce it only after several hours of  
quite active serial console work), it would be quite interesting to  
answer the following question:


  If you cause syslogd not to send any output to /dev/console, does
  the problem go away?


I'm afraid to say it doesn't.

/Eirik




Thanks,

Robert N M Watson



Re: Apache2 just listening to https?

2005-07-28 Thread Eirik Øverby


On Jul 28, 2005, at 8:58 AM, Roger Grosswiler wrote:


Hi,

I have apache2 running, with ssl. now, if i call my domain in a
browser not using https, i cannot connect.


Try adding port 80 to your Listen statement(s) in httpd.conf. Also  
make sure you have virtual hosts that capture requests on port 80.


/Eirik



ps aux shows this:

root 59847  0.0  4.3  7528  4544  ??  Ss    5:34PM   0:12.11 /usr/local/sbin/httpd -DSSL
www  59848  0.0  6.5  9368  6888  ??  I     5:34PM   0:03.80 /usr/local/sbin/httpd -DSSL
www  59849  0.0  5.9  8856  6292  ??  I     5:34PM   0:01.92 /usr/local/sbin/httpd -DSSL
www  59850  0.0  6.5  9364  6876  ??  I     5:34PM   0:04.55 /usr/local/sbin/httpd -DSSL
www  59852  0.0  6.0  8880  6332  ??  I     5:34PM   0:01.60 /usr/local/sbin/httpd -DSSL
www  59862  0.0  5.9  8852  6292  ??  I     5:37PM   0:03.14 /usr/local/sbin/httpd -DSSL
www  59931  0.0  5.1  8072  5436  ??  S     5:49PM   0:02.60 /usr/local/sbin/httpd -DSSL
www  59935  0.0  6.1  9312  6428  ??  I     5:50PM   0:01.89 /usr/local/sbin/httpd -DSSL
www  60152  0.0  5.3  8168  5652  ??  I     6:41PM   0:00.39 /usr/local/sbin/httpd -DSSL
www  60153  0.0  4.5  7728  4748  ??  I     6:41PM   0:00.55 /usr/local/sbin/httpd -DSSL
www  60154  0.0  5.2  8100  5504  ??  I     6:41PM   0:00.31 /usr/local/sbin/httpd -DSSL

 does this mean, that my apache just runs in ssl-mode???

tcp46      0      0  *.https    *.*    LISTEN
tcp46      0      0  *.http     *.*    LISTEN



...not really

do i have to create a virtual server if i use ssl?

Roger








Re: Apache2 just listening to https?

2005-07-28 Thread Eirik Øverby


On Jul 28, 2005, at 10:01 AM, Roger Grosswiler wrote:



Try adding port 80 to your Listen statement(s) in httpd.conf. Also
make sure you have virtual hosts that capture requests on port 80.



/Eirik


i did a file called virtual.conf in /usr/local/etc/apache2/Include
with this content:


ServerName freebsd.domain.net
ServerAlias freebsd.domain.net
DocumentRoot /usr/local/www/data



Make sure you are not enabling SSL globally, but for each vhost
individually.
Try the telnet trick mentioned by others, but simply type
"GET / HTTP/1.0"  -- it should give you something about trying to talk
HTTP to an HTTPS server. That would explain why lynx/links aren't
working.
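
The manual telnet check described above can also be scripted. The following is only a minimal sketch, assuming nc(1) (netcat) is installed; the function names are made up for illustration and are not part of Apache or of the thread.

```sh
#!/bin/sh
# Build a bare HTTP/1.0 request for the given path (default "/").
# HTTP wants CRLF line endings and an empty line to end the headers.
http10_request() {
    printf 'GET %s HTTP/1.0\r\n\r\n' "${1:-/}"
}

# Hypothetical helper: send the request in plain text to host ($1) and
# port ($2), then show the first few lines of whatever comes back.
# Assumes an nc(1) that supports -w (timeout in seconds).
probe_plain_http() {
    http10_request / | nc -w 5 "$1" "$2" | head -n 5
}
```

Calling something like `probe_plain_http freebsd.domain.net 80` and then the same with port 443 should show which port answers plain HTTP. A normal status line on port 80 suggests the non-SSL virtual host is being served; an error about speaking HTTP to an SSL-enabled port suggests SSL is switched on for that listener.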


/Eirik



...which should be loaded on startup. Also, i activated

NameVirtualHost *:80

in httpd.conf - still no success...whats up here? firewall is open,
redirecting on router is well...but still no success...

:-( Roger










5.4-dropping to debugger

2005-08-31 Thread Eirik Øverby

Hi, every once in a while (about once a week lately), one of my  
servers has been known to stop responding. Upon connecting the serial  
console, I find myself at a debugger prompt. This is the output I've  
gotten this time.


I do think I have a debug kernel on that machine, what can I do to  
get more useful information out?


PS: I have seen various kinds of instability on most of my 5.4
installations, no matter the patchlevel. This box is just one of many.


Anyone?

/Eirik

db>
db> c
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x2007010
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0581fe8
stack pointer           = 0x10:0xe3384c40
frame pointer           = 0x10:0xe3384c70
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 29 (irq18: fxp0)
[thread pid 29 tid 10 ]
Stopped at      fxp_add_rfabuf+0x68:    movw    %ax,0xe(%ebx)
db> trace
Tracing pid 29 tid 10 td 0xc22a
fxp_add_rfabuf(c2404000,c2404500,2,a6c54bb2,b51487f8) at fxp_add_rfabuf+0x68
fxp_intr_body(c2404000,c2404000,40,,8) at fxp_intr_body+0xf1
fxp_intr(c2404000,0,0,0,0) at fxp_intr+0x14e
ithread_loop(c22f6500,e3384d38,0,0,0) at ithread_loop+0x1b8
fork_exit(c06a9150,c22f6500,e3384d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3384d6c, ebp = 0 ---


Re: 5.4-dropping to debugger

2005-08-31 Thread Eirik Øverby



On Aug 31, 2005, at 8:28 PM, Kris Kennaway wrote:


On Wed, Aug 31, 2005 at 12:51:00PM +0200, Eirik Øverby wrote:


Hi, every once in a while (about once a week lately), one of my
servers has been known to stop responding. Upon connecting the serial
console, I find myself at a debugger prompt. This is the output I've
gotten this time.

I do think I have a debug kernel on that machine, what can I do to
get more useful information out?



See the chapter on kernel debugging in the developers' handbook.


Sorry, poorly phrased question. Was in a bit of a hurry.
I have a debug kernel, however I have no dump device and cannot
create one: I'm geom-mirroring my disks, and for some reason I'm not
able to specify a dump device when that is the case (this has been
discussed in the past).
I've been told that a debug kernel might still help, but the
Developers' Handbook does not say anything about what can be done
without a dump. I know this has come up on one of the lists (current,
stable or amd64) I'm on, so I guess I'll go ahead and search for it.
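
One thing a debug kernel allows even without a dump is mapping the faulting instruction pointer from the panic message back to a source line. The sketch below is only an illustration: the function names are invented here, the kernel.debug path is an assumption that depends on where and under which configuration name the kernel was built, and the result is only meaningful if that file matches the running kernel exactly.

```sh
#!/bin/sh
# Read a saved panic message on stdin and print the address from the
# "instruction pointer = 0x8:0xc0581fe8" line, without the segment
# selector in front of the colon.
fault_ip() {
    sed -n 's/^instruction pointer[[:space:]]*=[[:space:]]*0x[0-9a-fA-F]*:\(0x[0-9a-fA-F]*\).*/\1/p'
}

# Resolve an address ($1) to function name and file:line with
# addr2line(1). $2 is the debug kernel; the default is an assumed path.
resolve_ip() {
    addr2line -f -e "${2:-/usr/obj/usr/src/sys/GENERIC/kernel.debug}" "$1"
}
```

With the console output captured to a file, `resolve_ip "$(fault_ip < panic.txt)"` would print the function and line for the address. The trace above already names fxp_add_rfabuf+0x68, so the source line number is the additional detail this would provide.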


Sorry about the noise. Was just hoping someone recognized the symptoms.

/Eirik


Kris




PS: I have seen various kinds of instability on most of my 5.4
installations, no matter the patchlevel. This box is just one of
many.


Anyone?

/Eirik

db>
db> c
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x2007010
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc0581fe8
stack pointer           = 0x10:0xe3384c40
frame pointer           = 0x10:0xe3384c70
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 29 (irq18: fxp0)
[thread pid 29 tid 10 ]
Stopped at      fxp_add_rfabuf+0x68:    movw    %ax,0xe(%ebx)
db> trace
Tracing pid 29 tid 10 td 0xc22a
fxp_add_rfabuf(c2404000,c2404500,2,a6c54bb2,b51487f8) at fxp_add_rfabuf+0x68
fxp_intr_body(c2404000,c2404000,40,,8) at fxp_intr_body+0xf1
fxp_intr(c2404000,0,0,0,0) at fxp_intr+0x14e
ithread_loop(c22f6500,e3384d38,0,0,0) at ithread_loop+0x1b8
fork_exit(c06a9150,c22f6500,e3384d38) at fork_exit+0x80
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe3384d6c, ebp = 0 ---








Centralized building

2005-11-19 Thread Eirik Øverby


Hi all!

I've spent about a week trying to accomplish a rather simple task: To  
build kernel and world once for each architecture we have, and  
distribute this precompiled src and obj tree via NFS to all the  
systems that need updating. I have combined this with a locally  
maintained CVS tree, in order to assure coherent releases being  
installed on all our systems.


However, I am seeing some peculiar issues that I simply don't manage  
to get around.

Scenario:

I've got one server running 6.0-STABLE-i386. On this host I've
created a jail for building. We have both i386 and amd64 platforms
in-house, so I've created a script that builds for both:

 make TARGET_ARCH=i386  MAKEOBJDIRPREFIX=/usr/obj.i386  buildworld
 make TARGET_ARCH=amd64 MAKEOBJDIRPREFIX=/usr/obj.amd64 buildworld
And the same for buildkernel.
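
A build script along those lines could look roughly like the sketch below. It is not the script from this post: the function name, the SRCDIR variable and the RUN switch are invented for illustration. It also deliberately differs from the two commands above in one respect: MAKEOBJDIRPREFIX is passed through the environment instead of on the make command line, which is how build(7) documents that variable in the versions I have seen. Whether that plays a part in the problems described below is not established by this thread.

```sh
#!/bin/sh
# Source tree to build from; /usr/src as in the post.
SRCDIR="${SRCDIR:-/usr/src}"

# Print the build commands for every architecture given as an argument,
# or execute them when RUN=yes is set in the environment.
build_all() {
    for arch in "$@"; do
        for target in buildworld buildkernel; do
            if [ "${RUN:-no}" = "yes" ]; then
                # Stop at the first failing build step.
                ( cd "$SRCDIR" &&
                  env MAKEOBJDIRPREFIX="/usr/obj.$arch" \
                      make TARGET_ARCH="$arch" "$target" ) || return 1
            else
                echo "cd $SRCDIR && env MAKEOBJDIRPREFIX=/usr/obj.$arch make TARGET_ARCH=$arch $target"
            fi
        done
    done
}
```

Running `build_all i386 amd64` prints the four commands for review; `RUN=yes build_all i386 amd64` would execute them in order.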

Starting out trying to upgrade the amd64 hosts, I export the two obj  
directories via NFS, and mount them as /usr/obj on the amd64 hosts  
that need upgrading. This was, at least, my initial approach. I then  
found out that the /usr/src tree in the build jail is somehow tainted  
by the build (and by the options I specified), so I need to export  
that as well (which, I am afraid, means I have to maintain two  
different build jails). Therefore I also export /usr/src and mount it  
on the target hosts.


I then realized that I need to use the same objdir on the target  
hosts as in the build jail, so I try mounting to /usr/obj. on  
the target hosts. This allows me to get somewhat further.  
Installworld now progresses for a while, until it bombs out with the  
following error:

  ===> sys/boot/i386/boot2 (install)
  dd if=/dev/zero of=boot2.ldr bs=276 count=1
  dd: not found
  *** Error code 127

When looking for dd, I find it in the host PATH, and also in the obj  
dir:

  [EMAIL PROTECTED] /usr/obj.amd64# find . -name dd -type f
  ./amd64/usr/src/bin/dd/dd

At this point, I get rid of the MAKEOBJDIRPREFIX option and rebuild
everything with just TARGET_ARCH, only exporting /usr/obj from the
build jail. I notice that when using TARGET_ARCH with something other
than the architecture the build is running on (i.e. amd64 on an i386
host), the resulting build is NOT to be found in /usr/obj, but in
/usr/obj/amd64. Thus I need to specify MAKEOBJDIRPREFIX=/usr/obj/amd64
on the target host for installworld to get anything done at all.


I'm still getting the dd: not found error, and I do believe I've  
tried every combination and variation I can think of. Clocks are in  
sync between all the systems, so that is not the problem. Is the  
build system partially broken in 6.0? Have I missed something? Do I  
actually need an amd64 host to be able to build for amd64 systems, or  
are there other ways to accomplish what I'm trying to do? Should I
perhaps try doing centralized binary upgrades instead?


Any help would be appreciated.
With best regards,
/Eirik


Re: Centralized building

2005-11-19 Thread Eirik Øverby



On Nov 19, 2005, at 13:28 , Joseph Koshy wrote:


Starting out trying to upgrade the amd64 hosts, I export the
two obj directories via NFS, and mount them as /usr/obj on the
amd64 hosts that need upgrading.


I've done upgrades the other way, by having the build machine
mount the client's to-be-root partition and installing to it
using NFS.


Would I have to export every filesystem mount point on the hosts  
then? Or does an -alldirs option do the trick (in exports)?


In any case this would not be compatible with our security policy,  
unfortunately.


/Eirik



Re: Centralized building

2005-11-20 Thread Eirik Øverby



On Nov 19, 2005, at 19:43 , Joseph Koshy wrote:


AFAICT cross-compiling amd64 on an i386 machine isn't supported
yet. I ran into a similar problem when I upgraded an i386
machine to amd64. I thought I could just set CPUTYPE=athlon-64
and buildworld would do the right thing. Apparently not.


Bootstrapping a single machine is supported:

# make buildworld TARGET_ARCH=new-arch

plus a few other steps.  (See build(7)).

There have been a couple of postings on the mailing lists
on this topic in the recent past.  I've taken a stab at
describing how to cross-bootstrap too:

http://edoofus.blogspot.com/2005/10/cross-building-freebsd.html

The OP wanted to do a 'buildworld TARGET_ARCH=foo' on one
machine and then an 'installworld' on a different set of
machines.


Yes, and he still wonders if this is supposed to be doable or not.
I think the culprit is (partly) the fact that every architecture is  
built into its own subdirectory in /usr/obj, EXCEPT the architecture  
the build is running on. The same goes for the install part, and if  
the build and install architectures differ, it cannot ever work.  
Setting MAKEOBJDIRPREFIX on the target host makes the install start,  
but it fails after a couple of minutes with the "dd: not found" error.
(I do notice that there is a /usr/obj/usr directory created also when  
cross-building; I'm assuming this contains the build bootstrap tools).






--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy





Re: Centralized building

2005-11-20 Thread Eirik Øverby



On Nov 20, 2005, at 09:50 , Eirik Øverby wrote:



On Nov 19, 2005, at 19:43 , Joseph Koshy wrote:


AFAICT cross-compiling amd64 on an i386 machine isn't supported
yet. I ran into a similar problem when I upgraded an i386
machine to amd64. I thought I could just set CPUTYPE=athlon-64
and buildworld would do the right thing. Apparently not.


Bootstrapping a single machine is supported:

# make buildworld TARGET_ARCH=new-arch

plus a few other steps.  (See build(7)).

There have been a couple of postings on the mailing lists
on this topic in the recent past.  I've taken a stab at
describing how to cross-bootstrap too:

http://edoofus.blogspot.com/2005/10/cross-building-freebsd.html

The OP wanted to do a 'buildworld TARGET_ARCH=foo' on one
machine and then an 'installworld' on a different set of
machines.


Yes, and he still wonders if this is supposed to be doable or not.
I think the culprit is (partly) the fact that every architecture is  
built into its own subdirectory in /usr/obj, EXCEPT the  
architecture the build is running on. The same goes for the install  
part, and if the build and install architectures differ, it cannot  
ever work. Setting MAKEOBJDIRPREFIX on the target host makes the  
install start, but it fails after a couple of minutes with the "dd:  
not found" error.
(I do notice that there is a /usr/obj/usr directory created also  
when cross-building; I'm assuming this contains the build bootstrap  
tools).


Follow-up. If I enter src/sys and do a "make install", the dd step
works perfectly - however it stops later when trying to install
cdboot. I am assuming this is due to missing options or wrong target
for make, but - from all I can tell - it shows a weakness in the
build/install system. Or maybe not...


Anyone??

/Eirik







--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy










Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-25 Thread Eirik Øverby


Hi all,

are there any obvious changes between 6.0-BETA3 and 6.0-RELEASE /
6.0-STABLE that I should be aware of, that could cause a quite
noticeable decline in performance (and a change in performance
patterns) for java/tomcat?


On a BETA-3 system I'm seeing, with the particular application we're  
running, about 28 transactions/second over a 10 minute interval. With  
-RELEASE and -STABLE I'm lucky to reach 24, and it'll usually wobble  
around 20.
Another oddity is that where the BETA-3 system starts out with good  
performance from the beginning when running load tests, the -RELEASE  
and -STABLE systems need a good 20 seconds to reach their "max",  
starting out very low (3-10 transactions/second for the first 10  
seconds or so).


This is on HP DL385 servers with dual 2.4 GHz Opteron CPUs, running
FreeBSD-amd64 from 15k RPM drives in cached RAID.


Hardware and software configuration (apart from the base system),  
network configuration and latencies, database access, etc. is 100%  
equal on all systems.


Any ideas?

Thanks,
/Eirik


Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-28 Thread Eirik Øverby


On Nov 28, 2005, at 14:45 , Joseph Koshy wrote:


On 11/26/05, Eirik Øverby <[EMAIL PROTECTED]> wrote:
EØ> [Cross-posting after lack of response on -stable]

The first step would be do some performance debugging.


Yep.


 - What do top/vmstat/systat say about what the OS and
   apps are doing?  Is the CPU pegged at 100%?  What's
   the load seen by the disks?  Is the RAID in good health?


vmstat output from idle periods is found below; I think it is
rather interesting. To your other questions: the CPU usage is
comparable on both systems. Not pegged at 100%, but load seems to
stabilize around 0.5. Disk load is minimal on the application
servers, somewhat more on the database servers, but they are not
interesting here (they are not the bottleneck, and they perform
equally). The RAIDs are in good health on both systems.


The vmstat output is interesting.
From the "fast" system (6.0-BETA3, ~idle):
[EMAIL PROTECTED] ~# vmstat -w 5
procs     memory       page                       disks      faults        cpu
r b w     avm    fre  flt  re  pi  po   fr   sr da0 pa0    in    sy    cs us sy  id
1 0 0 2439220  38048   14   0   0   0   14    0   0   0   170   141   437  0  0 100
0 0 0 2439220  38028    2   0   0   0    3    0   2   0   192    94   475  0  0 100
0 0 0 2439220  37916    1   0   0   0    6    0   1   0   291   925   926  5  0  94
0 0 0 2439220  37916    0   0   0   0    0    0   0   0   185    91   458  0  0 100
0 0 0 2439220  37820    1   0   0   0    6    0   3   0   289  1163  1124  6  0  94
0 0 0 2439220  37820    0   0   0   0    0    0   0   0   183    91   454  0  0 100


From the "slow" system (6.0-STABLE, ~idle):
[EMAIL PROTECTED] ~# vmstat -w 5
procs     memory       page                       disks      faults        cpu
r b w     avm    fre  flt  re  pi  po   fr   sr da0 pa0    in    sy    cs us sy  id
0 0 1 2468180  51660   15   0   0   0   18    4   0   0  1048  3200  5130  0  0 100
0 0 0 2468180  51660    1   0   0   0    0    0   0   0  1004  3068  5063  0  0 100
0 0 0 2468180  51660    0   0   0   0    0    0   0   0  1003  3094  5057  0  0 100
0 0 0 2468180  51660    0   0   0   0    0    0   1   0  1005  3068  5065  0  0 100
0 0 0 2468180  51656    1   0   0   0    0    0   0   0  1002  3090  5054  0  1  99
0 0 0 2468180  51656    0   0   0   0    0    0   0   0  1002  3064  5053  0  0 100


That is *loads* more context switches than on the BETA-3 system. I have
not yet tried this during load; I have to wait for the testing window
for that. But perhaps this helps? What do I look for next?
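
To put a number on the difference, the cs column can be averaged per host. This is just a sketch: the function name is made up, and it assumes vmstat output whose columns are cleanly separated by whitespace, with a header row that labels the context-switch column "cs".

```sh
#!/bin/sh
# Average the "cs" (context switches) column of vmstat output on stdin.
avg_cs() {
    awk '
        # Header rows start with a non-numeric field; use them to find
        # the position of the "cs" column, then skip them.
        $1 !~ /^[0-9]+$/ {
            for (i = 1; i <= NF; i++) if ($i == "cs") col = i
            next
        }
        # Data rows: accumulate once the column position is known.
        col > 0 { sum += $col; n++ }
        END { if (n > 0) printf "%d\n", int(sum / n) }
    '
}
```

For example, `vmstat -w 5 -c 7 | avg_cs` run on each machine gives one figure per host. Keep in mind that the first sample vmstat prints is an average since boot, so it may be worth dropping before comparing.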



 - Any unusual messages in /var/log/messages?  Any errors
   shown by the network interfaces (I'm assuming the
   application is using the network).


No errors shown that I can determine.


 - A brief description of the workload presented by
   the app would help.


This is a web application (payment gateway) that receives a HTTP  
POST, does some processing, asks an external service for a piece of  
information, then returns the gathered information to the client. The  
call to the external service can be eliminated, but does not change  
the performance profile.
How the application works internally is impossible for me to say;
it's 3rd party. I can say, after asking them, that it is "moderately"
threaded, whatever "moderately" means. My interpretation is that
the heaviest threading happens in tomcat itself, with up to 150
concurrent connection threads running.


Thanks,
/Eirik



--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy






Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-28 Thread Eirik Øverby



On Nov 28, 2005, at 15:54 , Joseph Koshy wrote:


EØ> *loads* more context switches than on the BETA-3 system.
EØ> I have not yet tried this during load

 - Which scheduler have you configured (BSD or ULE)?


Running GENERIC/SMP kernels, with BSD scheduler.
Speaking of which; is there a way to extract the kernel configuration  
from a running kernel or kernel binary?



 - What do the interrupt statistics show?  Any interrupt
   storms?  Please check the mailing lists for a prior
   discussion on interrupt storms on some motherboards.


Slow system:
interrupt          total   rate
irq1: atkbd0           4      0
irq14: ata0           46      0
irq24: ciss0      337166      1
irq28: bge0      8038794     35
cpu0: timer    446869052   1999
cpu1: timer    446861051   1999
Total          902106113   4037

Fast system:
interrupt            total   rate
irq1: atkbd0             6      0
irq14: ata0             46      0
irq24: ciss0       7465831      1
irq28: bge0       20764380      2
lapic0: timer  14827978729   2000
lapic1: timer  14827970729   2000
Total          29684179721   4003

No significant differences I'd say. Anything else I can do to dig  
deeper?
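
For a side-by-side comparison it can help to pull a single device's rate out of the interrupt statistics on each host. The helper below is only a sketch with an invented name; it assumes the "vmstat -i" layout shown above, where the rate is the last field on the line.

```sh
#!/bin/sh
# Print the rate column for one interrupt source from "vmstat -i"
# output on stdin. The source name in $1 (e.g. "bge0") must match a
# whole field; the first matching line wins.
irq_rate() {
    awk -v dev="$1" '
        { for (i = 1; i < NF; i++) if ($i == dev) { print $NF; exit } }
    '
}
```

Running `vmstat -i | irq_rate bge0` on both machines would, with the figures quoted above, print 35 on the slow system and 2 on the fast one.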



 - Could you post the dmesg output from the systems (I
   presume there aren't any significant differences).


dmesg from slow system follows. I do not have a dmesg for the fast  
system; I cannot boot it now either. However, I have compared them  
before, and they are 100% equal. Seems to be very close in serial  
numbers, probably same production run.


Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.

FreeBSD 6.0-STABLE #0: Sat Nov 26 01:52:00 CET 2005
[EMAIL PROTECTED]:/usr/obj/amd64/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 250 (2405.47-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f51  Stepping = 1
   
Features=0x78bfbffMCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>

  Features2=0x1
  AMD Features=0xe2500800,LM,3DNow+,3DNow>
real memory  = 1073717248 (1023 MB)
avail memory = 1024946176 (977 MB)
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-27 on motherboard
ioapic2  irqs 28-31 on motherboard
ioapic3  irqs 32-35 on motherboard
ioapic4  irqs 36-39 on motherboard
acpi0:  on motherboard
acpi0: Power Button (fixed)
pci_link0:  irq 5 on acpi0
pci_link1:  irq 7 on acpi0
pci_link2:  irq 0 on acpi0
pci_link3:  irq 3 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0:  on acpi0
cpu1:  on acpi0
pcib0:  on acpi0
pci0:  on pcib0
pcib1:  at device 3.0 on pci0
pci1:  on pcib1
ohci0:  mem 0xf7df-0xf7df0fff irq 19 at device 0.0 on pci1
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0:  on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1:  mem 0xf7de-0xf7de0fff irq 19 at device 0.1 on pci1
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1:  on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
pci1:  at device 2.0 (no driver attached)
pci1:  at device 2.2 (no driver attached)
pci1:  at device 3.0 (no driver attached)
isab0:  at device 4.0 on pci0
isa0:  on isab0
atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x2000-0x200f at device 4.1 on pci0
ata0:  on atapci0
ata1:  on atapci0
pci0:  at device 4.3 (no driver attached)
pcib2:  at device 7.0 on pci0
pci2:  on pcib2
ciss0:  port 0x5000-0x50ff mem 0xf7ef-0xf7ef1fff,0xf7e8-0xf7eb irq 24 at device 4.0 on pci2
ciss0: [GIANT-LOCKED]
pci0:  at device 7.1 (no driver attached)
pcib3:  at device 8.0 on pci0
pci3:  on pcib3
bge0:  mem 0xf7ff-0xf7ff irq 28 at device 6.0 on pci3
miibus0:  on bge0
brgphy0:  on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:13:21:b3:c1:f8
bge1:  mem 0xf7fe-0xf7fe irq 29 at device 6.1 on pci3
miibus1:  on bge1
brgphy1:  on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:13:21:b3:c1:f7
pci0:  at device 8.1 (no  
driver attac

Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-28 Thread Eirik Øverby


Follow-up:
I've now run vmstat during load, which confirms the findings of
vmstat during idle time.


Slow system - one sample before and after load start included:
procs     memory       page                       disks      faults        cpu
r b w     avm    fre  flt  re  pi  po   fr   sr da0 pa0    in    sy    cs us sy  id
3 0 0 2468572  45476   14   0   0   0   18    4   0   0  1049  3201  5132  0  0 100
0 0 1 2468572  42388    1   0   0   0  154    0   5   0  6852 19813 19970 22  8  70
1 0 0 2468572  39332    1   0   0   0  155    0  11   0  6823 19661 19886 23  7  71
2 0 0 2468432  36336    1   0   0   0  160    0   6   0  7031 20356 20534 19  7  74
0 0 0 2468432  33228    1   0   0   0  156    0   5   0  6685 19420 19613 20  7  73
2 0 0 2468432  29928    1   0   0   0  164    0   5   0  7105 20483 20673 21  7  71
1 0 0 2468432  53568    1   0   0   0  153 1308   5   0  6688 19278 19537 21  8  72
1 0 1 2468432  50580    2   0   0   0  150    0   6   0  6408 18430 18693 24  7  69
0 0 0 2468432  47748    2   0   0   0  143    0   6   0  6323 18098 18328 26  7  67
0 0 0 2468432  45056    1   0   0   0  136    0   5   0  5607 17122 17062 16  7  77
0 0 0 2468432  45040    0   0   0   0    0    0   0   0  1093  3172  5164  0  0 100


Fast system:
procs     memory       page                       disks      faults        cpu
r b w     avm    fre  flt  re  pi  po   fr   sr da0 pa0    in    sy    cs us sy  id
0 0 0 2439276  39708    1   0   0   0    6    0   1   0   281  1029   992  6  1  93
0 0 0 2439276  39380    7   0   0   0   16    0   1   0   665  1341  1714  2  1  98
0 0 0 2439276  36472    5   0   0   0  145    0   6   0  5569 12409 14821 21  7  72
0 0 0 2439276  33512    1   0   0   0  149    0   5   0  5862 12597 15532 15  6  79
0 0 0 2439276  30600    1   0   0   0  146    0   4   0  5682 12655 15102 19  7  74
2 0 0 2439276  54144    1   0   0   5  152 1310  10   0  6006 12908 15964 17  6  77
0 0 0 2439276  51176    2   0   0   0  151    0   7   0  5348 11899 14190 22  6  72
2 0 0 2439276  48104   98   0   0   0  248    0   5   0  5924 12889 15757 15  7  78
1 0 0 2439276  45172    1   0   0   0  147    0   5   0  5882 12660 15624 16  7  77
2 0 0 2439276  42276    1   0   0   0  145    0   5   0  5558 12477 14864 21  6  73
0 0 0 2439276  39300    1   0   0   0  149    0   5   0  5842 12660 15556 14  7  79
0 0 0 2439276  36348    1   0   0   0  150    0   8   0  5659 12562 15042 21  5  74
0 0 0 2439276  33404    1   0   0   0  150    0   7   0  5868 12642 15536 14  6  80
0 0 0 2439276  30588    1   0   0   0  142    0   6   0  5449 11961 14487 19  7  74
0 0 0 2439276  30588    0   0   0   0    0    0   0   0   227   246   565  0  0 100


I'm tempted to upgrade the fast system to 6-STABLE (same rev as the  
slow one). Even the slow system performs "adequately", though it  
might help me isolate any potential hardware differences.


/Eirik

On Nov 28, 2005, at 15:54 , Joseph Koshy wrote:


EØ> *loads* more context switches than on the BETA-3 system.
EØ> I have not yet tried this during load

 - Which scheduler have you configured (BSD or ULE)?
 - What do the interrupt statistics show?  Any interrupt
   storms?  Please check the mailing lists for a prior
   discussion on interrupt storms on some motherboards.
 - Could you post the dmesg output from the systems (I
   presume there aren't any significant differences).

Please CC -stable too.

--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy





Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-28 Thread Eirik Øverby


Hi,

I think I have found the culprit. There must be some sort of  
difference between the machines after all (BIOS revision?), because  
while on one machine the interrupt rate for the bge card stays very  
low (2 to be exact) during maximum load, the other machine goes  
beyond 1000 and keeps rising constantly. This might also explain why  
performance slowly degrades over time on that machine, and response  
times vary wildly, while the "fast" machine responds nicely within  
1-2 seconds no matter the load and testing time.


I will have to investigate this more closely. Is there a way to force  
the NIC to polling mode (I'm assuming that is the difference, an IRQ  
rate of 2 is too low for a heavily loaded server if the NIC is  
interrupt-driven)?


Anything else I could look at?

Also, the interrupt rates for the CPUs stay at 2000 sharp on the fast  
system, but fluctuate somewhat on the other.
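
Since the rate column of "vmstat -i" is an average since boot, a rate that
"keeps rising" is easier to judge from two snapshots of the totals. The
following is a minimal sketch under that assumption; the snapshot text,
the irq number and the function names are invented for illustration and
are not taken from either machine.

```python
def parse_totals(text):
    """Map each interrupt source to its cumulative count."""
    totals = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # name, total, rate
        if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
            totals[parts[0].strip()] = int(parts[1])
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second for each source present in both snapshots."""
    first, second = parse_totals(before), parse_totals(after)
    return {name: (second[name] - first[name]) / seconds
            for name in second if name in first}


# Hypothetical snapshots taken 10 seconds apart; the numbers are invented.
SNAP_1 = """interrupt                          total       rate
irq16: bge0                         1000          2
cpu0: timer                       200000       2000"""
SNAP_2 = """interrupt                          total       rate
irq16: bge0                        11000          2
cpu0: timer                       220000       2000"""

for name, rate in sorted(interrupt_rates(SNAP_1, SNAP_2, 10).items()):
    print(f"{name}: {rate:.0f}/s")
```

The header line is skipped because its last two fields are not numeric.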


/Eirik


Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-28 Thread Eirik Øverby


Firmware versions are equal. BIOS settings are equal.
However, a diff of the dmesgs shows (apart from MAC address differences):

30c30
< Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
---
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000

What on earth is that all about? The "slow" box has the ACPI-fast  
timecounter...
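
For anyone repeating this comparison, here is a small sketch of one way to
diff two saved dmesg outputs while ignoring MAC addresses. It is not the
command used above. The Timecounter lines are the ones from the diff; the
bge0 lines and their MAC addresses are invented, and dmesg_diff is a
hypothetical helper name.

```python
import difflib
import re

# Matches colon-separated MAC addresses such as 00:11:22:33:44:55.
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)


def dmesg_diff(first, second):
    """Return the +/- lines that differ once MAC addresses are masked."""
    def mask(text):
        return [MAC_RE.sub("<mac>", line) for line in text.splitlines()]

    diff = list(difflib.unified_diff(mask(first), mask(second),
                                     lineterm="", n=0))
    # The first two entries are the ---/+++ headers; "@@" lines mark hunks.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# The Timecounter lines are from the thread; the MAC addresses are invented.
FAST = '''Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
bge0: Ethernet address: 00:11:22:33:44:55'''
SLOW = '''Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
bge0: Ethernet address: 00:11:22:33:44:66'''

for line in dmesg_diff(FAST, SLOW):
    print(line)
```

Masking before diffing keeps the expected per-machine differences out of the
result, so only lines like the timecounter one remain.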


/Eirik

On Nov 28, 2005, at 22:14 , Kris Kennaway wrote:


On Mon, Nov 28, 2005 at 09:54:30PM +0100, Eirik Øverby wrote:

Hi,

I think I have found the culprit. There must be some sort of
difference between the machines after all (BIOS revision?), because
while on one machine the interrupt rate for the bge card stays very
low (2 to be exact) during maximum load, the other machine goes
beyond 1000 and keeps rising constantly. This might also explain why
performance slowly degrades over time on that machine, and response
times vary wildly, while the "fast" machine responds nicely within
1-2 seconds no matter the load and testing time.

I will have to investigate this more closely. Is there a way to force
the NIC to polling mode (I'm assuming that is the difference, an IRQ
rate of 2 is too low for a heavily loaded server if the NIC is
interrupt-driven)?

Anything else I could look at?


BIOS update.

Kris



Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-28 Thread Eirik Øverby

Update: The diff below was made after making sure both systems are  
running the exact same kernel. Behavior is the same. Building new  
kernels (6-STABLE) now to get out of the BETA stage.


/Eirik

On Nov 28, 2005, at 22:53 , Eirik Øverby wrote:


Firmware versions are equal. BIOS settings are equal.
However, a diff of the dmesgs show (apart from MAC address  
differences):


30c30
< Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
---
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000

What on earth is that all about? The "slow" box has the ACPI-fast  
timecounter...


/Eirik


Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-29 Thread Eirik Øverby



On Nov 29, 2005, at 10:15 , Kris Kennaway wrote:


On Tue, Nov 29, 2005 at 09:46:09AM +0100, Eirik Oeverby wrote:



On Mon, 28 Nov 2005, Kris Kennaway wrote:


On Mon, Nov 28, 2005 at 10:53:00PM +0100, Eirik Øverby wrote:

Firmware versions are equal. BIOS settings are equal.
However, a diff of the dmesgs show (apart from MAC address differences):

30c30
< Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
---
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000

What on earth is that all about? The "slow" box has the ACPI-fast
timecounter...


Could be ACPI bugs on your system:


Yes, but the other system is 100% equal - hardware, bios config, bios and
bootblock revision, controller bioses, etc. etc.
It all matches.
It all matches.


Clearly they're not 100% equal, but (100-epsilon)%.  Your job is to
identify the origin of the epsilon :-)


Yea yea ;) Working on it..
Is there a way to force ACPI-safe on the slower system?

/Eirik




Should I complain to HP?


If you think you'll get anywhere, it might be worth pursuing.

Kris



Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-29 Thread Eirik Øverby


On Nov 29, 2005, at 10:44 , Joseph Koshy wrote:


EØ> Yea yea ;) Working on it..
EØ> Is there a way to force ACPI-safe on the slower system?

# sysctl kern.timecounter.hardware=


kern.timecounter.choice: TSC(-100) ACPI-fast(1000) i8254(0) dummy(-100)


ACPI-safe is not among the choices. Which means I can't choose it, I  
presume.
I'm compiling up new kernels with ACPI_DEBUG right now, once they are  
installed, what can I do to determine differences in DSDT tables  
etc.? Or whatever else is different?
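
As an aside, the choice string is easy to inspect programmatically. The
sketch below simply parses the value quoted above into names and quality
numbers and checks whether a given counter is listed; parse_choice is a
hypothetical helper, and nothing here talks to sysctl itself.

```python
import re


def parse_choice(value):
    """Turn 'TSC(-100) ACPI-fast(1000) ...' into {name: quality}."""
    return {name: int(quality)
            for name, quality in re.findall(r"(\S+?)\s*\((-?\d+)\)", value)}


# The value reported by kern.timecounter.choice on the slow box.
choice = parse_choice("TSC(-100) ACPI-fast(1000) i8254(0) dummy(-100)")
best = max(choice, key=choice.get)

print(best, "ACPI-safe" in choice)
```

The pattern tolerates a space before the parenthesis, which is how the last
entry looked once the mail client had wrapped it.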


/Eirik



--
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy





Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-11-29 Thread Eirik Øverby



On Nov 29, 2005, at 11:37 , Kris Kennaway wrote:


On Tue, Nov 29, 2005 at 10:25:07AM +0100, Eirik Øverby wrote:



Yea yea ;) Working on it..
Is there a way to force ACPI-safe on the slower system?


I think someone already mentioned this..see the
kern.timecounter.hardware and other kern.timecounter sysctls.


I have now forced ACPI-safe on the slow system, to match the fast one.
Too bad though, it made absolutely zero difference.

I'm upgrading BIOSes on both boxes now, even though they seem equal.  
Then I'll see what ACPI debug output shows me. If you have any other  
hints or ideas, please let me know...  thanks so far.


/Eirik



Kris



Re: Reduced java/tomcat performance 6-beta3 -> 6-stable ?

2005-12-01 Thread Eirik Øverby


On Dec 1, 2005, at 04:12 , Michael Vince wrote:

Some apps that query the system time frequently, for example MySQL,  
are well known to be slower on FreeBSD than on Linux, because the call  
is more expensive on FreeBSD. Maybe Tomcat is another such app; this  
can be doubly the case depending on your JSP and servlet code.


True, but on equal hardware it should perform equally.

If you are on good hardware, are using 6, and keep your system's time  
updated via ntp, you might want to try changing  
kern.timecounter.hardware from ACPI-fast to TSC (quality -100) and  
running a benchmark. This has already been shown to increase MySQL  
performance by a significant amount.


I will try this, though it will not solve my original problem (and  
the subject is somewhat misleading now, as this seems to be  
independent of kernel revisions).


Also, some new experimental low-precision time code has been added  
to the current source tree to see how much performance can be gained.  
Weirdly enough, some people have argued against it, I guess for a wide  
range of reasons: they have crap hardware and don't care about  
performance, they don't like the extra code to maintain, or they are  
Red Hat fanatics who like having an easy way to bad-mouth FreeBSD  
performance. I think most people would agree, though, that it has to  
be done, or else they have to choose to believe that FreeBSD isn't  
about performance among its other goals.


I will not join this discussion ;)

With 6 you can also use the new thr threading library. Try mapping  
libpthread to libthr in your libmap.conf for testing, for example:

[/usr/local/jdk1.4.2/]
libpthread.so.2 libthr.so.2
libpthread.so   libthr.so

I've been doing some 'ab' testing of libthr with Apache2 compiled for  
the worker MPM, and have seen some really interesting differences in  
server load: loads of about 40 for pthread and around 5 for thr under  
certain ab runs with the exact same test.


Too bad this causes jdk1.5.0-amd64 to crash...
Application startup times were significantly reduced, but only on the  
occasions it actually managed to start without failing. By the 2nd or  
3rd transaction at the latest, Java coredumps. :(


And as current load testing is done without Apache in between, this  
is moot..


/Eirik




Mike



Re: ZERO_COPY_SOCKETS

2005-12-14 Thread Eirik Øverby



On Dec 6, 2005, at 03:20 , Joshua Coombs wrote:


#options         ZERO_COPY_SOCKETS

What's the status of this in 6.0-R and 6-stable?  The idea of  
avoiding memory copies when possible seems really appealing for my  
386, on which any little boost is significant. : )


Hoi,
let me know how you got 6.0 running on i386 .. It sounds like the  
perfect way to spend some of the holidays ;) I've heard it won't run  
on a plain i386 out-of-the-box, what did you do to convince it to run?


/Eirik



Joshua Coombs


Panic on logout in serial console

2006-01-23 Thread Eirik Øverby


Hi all,

I'm pretty aggravated right now. At exactly the wrong moment my  
spinal reflexes kicked in and I logged out from my serial console  
session on an important server. BANG! Kernel panic. This has been  
reported numerous times before, so I won't bother giving you the  
specifics right now - had to boot it immediately to come back up  
anyway, so no time.


Anyway - I was told at some point that this would be fixed in 6.x -  
but backporting the fix to 5.x would be hard to impossible.


Guess what: It's still not fixed. And it's a really really (did I say  
really?) bad thing, at least as far as I'm concerned.


Any fixes in the pipeline here? I'd be happy to help testing any  
patches..


I'm mostly pissed with myself though - I knew about this and usually  
never log out. But when under pressure, habits kick in. :P


With very frustrated regards,
/Eirik

Re: Obscure errors in dmsg, system instability

2004-09-25 Thread Eirik Øverby

On 25. Sep 2004, at 22:29, Doug White wrote:

On Thu, 23 Sep 2004, Eirik Øverby wrote:

On 23. Sep 2004, at 04:15, Doug White wrote:

Is something sharing an interrupt with that device?
PCI bus errors are generally Bad News .. either some device or the
mobo is introducing errors.

Well.. Yes, there is some interrupt sharing. Relevant parts of dmesg:

[EMAIL PROTECTED] ~$ dmesg | grep "irq 2"
IOAPIC #0 intpin 19 -> irq 2
uhci0:  port 0xd400-0xd41f irq 2 at device 4.2 on pci0
ahc0:  port 0xd000-0xd0ff mem 0xe200-0xe2000fff irq 2 at device 6.0 on pci0
amr0:  mem 0xe300-0xe300 irq 2 at device 9.1 on pci0

Apparently one of these devices doesn't like getting an interrupt when
there's no data pending. It might be a FreeBSD driver bug, but being a
3-way share it'll make it hard to untangle.

In that case, building a kernel without the adaptec driver might
actually resolve the problem for now?

I don't like the fact that the LSI and the Adaptec are sharing IRQs,
given that the LSI is the main system drive controller (which is why I
don't use the Adaptec at all - and it cannot be disabled in BIOS I
think!)...
I should perhaps try to reallocate some of the IRQs, but I don't really
have a clue how to do that, since I have no VGA in that box.. Ohwell, I
guess I just have to rip it open ;)

Yah .. rearrange the cards in the slots and see what you can convince it
to do.

That's the plan.
Thanks!!

/Eirik
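
To spot shares like the 3-way one on irq 2 without grepping for each
number, something along these lines could group probe lines by IRQ. This
is only a sketch: the first three sample lines are the ones quoted above
(device descriptions and addresses as they survived in the archive), the
fxp0 line is invented to show an unshared IRQ, and shared_irqs is a
hypothetical helper name.

```python
import re
from collections import defaultdict

# Device name at the start of the line, then "irq N" somewhere after it.
IRQ_RE = re.compile(r"^(\w+\d+):.*\birq (\d+)\b")


def shared_irqs(dmesg_text):
    """Return {irq: [devices]} for every IRQ used by more than one device."""
    by_irq = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = IRQ_RE.match(line)
        if match:
            by_irq[int(match.group(2))].append(match.group(1))
    return {irq: devs for irq, devs in by_irq.items() if len(devs) > 1}


SAMPLE = """IOAPIC #0 intpin 19 -> irq 2
uhci0:  port 0xd400-0xd41f irq 2 at device 4.2 on pci0
ahc0:  port 0xd000-0xd0ff mem 0xe200-0xe2000fff irq 2 at device 6.0 on pci0
amr0:  mem 0xe300-0xe300 irq 2 at device 9.1 on pci0
fxp0:  port 0xc000-0xc03f irq 5 at device 8.0 on pci0"""  # fxp0 line is invented

print(shared_irqs(SAMPLE))
```

The IOAPIC routing line is ignored because it does not start with a device
name followed by a colon.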


NO_YP_LIBC breaks 4-STABLE buildworld

2004-10-09 Thread Eirik Øverby

PS: I've posted a similar mail to @current, but not a dupe ;)
Hi!
For some time I've been wanting to use NO_YP_LIBC with buildworld for 
my jails, to enable NIS on the host system but keep the jails 
functioning.

I noticed back in August that a patch was submitted to make this work 
on then-CURRENT:
http://lists.freebsd.org/pipermail/freebsd-arch/2004-August/002550.html

Sadly, when compiling 4-STABLE (as of 45 minutes ago), buildworld gives 
me the following errors:

...snip snip...
===> libexec/mknetid
cc -O -pipe -march=pentiumpro -c /usr/src/libexec/mknetid/mknetid.c
cc -O -pipe -march=pentiumpro -c /usr/src/libexec/mknetid/hash.c
cc -O -pipe -march=pentiumpro -c /usr/src/libexec/mknetid/parse_group.c
gzip -cn /usr/src/libexec/mknetid/netid.5 > netid.5.gz
gzip -cn /usr/src/libexec/mknetid/mknetid.8 > mknetid.8.gz
cc -O -pipe -march=pentiumpro -o mknetid mknetid.o hash.o parse_group.o
mknetid.o: In function `main':
mknetid.o(.text+0xdc): undefined reference to `yp_get_default_domain'
*** Error code 1
1 error
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error
*** Error code 2
1 error

This feature would be very useful, and it is sad to see that it has 
once been in the tree but that it does not work any longer.

Here's to hoping someone can look into it (Bjoern, are you reading 
this? ;)

Thanks,
/Eirik