Re: NFS client over udp

2011-02-21 Thread Gerrit Kühn
On Sat, 19 Feb 2011 08:56:45 -0800 (PST) Kirill Yelizarov
 wrote about Re: NFS client over udp:

KY> > > Is ZFS in use on the system which sees rising wired
KY> > > memory?

KY> > No, ufs only. 
KY> I found an old post stating there is a leak with nfs udp client over
KY> zfs:

Later on in that thread we found out that the leak has nothing to do with
zfs and is triggered just by using nfs over udp. I cannot remember if
there was a fix for that (Rick probably knows); at some point I just
turned off udp on the server side completely and switched all clients to
tcp to get my systems stable again.

___ mailing list
To unsubscribe, send any mail to ""

drives >2TB on mpt device

2011-04-04 Thread Gerrit Kühn
Hi all,

I have a freshly installed 8.2-REL with a SuperMicro AOC-USASLP-L8i
controller (LSI/MPT 1068E chipset). I have several of these controllers
working nicely in other systems.
However, this time I tried drives >2TB for the first time (Hitachi
Deskstar 3TB). It appears that the mpt device reports only 2TB in this
case. I have already flashed the controller's firmware to the latest
available version (from 2009), but that did not change anything. The drive
is working fine on the standard SATA connectors on the mainboard
(Supermicro H8DME-2) and reports 2.8TB there.
Are there any hints how to access the full drive? Am I seeing a limitation
of the controller/firmware or rather of the driver (mpt)?
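
A reported size of exactly 2TB is what one would expect if some layer (controller firmware or driver) addresses sectors with a 32-bit number. The following is only a minimal sketch of that arithmetic, assuming 512-byte sectors and a 32-bit LBA; neither assumption is confirmed for mpt or the 1068E in this thread.

```python
# Sketch: largest capacity addressable with a 32-bit LBA.
# Assumptions (not from the thread): 512-byte logical sectors, 32-bit sector address.
SECTOR_BYTES = 512
LBA_BITS = 32


def max_addressable_bytes(lba_bits: int = LBA_BITS, sector_bytes: int = SECTOR_BYTES) -> int:
    """Return the number of bytes reachable with the given LBA width."""
    return (2 ** lba_bits) * sector_bytes


def tib(n_bytes: int) -> float:
    """Convert bytes to binary terabytes (TiB)."""
    return n_bytes / 2 ** 40


limit = max_addressable_bytes()
drive = 3 * 10 ** 12  # a nominal "3TB" drive in decimal bytes

print(f"32-bit LBA limit:  {limit} bytes = {tib(limit):.2f} TiB")
print(f"nominal 3TB drive: {tib(drive):.2f} TiB")
print("drive exceeds limit:", drive > limit)
```

If this assumption holds, anything beyond the first 2 TiB of the drive is simply not addressable through that path, regardless of partitioning.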


Re: drives >2TB on mpt device

2011-04-04 Thread Gerrit Kühn
On Mon, 4 Apr 2011 14:36:25 +0100 Bruce Cran  wrote
about Re: drives >2TB on mpt device:

Hi Bruce,

BC> It looks like a known issue:

Hm, I don't know if this is exactly what I'm seeing here (although the
cause may be the same):
I do not use mptutil. The controller is "dumb" (without actual raid
processor), and I intend to use it with zfs. However, I cannot even get
gpart to create a partition larger than 2TB, because mpt comes back with
only 2TB after probing the drive. As this is a problem that already exists
with 1 drive, I cannot use gstripe or zfs to get around this.
But the PR above states that this limitation is already built into mpt, so
my only chance is probably to try a different controller/driver (any
suggestions for a cheap 8-port controller to use with zfs?), or to wait
until mpt is updated to support larger drives. Does anyone know if there
is already an ongoing effort to do this?


Re: drives >2TB on mpt device

2011-04-14 Thread Gerrit Kühn
On Mon, 4 Apr 2011 07:37:15 -0700 Artem Belevich  wrote
about Re: drives >2TB on mpt device:

AB> You're probably out of luck as far as 2Tb+ support for 1068-based HBAs:
AB> Newer controllers based on LSI2008 (mps driver?) should not have that
AB> limit.

For the record:
My latest info from Supermicro is that the chip would do above 2TB with
SAS drives, but doesn't do it with SATA...

However, I changed to a 3ware 9650se controller now. I had to flash a beta
firmware to get 2.8TB recognized on the drives, but now it seems to work.
The only oddity is that the twa device/controller responds with a reset when
trying to do "zdb -C"... strange. But apart from that the drives seem to work
fine, even with zfs.

To sum up what I experienced over the last few days: all "cheap" controllers
I tried (nvidia mcp55 onboard, SiI 3124) work fine out of the box. All
expensive, scsi-like stuff (3ware 9650, lsi) needs at least firmware
updates or does not work (meaning it shows only 2TB or 800GB, or does not
work at all). For my "old" 3ware 9550 controllers there is not even a beta
firmware available to fix the problem. :-(


diskless booting with 8.2 regression?

2011-07-18 Thread Gerrit Kühn
Hi all,

I just updated my nfs/tftp server for diskless booting from 8.0-rel to
8.2-stable. I have a bunch of Linux clients that used to work with the
8.0-setup, but fail to boot now.

On the server side I see

Jul 18 11:18:24 mclane tftpd[72434]: Got ERROR packet: TFTP Aborted

in the log/messages, but the Linux kernel appears to be transferred over
the net just fine (so this is probably not the real issue). It starts to
boot and fails at some later point (with no apparent error message on
screen) causing an endless reboot loop.
I already googled for quite some time on this now, but nothing useful
came up. The error message above seems to be harmless, at least the
machines of people reporting them work nevertheless.

Are there any known issues/regressions with tftp/nfs diskless booting? I
read in some posts that people were vaguely "having problems" with it when
updating to 8.2-something, but could not find any details. Are there any
further hints what I could do to narrow down the problem?


Re: diskless booting with 8.2 regression?

2011-07-18 Thread Gerrit Kühn
On Mon, 18 Jul 2011 12:38:22 +0200 Gerrit Kühn
 wrote about diskless booting with 8.2

I guess I found the root of all evil now: device nodes on zfs!

This is how a linux /dev/console looked on the 8.0-FreeBSD server on

crw-------  1 root  wheel    5,   1 Oct 28  2010 

Now, after updating to FreeBSD-8.2 and zfsv28 it looks like this on the server:

crw-r--r--  1 root  wheel  255, 0x00ff Jul 18 16:33 

Strangely enough, the Linux client still displays the correct values when using
"ls -la", but it refuses to work properly.
I tried creating new device nodes from the client side with mknod and I tried
getting correct ones from a backup, but they always end up being broken. Even
moving the directories over to a ufs volume leaves them unusable:

crw-r--r--  1 root  wheel    0,   0 Jul 18 16:33 /tmp/console

Luckily, I am back in business now with my machines, because moving the stuff
from zfs to ufs and dropping in a correct version of /dev on the ufs side works
just fine.

However, it would be great if this could be fixed, because I do not have many 
ufs partitions left these days...
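
To compare device nodes on the server and on a client without relying on how "ls" renders them, the raw device number can be split into its major and minor parts. This is a minimal sketch using Python's standard os and stat modules; the helper name describe_node is purely illustrative, and the encoding of st_rdev is platform-specific, so numbers read on the FreeBSD server and on a Linux client are not necessarily directly comparable.

```python
import os
import stat


def describe_node(path: str) -> dict:
    """Return type, mode string and major/minor numbers of a file (hypothetical helper)."""
    st = os.lstat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "char"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block"
    else:
        kind = "other"
    return {
        "kind": kind,
        "mode": stat.filemode(st.st_mode),
        "major": os.major(st.st_rdev),
        "minor": os.minor(st.st_rdev),
    }


# Round trip for the values from the working listing (major 5, minor 1).
dev = os.makedev(5, 1)
print(os.major(dev), os.minor(dev))
```

Running describe_node() on the same node from both sides (for example on the exported console node) would show whether the numbers already differ on disk or only in what the client receives.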


GK> Hi all,
GK> I just updated my nfs/tftp server for diskless booting from 8.0-rel to
GK> 8.2-stable. I have a bunch of Linux clients that used to work with the
GK> 8.0-setup, but fail to boot now.
GK> On the server side I see
GK> Jul 18 11:18:24 mclane tftpd[72434]: Got ERROR packet: TFTP Aborted
GK> in the log/messages, but the Linux kernel appears to be transferred
GK> over the net just fine (so this is probably not the real issue). It
GK> starts to boot and fails at some later point (with no apparent error
GK> message on screen) causing an endless reboot loop.
GK> I already googled for quite some time on this now, but nothing useful
GK> came up. The error message above seems to be harmless, at least the
GK> machines of people reporting them work nevertheless.
GK> Are there any known issues/regressions with tftp/nfs diskless booting?
GK> I read in some posts that people were vaguely "having problems" with
GK> it when updating to 8.2-something, but could not find any details. Are
GK> there any further hints what I could do to narrow down the problem?

Fw: zfs snapshot: Bad file descriptor

2011-08-25 Thread Gerrit Kühn
Sorry for crossposting, but I got no answer at all from freebsd-fs. Anyone
in here having any ideas/suggestions on this?


Begin forwarded message:

Date: Tue, 23 Aug 2011 17:02:55 +0200
From: Gerrit Kühn 
Subject: zfs snapshot: Bad file descriptor

Hi all,

since upgrading some of my storage machines to recent 8.2-stable and
zfs-v28 I see the following on some filesystems after some time of

mclane# ll /tank/home/pt/.zfs
ls: snapshot: Bad file descriptor
total 0
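
The symptom shown above could be checked for from a script, for example before a backup run. The following is only a sketch: the helper name snapshot_dir_status is made up for illustration, and it assumes that the failure surfaces to Python as errno EBADF in the same way it does for ls.

```python
import errno
import os


def snapshot_dir_status(fs_mountpoint: str) -> str:
    """Classify the state of <mountpoint>/.zfs/snapshot (hypothetical helper)."""
    snapdir = os.path.join(fs_mountpoint, ".zfs", "snapshot")
    try:
        os.stat(snapdir)      # ls fails while examining the "snapshot" entry
        os.listdir(snapdir)   # also make sure the directory can be read
    except OSError as exc:
        if exc.errno == errno.EBADF:
            return "bad-file-descriptor"
        if exc.errno == errno.ENOENT:
            return "missing"
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


# The mountpoint is taken from the listing above; adjust as needed.
for mountpoint in ("/tank/home/pt",):
    print(mountpoint, snapshot_dir_status(mountpoint))
```

This only detects the condition; it does not explain or repair it.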

I make quite heavy use of snapshots on all my machines and use rsync to
backup snapshots to other machines.
Googling around I found several people reporting similar problems, but no
real solution (apart from rebooting, which is not really a thing you want
to do every time you run into this).
Does anyone on the list have knowledge or ideas on how to improve this
situation? Am I just one of the few unlucky people who see this, or is
there an actual reason for this happening that could be fixed or

Regression 7.0R -> 7-stable?

2008-08-07 Thread Gerrit Kühn
Hi folks,

I have a rather new FujitsuSiemens Esprimo here with an AMD Phenom X3
processor (triple core is somehow strange :-) and a lot of NVidia stuff
onboard. I installed 7.0-R, which ran quite well except for the bge driver
and snd_hda which both complained.
After putting in an extra networking card I was able to install some more
software and all appeared to be nice. Then I cvsupped to the recent
7-stable as of today. My hope was that maybe the bge or the sound card
would benefit from this. However, the new kernel I compiled does not run
at all. It boots up to the CPU#1 and CPU#2 launched messages and then sits
there and does nothing more. I have verified this behaviour with amd64
snapshot images from July and August to make sure I did not compile a bad
kernel. Both show the same behaviour.
Are there any ideas what has changed from 7.0-R to recent 7-stable that
could cause this? What can I do to debug/fix this?


Re: Regression 7.0R -> 7-stable?

2008-08-07 Thread Gerrit Kühn
On Thu, 7 Aug 2008 05:19:40 -0700 Jeremy Chadwick <[EMAIL PROTECTED]>
wrote about Re: Regression 7.0R -> 7-stable?:

JC> I think you misread what he was saying.  :-)  He's saying that his
JC> system locks up hard after the kernel prints the "SMP: AP CPU #x
JC> Launched!" messages.

Exactly. ;-)

JC> I believe this is the 2nd or 3rd report we've had of this behaviour,
JC> re: system locking up hard after those messages.  I'll see if I can
JC> find the past reports; I'm going off of memory, but they were in
JC> recent days.

Please let me know if I can provide any further information. However, I
will probably not be on the net from tomorrow (Friday) until Monday.


Re: Regression 7.0R -> 7-stable?

2008-08-13 Thread Gerrit Kühn
On Thu, 07 Aug 2008 15:55:30 -0700 Xin LI <[EMAIL PROTECTED]> wrote
about Re: Regression 7.0R -> 7-stable?:

XL> Could you please try disabling ACPI and boot?  Additionally a
XL> 'boot -v' may reveal some useful information as well.  Just some
XL> random thoughts.

Disabling ACPI does not help, either (forgot to mention that earlier,
sorry). I will post a boot -v log as soon as I have a working 7.0R on the
system again.


Re: Regression 7.0R -> 7-stable?

2008-08-22 Thread Gerrit Kühn
On Thu, 07 Aug 2008 15:55:30 -0700 Xin LI <[EMAIL PROTECTED]> wrote
about Re: Regression 7.0R -> 7-stable?:

XL> Could you please try disabling ACPI and boot?  Additionally a
XL> 'boot -v' may reveal some useful information as well.  Just some
XL> random thoughts.

Sorry for being so late, I almost completely forgot that I had not sent
the verbose dmesg yet. Here it is:

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-RELEASE-p3 #0: Fri Aug 15 10:29:37 CEST 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
Preloaded elf kernel "/boot/kernel/kernel" at 0x80c09000.
Preloaded elf obj module "/boot/kernel/if_em.ko" at 0x80c09220.
module_register: module pci/em already exists!
Module pci/em failed to register: 17
Calibrating clock(s) ... i8254 clock: 1193195 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter "i8254" frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 2100014603 Hz
CPU: AMD Phenom(tm) 8400 Triple-Core Processor (2100.01-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f22  Stepping = 2
  AMD Features=0xee500800,RDTSCP,LM,3DNow!
+,3DNow!> AMD
Cores per package: 3
L1 2MB data TLB: 48 entries, fully associative
L1 2MB instruction TLB: 16 entries, fully associative
L1 4KB data TLB: 48 entries, fully associative
L1 4KB instruction TLB: 32 entries, fully associative
L1 data cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L1 instruction cache: 64 kbytes, 64 bytes/line, 1 lines/tag, 2-way associative
L2 2MB data TLB: 128 entries, 2-way associative
L2 2MB instruction TLB: 0 entries, 2-way associative
L2 4KB data TLB: 512 entries, 4-way associative
L2 4KB instruction TLB: 512 entries, 4-way associative
L2 unified cache: 512 kbytes, 64 bytes/line, 1 lines/tag, 16-way associative
usable memory = 4281212928 (4082 MB)
Physical memory chunk(s):
0x1000 - 0x00099fff, 626688 bytes (153 pages)
0x00d07000 - 0xcf43afff, 3463659520 bytes (845620 pages)
0x0001 - 0x000127fe, 671023104 bytes (163824 pages)
avail memory  = 4123783168 (3932 MB)
INTR: Adding local APIC 1 as a target
INTR: Adding local APIC 2 as a target
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
ACPI: RSDP @ 0x0xf75d0/0x0024 (v  2 PTLTD )
ACPI: XSDT @ 0x0xd7f58f38/0x007C (v  1 FSCPC   0x0006  LTP 0x)
ACPI: FACP @ 0x0xd7f60600/0x00F4 (v  3 FSC 0x0006 PTL_ 0x000F4240)
ACPI: DSDT @ 0x0xd7f58fb4/0x75D8 (v  1 FSC D2721/A1 0x0006 MSFT 0x0301)
ACPI: FACS @ 0x0xd7f61fc0/0x0040
ACPI: TCPA @ 0x0xd7f606f4/0x0032 (v  1 Phoeni  x   0x0006  TL 0x)
ACPI: WDAT @ 0x0xd7f60726/0x0104 (v  1 PTLTD  WDATTBL 0x0006  LTP 0x0001)
ACPI: SSDT @ 0x0xd7f6082a/0x03FC (v  1 AMD POWERNOW 0x0006 AMD  0x0001)
ACPI: SRAT @ 0x0xd7f60c26/0x00D8 (v 1 AMDHAMMER   0x0006 AMD  0x0001)
ACPI: ASF! @ 0x0xd7f60cfe/0x0070 (v 32 OEMID  OEMTBL   0x0006 PTL  0x0001)
ACPI: SLIC @ 0x0xd7f60d6e/0x0176 (v  1 FSCPC   0x0006  LTP 0x)
ACPI: MCFG @ 0x0xd7f60ee4/0x003C (v  1 PTLTDMCFG 0x0006  LTP 0x)
ACPI: HPET @ 0x0xd7f60f20/0x0038 (v  1 PTLTD HPETTBL  0x0006  LTP 0x0001)
ACPI: APIC @ 0x0xd7f60f58/0x0080 (v 1 PTLTD  APIC   0x0006  LTP 0x)
ACPI: BOOT @ 0x0xd7f60fd8/0x0028 (v  1 PTLTD  $SBFTBL$ 0x0006  LTP 0x0001)
MADT: Found IO APIC ID 3, Interrupt 0 at 0xfec0
ioapic0: Routing external 8259A's -> intpin 0
lapic0: Routing NMI -> LINT1
lapic0: LINT1 trigger: edge
lapic0: LINT1 polarity: high
lapic1: Routing NMI -> LINT1
lapic1: LINT1 trigger: edge
lapic1: LINT1 polarity: high
lapic2: Routing NMI -> LINT1
lapic2: LINT1 trigger: edge
lapic2: LINT1 polarity: high
MADT: Interrupt override: source 0, irq 2
ioapic0: Routing IRQ 0 -> intpin 2
MADT: Interrupt override: source 9, irq 9
ioapic0: intpin 9 trigger: level
ioapic0: intpin 9 polarity: low
MADT: Interrupt override: source 0, irq 2
ioapic0: Routing IRQ 0 -> intpin 2
ioapic0  irqs 0-23 on motherboard
cpu0 BSP:
 ID: 0x   VER: 0x80050010 LDR: 0x DFR: 0x
  lint0: 0x00010700 lint1: 0x0400 TPR: 0x SVR: 0x01ff
  timer: 0x000100ef therm: 0x0001 err: 0x0001 pcm: 0x0001
ath_rate: version 1.2 
wlan: <802.11 Link Layer>
nfslock: pseudo-device
kbd: new array size 4
kbd1 at kbdmux0
ath_hal: (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
hptrr: HPT RocketRAID controller driver v1.1 (Aug 15 2008 10:29

Re: Regression 7.0R -> 7-stable?

2008-08-25 Thread Gerrit Kühn
On Sat, 23 Aug 2008 07:50:09 -0400 John Baldwin <[EMAIL PROTECTED]> wrote
about Re: Regression 7.0R -> 7-stable?:

JB> > XL> Could you please try disabling ACPI and boot?  Additionally a
JB> > XL> 'boot -v' may reveal some useful information as well.  Just
JB> > XL> some random thoughts.

JB> > Disabling ACPI does not help, either (forgot to mention that
JB> > earlier, sorry). I will post a boot -v log as soon as I have a
JB> > working 7.0R on the system again.

JB> What about disabling apic?  Also, are you using a serial console?  If
JB> so, can you put DDB in your kernel and when it hangs break into ddb
JB> and do a 'ps' and capture the output.

apic, acpi, don't confuse me. :-)
I will have to look up what I disabled last time (I think it was acpi,
though). Or do you mean you would like to have a boot -v dmesg without
apic, too? 
I have not used a serial console so far, but I could probably learn how to
do that (fortunately the machine still has a serial port).


Re: Regression 7.0R -> 7-stable?

2008-08-27 Thread Gerrit Kühn
On Mon, 25 Aug 2008 00:53:32 -0700 Jeremy Chadwick <[EMAIL PROTECTED]>
wrote about Re: Regression 7.0R -> 7-stable?:

JC> > JB> What about disabling apic?  Also, are you using a serial
JC> > JB> console?  If so, can you put DDB in your kernel and when it
JC> > JB> hangs break into ddb and do a 'ps' and capture the output.

JC> > apic, acpi, don't confuse me. :-)

Ok, here is some more input from my side:
I updated to the latest 7-stable, which identifies as 7.1-prerelease.
Booting default still locks up after launching the additional cores.
Booting without acpi behaves a bit differently from before: It finds the
onboard sata-controller, but does not find the hd anymore. Thus it drops
into the "Manual root filesystem specification" menu (but there is
nothing to specify as the disc is not found at all). The same happens when
I set hint.apic.0.disabled="1" at the loader prompt.

If someone can tell me what to do (except for putting ddb into the kernel
configuration) or point me at some documentation about this, I can try
getting some useful information from the debugger.


Re: Regression 7.0R -> 7-stable?

2008-10-07 Thread Gerrit Kühn
On Tue, 7 Oct 2008 10:02:56 -0400 John Baldwin <[EMAIL PROTECTED]> wrote
about Re: Regression 7.0R -> 7-stable?:

JB> Do you have more details about the crash?  Are you getting an actual
JB> panic with messages on the console, or are you still seeing hangs?

Like it was before: system just hangs after displaying the probing
messages about the CPU cores; next step for a working kernel would be
mounting of the file systems (and changing from white kernel output to grey
system output).

JB> When you get a hang, can you break into the debugger and get a crash
JB> dump?

Is there documentation somewhere on how to do this?


Re: Regression 7.0R -> 7-stable?

2008-10-07 Thread Gerrit Kühn
On Tue, 7 Oct 2008 07:25:42 -0700 Jeremy Chadwick <[EMAIL PROTECTED]>
wrote about Re: Regression 7.0R -> 7-stable?:

JC> > Like it was before: system just hangs after displaying the probing
JC> > messages about the CPU cores; next step for a working kernel would
JC> > be mounting of the file systems (and changing from white kernel
JC> > output to grey system output).

JC> Actually, I think you mean mounting of the root filesystem, do you not?
JC> If so, others have recently reported this problem (hard lock-ups
JC> before or after printing "Mounting root from...").

Yes, of course (sorry for being vague).

JC> > JB> When you get a hang, can you break into the debugger and get a
JC> > JB> crash dump?

JC> > Is there a documentation somewhere how to do this?

JC> John can probably help you with the commands you need to type, but the
JC> FreeBSD Handbook goes over the general commands.
JC> As far as getting into the debugger, it's Control-Alt-Esc from the
JC> console.

Ok, I added options KDB and DDB to my kernel configuration and compiled
with SCHED_ULE. However, once it hangs, the system does not react to
Ctrl-Alt-Esc. Am I missing something?


Re: Regression 7.0R -> 7-stable?

2008-10-07 Thread Gerrit Kühn
On Wed, 27 Aug 2008 11:16:29 +0200 Gerrit Kühn
<[EMAIL PROTECTED]> wrote about Re: Regression 7.0R -> 7-stable?:


GK> If someone can tell me what to do (except for putting ddb into the
GK> kernel configuration) or point me at some documentation about this, I
GK> can try getting some useful information from the debugger.

Sorry to disturb all of you again, but the thing is still not fixed (or
better: broken again) for me:

I saw some patches referring to this problem and was able to compile a
working kernel sometime in September (I don't know the exact date,
unfortunately, but the kernel is from the 22nd of September, so this is the
latest possible date). After that I thought the issue had been settled.

However, yesterday I upgraded the system to a recent 7-stable codebase,
and now it locks up hard again after probing the CPU cores. My setup remained
exactly the same, the new code just does not work.
Please let me know what I have to do to provide further information.


Re: Regression 7.0R -> 7-stable?

2008-10-07 Thread Gerrit Kühn
On Tue, 7 Oct 2008 13:38:01 +0200 Gerrit Kühn <[EMAIL PROTECTED]>
wrote about Re: Regression 7.0R -> 7-stable?:


GK> However, yesterday I upgraded the system to a recent 7-stable
GK> codebase, and now it locks again hard after probing the CPU cores. My
GK> setup remained exactly the same, the new code just does not work.
GK> Please let me know what I have do to provide further information.

Well, comparing the kernel setups again I found one thing that makes a
difference: The scheduler (at some point SCHED_ULE has been declared the
default). SCHED_ULE is crashing the system, SCHED_4BSD works fine...


Re: Regression 7.0R -> 7-stable?

2008-10-10 Thread Gerrit Kühn
On Tue, 7 Oct 2008 12:05:35 -0400 John Baldwin <[EMAIL PROTECTED]> wrote
about Re: Regression 7.0R -> 7-stable?:

JB> > Ok, I added options KDB and DDB to my kernel configuration and
JB> > compiled with SCHED_ULE. However, after hanging the system does not
JB> > react on Ctrl-Alt-Esc. Am I missing something?

JB> Can you add VERBOSE_SYSINIT to your kernel config and do a boot -v?

Ok, I just did that. Now I get some more info after "SMP: AP CPU #2
Launched!" (omissions by me):

cpu2 AP:
 [list with ID, lint0, timer etc. like for cpu0 and cpu1]
0x[omitted]... ioapic0: Assigning ISA IRQ 1 to local APIC 0
ioapic0: Assigning ISA IRQ 9 to local APIC 1
ioapic0: Assigning ISA IRQ 11 to local APIC 2
ioapic0: Assigning ISA IRQ 14 to local APIC 0
ioapic0: Assigning ISA IRQ 15 to local APIC 1
ioapic0: Assigning PCI IRQ 17 to local APIC 2
ioapic0: Assigning PCI IRQ 18 to local APIC 0
ioapic0: Assigning PCI IRQ 19 to local APIC 1
ioapic0: Assigning PCI IRQ 20 to local APIC 2
  0x[omitted]... done.
  0x[omitted]... done.
subsystem fff
  0x804608c0(0)... x

Where I put the "x" I see a grey inverted cursor and the system hangs. I
cannot break into the debugger.

JB> Also, are you able to log the output at all (such as via a serial
JB> console)?

Well, I need to get a serial cable, notebook and stuff for that. As I am
on holiday for the next two weeks I will probably not find time to do that


Re: Regression 7.0R -> 7-stable?

2008-10-13 Thread Gerrit Kühn
On Fri, 10 Oct 2008 11:22:15 -0400 John Baldwin <[EMAIL PROTECTED]> wrote
about Re: Regression 7.0R -> 7-stable?:

JB> Ok, can you run gdb on your kernel.debug and do
JB> 'l *0x804608c0'

0x804608c0 is in scheduler (/usr/src/sys/vm/vm_glue.c:670).

[...lines 665-674...]

Hope this helps,

Re: Regression 7.0R -> 7-stable?

2008-10-14 Thread Gerrit Kühn
On Mon, 13 Oct 2008 10:27:40 -0400 John Baldwin <[EMAIL PROTECTED]> wrote
about Re: Regression 7.0R -> 7-stable?:

JB> On Monday 13 October 2008 03:09:46 am Gerrit Kühn wrote:

JB> > JB> Ok, can you run gdb on your kernel.debug and do
JB> > JB> 'l *0x804608c0'

JB> > 0x804608c0 is in scheduler (/usr/src/sys/vm/vm_glue.c:670).
JB> > [...lines 665-674...]

JB> I was afraid of that, it basically means that it finished the entire
JB> boot process.  

I already thought so because I saw a grey (not white) cursor afterwards.

JB> The next step is that init (pid 1) should be scheduled
JB> and try to execute.  You can maybe add some printf's to the code to
JB> start up init to see how far it gets.  The routine in question is
JB> 'start_init()' in sys/kern/init_main.c.

Let me see...
I added my first printf in line 619 (and several after that), right after
the "Need just enough stack..." comment. This was never reached, the
system hangs before that.

After that I added a printf before and after vfs_mountroot(). Now things
run just a bit further for the first time. I see my new printfs and
between them the message "Trying to mount root from ufs:/dev/ad0s1a".
After that come all the printfs I had added before, followed by
"start_init: trying /sbin/init". Then it hangs again.

I am a bit puzzled because I did not see the "Trying to mount..." and
"start_init:..." messages before. Just trying again to boot with the
same setup hangs in vfs_mountroot() (printf before is displayed, printf
after not). It appears to me as if the hang is caused by some kind of
"parallel task", and what I am seeing on the console stops a bit earlier
or later depending on that.
As I am seeing this only with the ULE-scheduler: Is the scheduler already
in action at this point, and may the hang depend on what it is deciding
to do?


Re: Regression 7.0R -> 7-stable?

2008-10-31 Thread Gerrit Kühn
On Tue, 14 Oct 2008 13:12:04 -0400 John Baldwin <[EMAIL PROTECTED]> wrote
about Re: Regression 7.0R -> 7-stable?:


I'm back. Sorry for taking so long with the answer, but I had some
holidays and needed to catch up with lots of other things first.

JB> Are you sure you aren't using dual consoles somehow with serial being
JB> primary? If you break into the loader, what does 'show console' show?

Just "vidconsole".

JB> > As I am seeing this only with the ULE-scheduler: Is the scheduler
JB> > already in action at this point, and may the hang depend on what it
JB> > is deciding to do?

JB> Hmmm, I'm really not sure.  I wonder if you are having some sort of
JB> interrupt storm.  What if you disable SMP via 'kern.smp.disabled=1' in
JB> the loader, does that help at all?

Yes, boots up fine with that setting (but naturally running on only one


Re: Curious failure of ZFS snapshots

2008-11-21 Thread Gerrit Kühn
On Fri, 21 Nov 2008 13:39:20 +0000 Pete French
<[EMAIL PROTECTED]> wrote about Curious failure of ZFS snapshots:

PF> On the box with the snapshots being created every day with the same
PF> name I quickly end up with unavailable snapshots, and the error
PF> message: 'Bad file descriptor'. On the machine which is creating
PF> dailys which do not have the same name this does not happen.
PF> Interesting - and unexpected. The machines are running identical
PF> kernels, being 7-STABLE from a few days ago.

I have a similar setup (creating daily.0 every night and rotating the
rest) for some home directories here...
Right now 3 of them are fine, and one is showing the same problem you
described:
mclane# ll /tank/home/pt/.zfs/
ls: snapshot: Bad file descriptor
total 0

Note that zfs still thinks the snapshots are there:

mclane# zfs list -r tank/home/pt
tank/home/pt   454M  1.78T   262M  /tank/home/pt
tank/home/[EMAIL PROTECTED]  32.5M  -   260M  -
tank/home/[EMAIL PROTECTED]   29.6M  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]   45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]   28.2M  -   262M  -
tank/home/[EMAIL PROTECTED]   28.7M  -   262M  -
tank/home/[EMAIL PROTECTED]   27.8M  -   262M  -
tank/home/[EMAIL PROTECTED]   374K  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]   45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -
tank/home/[EMAIL PROTECTED]  45.3K  -   262M  -

I even do automatic incremental backups via send/receive. There are no
errors (up to today) and the snapshots are accessible on the backup

I am a bit troubled by inaccessible snapshots. Does anyone else here have
the same problem (or can even offer a solution)?
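
For readers unfamiliar with the rotation scheme mentioned above (a daily.0 snapshot every night, with the older ones shifted by one), here is a rough sketch of the command sequence such a scheme might use. It is not the script actually used on this machine: the function name and the "keep" parameter are made up for illustration, the commands are only printed rather than executed, and it assumes all snapshots in the chain already exist.

```python
from typing import List


def rotation_commands(dataset: str, keep: int, prefix: str = "daily") -> List[List[str]]:
    """Build zfs commands rotating prefix.0 .. prefix.(keep-1) (illustrative only)."""
    cmds = []
    # Drop the oldest snapshot to make room.
    cmds.append(["zfs", "destroy", f"{dataset}@{prefix}.{keep - 1}"])
    # Shift the remaining snapshots up by one, oldest first.
    for i in range(keep - 2, -1, -1):
        cmds.append(["zfs", "rename", f"{dataset}@{prefix}.{i}", f"{dataset}@{prefix}.{i + 1}"])
    # Take the new snapshot under the name that was just freed.
    cmds.append(["zfs", "snapshot", f"{dataset}@{prefix}.0"])
    return cmds


for cmd in rotation_commands("tank/home/pt", keep=3):
    print(" ".join(cmd))
```

The point of interest for this thread is that the same snapshot names are destroyed, renamed and recreated every night.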


Re: Curious failure of ZFS snapshots

2008-11-21 Thread Gerrit Kühn
On Fri, 21 Nov 2008 15:15:18 +0100 Gerrit Kühn
<[EMAIL PROTECTED]> wrote about Re: Curious failure of ZFS

GK> Right now 3 of them are fine, and one is showing the same problem you
GK> described:
GK> mclane# ll /tank/home/pt/.zfs/
GK> ls: snapshot: Bad file descriptor
GK> total 0


One addition: I just tried to unmount tank/home/pt (in the hope that after
a remount everything would be fine again). This panicked the kernel, and even
the coredump did not finish. After rebooting everything came back fine,
though. Now the snapshots are accessible again.


Re: Curious failure of ZFS snapshots

2008-11-21 Thread Gerrit Kühn
On Fri, 21 Nov 2008 17:02:07 +0200 Nikolay Denev <[EMAIL PROTECTED]> wrote
about Re: Curious failure of ZFS snapshots:

ND> I've experienced this problem in the past :

Yes, that looks quite the same (even the panic).

ND> But the machine I was having these issues are no longer operational
ND> so I can't test again.
ND> I hope the new ZFS import will fix this...

I'm a bit worried about moving from -stable to -current with this machine.
Does anyone know (Pawel? :-) when the patches will arrive in -stable, and
if this issue is expected to be fixed then?


Re: Curious failure of ZFS snapshots

2008-11-24 Thread Gerrit Kühn
On Fri, 21 Nov 2008 08:16:35 -0800 Freddie Cash <[EMAIL PROTECTED]> wrote
about Re: Curious failure of ZFS snapshots:

FC> > GK> mclane# ll /tank/home/pt/.zfs/
FC> > GK> ls: snapshot: Bad file descriptor
FC> > GK> total 0

FC> Which shell are you using?  I've seen quite a few 
FC> different "non-existent"/"invalid directory" errors when using tcsh
FC> to navigate through the .zfs/ hierarchy.  Can do "cd ..", "ls .", or
FC> tab completion when in anything under .zfs/

Standard root login, so it's /bin/csh.
I cannot remember if I tried to cd into the dir, and after rebooting
everything's fine up to now. I will try this if I see the problem again.
However, it would be rather strange if this was shell-dependent, as all
other snapshots were happily accessible with csh (and the panic after
trying to unmount the fs is definitely not an expected behaviour
either :-). 

FC> Using sh or zsh, these errors don't occur.
FC> Just curious if this is the same kind of thing.

I will try it when I see the problem next time.


Re: Curious failure of ZFS snapshots

2008-12-01 Thread Gerrit Kühn
On Sat, 29 Nov 2008 11:46:40 +0100 Pawel Jakub Dawidek <[EMAIL PROTECTED]>
wrote about Re: Curious failure of ZFS snapshots:

PJD> > > GK> mclane# ll /tank/home/pt/.zfs/
PJD> > > GK> ls: snapshot: Bad file descriptor
PJD> > > GK> total 0

PJD> Is there a way for me to reproduce that?

None that I could tell you right now.
This was on a machine which uses zfs send/receive to backup its zfs
filesystem to a backup server. Only one out of 6 or 7 zfs filesystems
showed this problem. After rebooting it went away and did not appear again
since then.


Re: Curious failure of ZFS snapshots

2008-12-01 Thread Gerrit Kühn
On Sun, 30 Nov 2008 01:05:48 +0000 Pete French
<[EMAIL PROTECTED]> wrote about Re: Curious failure of ZFS

PF> Here is what I am doing - this script is run with an argument '7am' or
PF> '7pm' once per day. the mysql database is a slave replication from a
PF> master, so there is a continuous trickle of data into it. The symbolic
PF> links are there so you can connect to the mysql server and access
PF> 'xxx-7am' or 'xxx-7pm' to get a previous version of database 'xxx'.
PF> In case it's not obvious, the filesystem 'tank/zfs' is mounted on the
PF> directory '/var/db/mysql'. If you run this for a few cycles it should
PF> presumably break for you too.

If you think it will be useful I can also post my scripts. However, as I
did not see the problem again so far, it might be the case that I messed
something up manually while developing the scripts one or two weeks ago.
As mentioned, even the inaccessible zfs snapshots did send/receive fine,
so internally zfs seems to be happy (only unmounting them was a bad
idea :-).

fun with if_re

2009-02-04 Thread Gerrit Kühn
Hi folks,

I have several routers here which are based on Jetway J7F4 ITX boards that
come with two onboard re-interfaces. I run 7-stable on them via nanobsd
and update them about once in three or four months.

After the last update (11th December 2008) I have noticed the following
strange behaviour on at least two machines (identical hard- and software):
After weeks of flawless operation, the network connection on both
interfaces suddenly starts to mangle packets. Even a simple ping can show
up to 50% or so packet loss. The machine is mostly unreachable via net.
Neither ifconfig up/down nor turning off checksum-offloading and the like
cured this. Even simply rebooting the machine did not make the
problem go away! I had to power-cycle them by unplugging all cables to get
back to normal operation.
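
To catch the failure when it next shows up, the loss figure can be pulled out of ping's summary line by a small watchdog. The following is only a sketch: the function name, the sample line and the 5% threshold are assumptions made for illustration, not something taken from the routers.

```python
import re


def packet_loss_percent(ping_summary: str) -> float:
    """Return the packet-loss percentage found in ping's summary output."""
    match = re.search(r"([\d.]+)% packet loss", ping_summary)
    if match is None:
        raise ValueError("no packet-loss summary found")
    return float(match.group(1))


# Sample summary line in the format FreeBSD's ping prints (values invented).
sample = "20 packets transmitted, 10 packets received, 50.0% packet loss"

loss = packet_loss_percent(sample)
if loss > 5.0:  # arbitrary threshold, chosen only for this example
    print(f"interface looks unhealthy: {loss}% packet loss")
```

Fed with real ping output from cron, such a check would at least record when the problem started.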

I have seen this behaviour on two different machines, so I can most
probably rule out a hardware issue. It does not appear to happen often,
though. I did not see this with an earlier image of 7-stable from June
2008, and probably even an image from early September was working fine
(although I did not use that one for such a long time).

Visiting the webcvs I noticed that there are a lot of patches for if_re in
December 2008 and January 2009. The revision I'm having problems with is
tagged " 2008/12/09 11:01:17". Does anyone have an idea what
broke if_re for me, and how I can get back to stable operation? Is it
possible to use if_re from head as drop-in replacement to test the patches
available after 12/09? I would prefer not to move the machines completely
from -stable to -current.

Here some further information about the NICs:

re0@pci0:0:9:0: class=0x02 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
    class      = network
    subclass   = ethernet
re1@pci0:0:11:0: class=0x02 card=0x10ec16f3 chip=0x816710ec rev=0x10 hdr=0x00
    vendor     = 'Realtek Semiconductor'
    device     = 'RTL8169/8110 Family Gigabit Ethernet NIC'
    class      = network
    subclass   = ethernet

re0:  port 0xf000-0xf0ff mem 0xfdfff000-0xfdfff0ff irq 10 at device 9.0 on pci0
re0: Chip rev. 0x1800
re0: MAC rev. 0x
miibus0:  on re0
rgephy0:  PHY 1 on miibus0
rgephy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
re0: Ethernet address: 00:30:18:ab:d0:19
re0: [FILTER]
re1:  port 0xf200-0xf2ff mem 0xfdffe000-0xfdffe0ff irq 10 at device 11.0 on pci0
re1: Chip rev. 0x1800
re1: MAC rev. 0x
miibus1:  on re1
rgephy1:  PHY 1 on miibus1
rgephy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
re1: Ethernet address: 00:30:18:ab:d0:1a
re1: [FILTER]

Re: fun with if_re

2009-02-04 Thread Gerrit Kühn
On Wed, 4 Feb 2009 19:46:55 +0900 Pyun YongHyeon  wrote
about Re: fun with if_re:

PY> Since you're using RTL8169SC it could be related with my commit
PY> r180519(cvs rev It seems that RTL8169SC does not like
PY> memory mapped register access and I think jkim@ committed patch
PY> for the issue. Would you try re(4) in HEAD?
PY> (Just copying if_re.c, if_rlreg.h and if_rl.c from HEAD to
PY> stable would be enough to build re(4) on stable).

Thanks for the advice.
I did build new nanobsd images with these patches meanwhile and will start
using them today. However, as it has worked without problems for weeks
with the buggy version before, I will not be able to say if it is really
working until next month or so. Or do you know any method to reliably
trigger such errors?

Re: fun with if_re

2009-02-05 Thread Gerrit Kühn
On Thu, 5 Feb 2009 17:28:04 +0900 Pyun YongHyeon  wrote
about Re: fun with if_re:

PY> > I did build new nanobsd images with these patches meanwhile and will
PY> > start using them today. However, as it has worked without problems
PY> > for weeks with the buggy version before, I will not be able to say
PY> > if it is really working until next month or so. Or do you know any
PY> > method to reliably

PY> That's fine.

Sorry to be back so soon again, but I just noticed that I did in fact not
produce new images yesterday. :-)
Kernel build stopped with

mkdep -f .depend -a   -nostdinc -D_KERNEL -DKLD_MODULE
t/src/sys/FIREFLY /usr/work/current/src/sys/modules/mii/../../dev/mii/acphy.c 
/usr/work/current/src/sys/modules/mii/../../dev/mii/amphy.c /usr/
ii/e1000phy.c /usr/work/current/src/sys/modules/mii/../../dev/mii/exphy.c 
/usr/work/current/src/sys/modules/mii/../../dev/mii/gentbi.c /usr/wor
miibus_if.c /usr/work/current/src/sys/modules/mii/../../dev/mii/mii.c 
br.c /usr/work/current/src/sys/modules/mii/../../dev/mii/mlphy.c 
/usr/work/current/src/sys/modules/mii/../../dev/mii/nsgphy.c /usr/work/current 
/usr/work/current/src/sys/modules/mii /../../dev/mii/pnaphy.c 
c /usr/work/current/src/sys/modules/mii/../../dev/mii/rlphy.c 
/usr/work/current/src/sys/modules/mii/../. ./dev/mii/truephy.c 
c /usr/work/current/src/sys/modules/mii/../../dev/mii/xmphy.c
In file included from /usr/work/current/src/sys/modules/mii/../../dev/mii/rgephy.c:60:
@/pci/if_rlreg.h:1509:28: error: token ";" is not valid in preprocessor expressions
@/pci/if_rlreg.h:1917:6: error: unterminated comment
@/pci/if_rlreg.h:1509:1: error: unterminated #if
In file included from /usr/work/current/src/sys/modules/mii/../../dev/mii/rlphy.c:56:
@/pci/if_rlreg.h:1509:28: error: token ";" is not valid in preprocessor expressions
@/pci/if_rlreg.h:1917:6: error: unterminated comment
@/pci/if_rlreg.h:1509:1: error: unterminated #if
mkdep: compile failed
*** Error code 1
1 error
*** Error code 2
1 error
*** Error code 2
2 errors
*** Error code 2
1 error
*** Error code 2
1 error

Any hints?

Re: fun with if_re

2009-02-05 Thread Gerrit Kühn
On Thu, 5 Feb 2009 12:05:46 +0100 Gerrit Kühn 
wrote about Re: fun with if_re:

GK> Sorry to be back so soon again, but I just noticed that I did in fact
GK> not produce new images yesterday. :-)
GK> Kernel build stopped with


Ignore me, my bad (downloaded the webpage instead of the code via
webcvs :-). On my way now.
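
A cheap guard against this kind of mistake is to check that a file fetched from a web frontend is not an HTML page before dropping it into the source tree. The sketch below is only a heuristic; the function name and the markers it looks for are assumptions made for illustration.

```python
def looks_like_html(text: str) -> bool:
    """Heuristically decide whether a downloaded 'source file' is a web page."""
    head = text.lstrip()[:512].lower()
    return (
        head.startswith("<!doctype html")
        or head.startswith("<html")
        or "<head>" in head
    )


# Invented samples: a web page and the start of a C header.
web_page = "<!DOCTYPE html>\n<html><head><title>if_rlreg.h</title></head>"
c_header = "/*-\n * Copyright (c) ...\n */\n#define EXAMPLE_REG 0x0000\n"

for label, content in (("web page", web_page), ("C header", c_header)):
    verdict = "HTML, do not build" if looks_like_html(content) else "looks like source"
    print(f"{label}: {verdict}")
```

Preprocessor errors such as "unterminated comment" deep inside a header that builds fine elsewhere are a typical symptom of an HTML page being compiled as C.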

zfs crashes with nfs and snapshots

2009-02-11 Thread Gerrit Kühn
Hi folks,

I just saw one of my FreeBSD servers (7.0-stable of June 2008) crash while
trying to access the .zfs snapshot directory via a nfs client machine.
The server got a page fault caused by the nfsd process. It wasn't even
able to dump the kernel image anymore.
Resetting the machine it first appeared to come back fine, but shortly
before the login prompt the nfsd let it crash hard again the same way as
before. Then I booted single user, fscked the ufs partitions by hand and
had to re-import the zpool with -f. After that I did another reboot,
whereupon everything was fine again.

As I need that machine I'm a bit unwilling to try accessing the snapshot
directory again via nfs right now. :-)
So here are some questions before I do anything else:
- Did anyone else already see this behaviour?
- Is there something wrong with accessing the snapshot directory via nfs?
- Does zfs stability profit from an update to a recent -stable?

Any answers or further thoughts/hints on this are very welcome.

Re: fun with if_re

2009-02-13 Thread Gerrit Kühn
On Thu, 5 Feb 2009 17:28:04 +0900 Pyun YongHyeon  wrote
about Re: fun with if_re:

PY> > I did build new nanobsd images with these patches meanwhile and will
PY> > start using them today. However, as it has worked without problems
PY> > for weeks with the buggy version before, I will not be able to say
PY> > if it is really working until next month or so. Or do you know any
PY> > method to reliably
PY> That's fine.

I had to reboot some of the machines meanwhile and could do some further
testing. One strange thing I noticed is that the re-interfaces often do
not come up in a working state after rebooting. Strangely, I see
network traffic floating around via tcpdump, but not even ping works.
This state often goes away when playing around with the interface
(sometimes ifconfig down/up helps, sometimes disabling some of the
additional features like txc/rxc), but I cannot make out a reproducible
behaviour so far. When the interface leaves this strange state it seems to
work fine afterwards. Any clues?

Re: zfs crashes with nfs and snapshots

2009-02-13 Thread Gerrit Kühn
On Wed, 11 Feb 2009 19:55:11 +0200 Jaakko Heinonen 
wrote about Re: zfs crashes with nfs and snapshots:

JH> This is likely the issue described in this message:

Yes, this looks very much like it.

JH> The nfs fix has been committed to head and stable/7 (7.1-RELEASE has
JH> the fix). The fix prevents the system from panicking but you still can't
JH> access the snapshot directory with readdirplus enabled nfs clients. As
JH> a workaround you can disable readdirplus support if your nfs client
JH> allows it.

Ok, I will upgrade to 7.1-stable asap. The client was Linux 2.6.25, I
cannot say if it uses readdirplus and if I could disable that (the manpage
says nothing about it at all, but I will look into that further).
Thanks for the hint.

Re: fun with if_re

2009-02-13 Thread Gerrit Kühn
On Fri, 13 Feb 2009 19:24:00 +0900 Pyun YongHyeon  wrote
about Re: fun with if_re:

PY> > I had to reboot some of the machines meanwhile and could do some
PY> > further testing. One strange thing I noticed is that the
PY> > re-interfaces often do not come up in a working state after
PY> > rebooting. Strangely, I see network traffic floating around via
PY> > tcpdump, but not even ping works. This state often goes away when
PY> > playing around with the interface (sometimes ifconfig down/up helps,
PY> > sometimes disabling some of the additional features like txc/rxc),
PY> > but I cannot make out a reproducible behaviour so far. When the
PY> > interface leaves this strange state it seems to work fine
PY> > afterwards. Any clues?

PY> Does this happen on latest if_re.c/if_rlreg.h? I guess jkim fixed
PY> this type of problem in r187483. If that has no effect please let
PY> me know.

It happens on both versions: the old one from 11th Dec 08 I still had, and
the new one I built with the patches you recommended about a week ago.
if_re is 1.151 2009/01/20 20:22:28 jkim, if_rlreg is 1.94 2009/01/20
20:22:28 jkim for the latter.

Re: fun with if_re

2009-02-13 Thread Gerrit Kühn
On Fri, 13 Feb 2009 20:39:55 +0900 Pyun YongHyeon  wrote
about Re: fun with if_re:

PY> Ok, try attached patch.

Thanks, building new images right now. I'll be back later (next week).

Re: zfs crashes with nfs and snapshots

2009-02-17 Thread Gerrit Kühn
On Mon, 16 Feb 2009 19:43:00 +0200 Jaakko Heinonen 
wrote about Re: zfs crashes with nfs and snapshots:

JH> > Ok, I will upgrade to 7.1-stable asap. The client was Linux 2.6.25,
JH> > I cannot say if it uses readdirplus and if I could disable that (the
JH> > manpage says nothing about it at all, but I will look into that
JH> > further).

JH> -o nordirplus mount option should disable it on Linux.

Thanks. I missed that when first looking into the manpage (probably because
it's written in UPPERCASE :-).

Re: Support for SAS/SATA non-RAID adapters

2009-11-18 Thread Gerrit Kühn
On Tue, 17 Nov 2009 16:29:06 -0800 Freddie Cash  wrote
about Support for SAS/SATA non-RAID adapters:

FC> Any recommendations on other SAS/SATA controllers to look at (just not
FC> anything with MegaRAID in the name)?

I installed a Supermicro AOC-USASLP-L8i card here some days ago. Should be
even cheaper than the ones you mentioned and comes with a LSI chip
supported by mpt driver:

mpt0@pci0:6:0:0: class=0x01 card=0xa68015d9 chip=0x00581000 rev=0x08 hdr=0x00
    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
    device     = 'SAS 3000 series, 8-port with 1068E -StorPort'
    class      = mass storage
    subclass   = SCSI

I only installed it last week and cannot comment much on performance and
stability up to now.

Re: Support for SAS/SATA non-RAID adapters

2009-11-18 Thread Gerrit Kühn
On Wed, 18 Nov 2009 08:56:14 -0800 Freddie Cash  wrote
about Re: Support for SAS/SATA non-RAID adapters:

FC> > I installed a Supermicro AOC-USASLP-L8i card here some days ago.
FC> > Should be even cheaper than the ones you mentioned and comes with a
FC> > LSI chip supported by mpt driver:

FC> > mpt0@pci0:6:0:0: class=0x01 card=0xa68015d9 chip=0x00581000 rev=0x08 hdr=0x00
FC> >     vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
FC> >     device     = 'SAS 3000 series, 8-port with 1068E -StorPort'
FC> >     class      = mass storage
FC> >     subclass   = SCSI

FC> > I only installed it last week and cannot comment much on performance
FC> > and stability up to now.

FC> These look nice, and are in the $200-300 CDN range.  Have the same
FC> mini-SAS connectors as the 3Ware cards we use, so wouldn't have to
FC> re-cable the chassis.

Hm, I don't know the recent exchange rate, but are you sure this is the
same card? I paid something like 80,-€ (excl. VAT).

FC> Are you using these as standard disk controllers, or are you using the
FC> RAID features (seems it supports RAID0 and RAID1 in hardware, RAID5 in
FC> software)?  Reading through the manual right now, and it doesn't cover
FC> using the card in non-RAID modes.  Wondering if the drives would show
FC> up as normal da0 da1 da2 etc.

I think my card does not have the raid features included, maybe that's why
it was so cheap. The devices appear as normal scsi disks:

da0 at mpt0 bus 0 target 0 lun 0
da0:  Fixed Direct Access SCSI-5 device
da0: 300.000MB/s transfers
da0: Command Queueing enabled
da0: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)

cliff# camcontrol devlist
at scbus0 target 0 lun 0 (da0,pass0)
at scbus0 target 1 lun 0 (da1,pass1)
at scbus0 target 2 lun 0 (da2,pass2)
at scbus0 target 3 lun 0 (da3,pass3)
at scbus0 target 4 lun 0 (da4,pass4)
at scbus0 target 5 lun 0 (da5,pass5)
at scbus0 target 6 lun 0 (da6,pass6)
at scbus0 target 7 lun 0 (da7,pass7)
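
The size reported for da0 follows directly from its sector count. A minimal sketch of the arithmetic, assuming the "MB" in the kernel message means 2^20 bytes and the value is rounded down:

```python
SECTOR_SIZE = 512        # bytes per sector, as stated in the da0 line
sectors = 976773168      # sector count from the da0 line

size_bytes = sectors * SECTOR_SIZE
size_mb = size_bytes // (1024 * 1024)  # assumption: MB = 2**20 bytes, rounded down

print(f"{size_bytes} bytes = {size_mb} MB")
```

The result matches the 476940MB in the probe message, i.e. a nominal 500 GB disk.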

FC> All of these (there's a couple variations on the card) appear to be
FC> PCIe, though, no PCI-X.  We have 24 drive bays, and only 2 PCIe slots.
FC> Have 3 PCI-X slots, though, so would need at least 1 PCI-X
FC> controller.

I guess the version of the card I have here was actually intended to be
used in some kind of special Supermicro extension slot. However, it fits
into a standard PCIe slot and works nicely there as far as I can tell.
Do you have the opportunity of using a riser card that would give you one
more slot?

Re: Support for SAS/SATA non-RAID adapters

2009-11-19 Thread Gerrit Kühn
On Wed, 18 Nov 2009 11:37:03 -0600 Barry Pederson  wrote
about Re: Support for SAS/SATA non-RAID adapters:

> > I guess the version of the card I have here was actually intended to
> > be used in some kind of special Supermicro-Extension Slot. However,
> > it fits into a standard PCIe slot and works nicely there as far as I
> > can tell. Do you have the opportunity of using a riser card that
> > would give you one more slot?

BP> Those Supermicro UIO cards look like backwards PCIe cards.  Do they
BP> come with other brackets for fitting into a PCIe slot, or did you have
BP> to go bracketless?

They only come with a bracket that does not exactly fit into a standard
slot. Maybe the other bracket is available, but I did not care much about
it and simply went for bracketless (not much of a problem with a low
profile card).

BP> didn't mention anything about brackets or how it'd work in PCIe slots.

For me it simply works. Only the bracket does not fit.

Re: Support for SAS/SATA non-RAID adapters

2009-11-19 Thread Gerrit Kühn
On Wed, 18 Nov 2009 09:35:56 -0800 Freddie Cash  wrote
about Re: Support for SAS/SATA non-RAID adapters:

FC> > Hm, I don't know the recent exchange rate, but are you sure this is
FC> > the same card? I paid something like 80,-€ (excl. VAT).

FC> Oops, you're right, was reading the model numbers wrong.  The
FC> LSI1068-based one is only $129 CDN, the Intel IOP-based ones are
FC> $200-300 CDN.

That makes sense then.

FC> Last time I checked the Euro was in the $1.50-2.00 CDN range.

Seems to be something like 1.55 these days.

FC> > I guess the version of the card I have here was actually intended to
FC> > be used in some kind of special Supermicro-Extension Slot. However,
FC> > it fits into a standard PCIe slot and works nicely there as far as I
FC> > can tell. Do you have the opportunity of using a riser card that
FC> > would give you one more slot?

FC> Urgh, I have yet to find a riser card that will plug into a Tyan
FC> motherboard and not cause issues.  Due to all the issues we've had
FC> with riser cards in the past, we have sworn off all riser cards.  For
FC> our 2U servers, we use low-profile cards to avoid risers.

I had some trouble with risers in the past, too. However, I have a Tyan
Transport here that seems to work nicely at least with the riser that came
with the system.

FC> I'll keep looking for a PCI-X card.  These look like they'll cover our
FC> PCIe needs.

Please let us know if you find one that is suitable. I spent quite some
time to dig out the Supermicro card; cheap (without raid) and
FreeBSD-supported cards with more than 4 channels are not that common.

Re: Support for SAS/SATA non-RAID adapters

2009-11-19 Thread Gerrit Kühn
On Wed, 18 Nov 2009 13:15:59 -0600 Barry Pederson  wrote
about Re: Support for SAS/SATA non-RAID adapters:

BP> What I was questioning was where the OP said: "it fits into a standard 
BP> PCIe slot and works nicely there as far as I can tell" - which to me 
BP> sounds like you could use this HBA in a *NON-Supermicro* motherboard.

BP> I was just wondering if that was truly the case, given how in the
BP> photos it looks to be arranged physically backwards from a regular
BP> PCIe card, and given how you mention "The UIO slot itself is
BP> proprietary".

I'm sorry if my comment "fits into a standard PCIe slot" was misleading
here. I wanted to state that, although Supermicro lists this one as a card
for UIO, I plugged it into a standard PCIe slot and it simply works there
for me. Just the mounting bracket it came with did not fit, but for a low
profile card it is not that difficult to live without it.

BP> But some more digging on Google has turned up a few mentions along the 
BP> lines of:
BP> """
BP>This card plugs into a normal PCIe 8x slot but the
BP>metal mounting bracket bolted to the card is made
BP>for a UIO slot (which is why it's so cheap).
BP>All you have to do is remove the metal bracket and
BP>zip-tie the card to your case for mechanical support.
BP>Electrically it'll work fine in a PCIe x8 or x16 slot.
BP> """

That's exactly my experience.

BP> If someone wanted to make PCIe compatible brackets for this affordable 
BP> card, they'd probably sell a fair number to small shops or home users.

Yeah, I would also buy some. :-)

zfs/nfs mkstemp() failure & subsequent hangs

2009-11-20 Thread Gerrit Kühn
Hi all,

I have an 8.0-PRERELEASE zfs/nfs server here that complains about i/o
errors when using rsync on a nfs client:

rsync: mkstemp
"/usr/portage/metadata/cache/app-mobilephone/.ksms-" failed:
Input/output error (5)

I found this to be quite similar to kern/135412. However, this one
is said to be fixed and only applicable to 7-stable anyway.
Furthermore, after this happened, I tried to access files on the server
from the zfs filesystem concerned and found that I cannot access the fs
anymore. ls hangs in state zfs, so do mountd and zfs unmount.

Should I open a new PR for this?
Are there any ideas how to recover access to the fs apart from rebooting
the machine? Right now I still have it running, so I could get some more
debugging information out of it.

Re: immense delayed write to file system (ZFS and UFS2), performance issues

2010-01-19 Thread Gerrit Kühn
On Mon, 18 Jan 2010 21:41:53 -0500 Garrett Moore 
wrote about Re: immense delayed write to file system (ZFS and UFS2),
performance issues:

GM> The drives being discussed in my related thread (regarding poor
GM> performance) are all WD Green drives. I have used wdidle3 to set all
GM> of my drive timeouts to 5 minutes. I'll see what sort of difference
GM> this makes for performance.

GM> Even if it makes no difference to performance, thank you for pointing
GM> it out
GM> -- my drives have less than 2,000 hours on them and were all over
GM> 90,000 load cycles due to this moronic factory setting. Since changing
GM> the timeout, they haven't parked (which is what I would expect).

Thanks for bringing up this topic here. I have drives showing up close to
80 load cycle counts here. Guess it's time for that fix... :-|
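
To put the quoted numbers into perspective, here is a rough sketch of the arithmetic. The 90,000 cycles and 2,000 hours are the figures quoted above (the real rate is at least this high, since the hours are "less than" and the cycles "over"). The rated limit of 300,000 cycles is an assumption for illustration; the actual figure should be taken from the drive's datasheet.

```python
def load_cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average number of head load/unload cycles per power-on hour."""
    return load_cycles / power_on_hours


rate = load_cycles_per_hour(90_000, 2_000)   # figures quoted above
seconds_between_parks = 3600 / rate          # average time between head parks

RATED_LOAD_CYCLES = 300_000                  # assumed rating, check the datasheet
hours_to_rated_limit = RATED_LOAD_CYCLES / rate

print(f"{rate:.0f} cycles/hour, one park every {seconds_between_parks:.0f} s")
print(f"rated limit reached after about {hours_to_rated_limit / 24:.0f} days of operation")
```

Under these assumptions a drive that runs 24/7 would use up its rated load cycles in well under a year.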

Re: immense delayed write to file system (ZFS and UFS2), performance issues

2010-01-19 Thread Gerrit Kühn
On Tue, 19 Jan 2010 01:57:36 -0800 Jeremy Chadwick
 wrote about Re: immense delayed write to file
system (ZFS and UFS2), performance issues:

JC> If you want a consumer-edition drive that's better tuned for server
JC> work, you should really be looking at the WD Caviar Black series or
JC> their RE/RE2 series.  

That's exactly what I did. I have WD-RE2 drives here that show exactly
this problem (RE2/GP)! The model number is WD1000FYPS-01ZKB0.

Re: immense delayed write to file system (ZFS and UFS2), performance issues

2010-01-19 Thread Gerrit Kühn
On Tue, 19 Jan 2010 03:24:49 -0800 Jeremy Chadwick
 wrote about Re: immense delayed write to file
system (ZFS and UFS2), performance issues:

JC> > JC> If you want a consumer-edition drive that's better tuned for
JC> > JC> server work, you should really be looking at the WD Caviar Black
JC> > JC> series or their RE/RE2 series.  

JC> > That's exactly what I did. I have WD-RE2 drives here that show
JC> > exactly this problem (RE2/GP)! The model number is WD1000FYPS-01ZKB0.

JC> I should have been more specific.  WD makes RE-series drives which
JC> don't have GP applied to them; those are what I was referring to.

Well, when I bought these drives I was not aware of this issue. Buying a
drive intended for 24/7 use in RAID configurations is basically the right
idea, I think. From what was written about the GP feature back then I
could not anticipate such problems.
I would have liked to buy the 2TB drives without GP lately, but they have
lead times into April here. So I went for the GP model, which now shows
the same problem as the 1TB drive... :-(

JC> WD1000FYPS - WD RE2-GP,   1TB, 16MB, variable rpm
JC> WD2002FYPS - WD RE4-GP,   2TB, 64MB, variable rpm

JC> So which drive models above are experiencing a continual increase in
JC> SMART attribute 193 (Load Cycle Count)?  My guess is that some of the
JC> WD Caviar Green models, and possibly all of the RE2-GP and RE4-GP
JC> models are experiencing this problem.

I can confirm that the two models above show this problem.
Furthermore I can confirm that at least in my setup here this drive
type works fine:


I have some of the RE3 drives sitting around here and will probably try
them later.
Can anyone here report anything about the fixed firmware from
Does this remedy the problem for the 1TB RE2 drive?

JC> I say "some" with regards to WD Caviar Green since I have some which do
JC> not appear to exhibit the heads/actuator arm moved into the
JC> landing/park zone.  I'm at work right now, but when I get home I can
JC> verify what models I've used which didn't experience this problem, as
JC> well as what the manufacturing date and F/W revisions are.  I should
JC> note I don't have said Green drives in use (I use WD1001FALS drives
JC> now).

Thanks for sharing this information.

Re: immense delayed write to file system (ZFS and UFS2), performance issues

2010-01-26 Thread Gerrit Kühn
On Tue, 19 Jan 2010 03:24:49 -0800 Jeremy Chadwick
 wrote about Re: immense delayed write to file
system (ZFS and UFS2), performance issues:

JC> So which drive models above are experiencing a continual increase in
JC> SMART attribute 193 (Load Cycle Count)?  My guess is that some of the
JC> WD Caviar Green models, and possibly all of the RE2-GP and RE4-GP
JC> models are experiencing this problem.

Just to add some more info:
I contacted WD support about the problem with RE4 drives and received a
firmware update by email today which is supposed to fix the problem. Did
not try it yet, though.
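
Whether a firmware update really stops the parking can be checked by reading SMART attribute 193 before and after a test period. The sketch below parses text in the attribute-table layout that "smartctl -A" prints; the sample lines and their values are invented for illustration.

```python
def load_cycle_count(smartctl_output: str) -> int:
    """Extract the raw value of SMART attribute 193 (Load_Cycle_Count)."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "193":
            return int(fields[9])
    raise ValueError("attribute 193 not found")


# Invented sample rows in the layout of the smartctl attribute table.
before = """\
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1987
193 Load_Cycle_Count        0x0032   170   170   000    Old_age   Always       -       90210
"""
after = before.replace("90210", "90212").replace("1987", "2011")

increase = load_cycle_count(after) - load_cycle_count(before)
print(f"load cycle count grew by {increase} during the test period")
```

With the bug present the counter should grow by dozens per hour; with a working fix it should stay nearly constant while the drive is in use.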

I am still busy replacing RE2-disks with updated drives. I came across a
very strange thing with zfs. Actually I had the following pool layout:

mclane# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0
            ad12    ONLINE       0     0     0
          ad14      AVAIL

errors: No known data errors

All disks still have the firmware bug, so I want to replace them with
disks that I already fixed. I put in an updated drive as ad18 and
wanted to replace ad12 to get the drive with the broken firmware out:

mclane# zpool replace tank /dev/ad12 /dev/ad18 
mclane# zpool status
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.01% done, 52h51m to go

        NAME           STATE     READ WRITE CKSUM
        tank           ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            ad8        ONLINE       0     0     0  7.21M resilvered
            ad10       ONLINE       0     0     0  7.22M resilvered
            replacing  ONLINE       0     0     0
              ad12     ONLINE       0     0     0
              ad18     ONLINE       0     0     0  10.7M resilvered
          ad14         AVAIL

errors: No known data errors

However, something must have gone wrong during the resilvering process and
it now looks like this:

mclane# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see:
 scrub: resilver completed after 2h39m with 0 errors on Tue Jan 26 14:00:00 2010
config:

        NAME           STATE     READ WRITE CKSUM
        tank           DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            ad8        ONLINE       0     0     0  975M resilvered
            ad10       ONLINE       0     0   142  974M resilvered
            replacing  DEGRADED     0 7.25M     0
              ad12     ONLINE       0     0     0
              ad18     REMOVED      0     1     0  79.4M resilvered
          ad14         AVAIL

errors: No known data errors

What is going on here? ad18 obviously detached during the
process. /var/log/messages just gives me

Jan 26 11:23:33 mclane kernel: ad18: FAILURE - device detached

Additionally ad10 obviously produced chksum errors. What do I do about the
degraded replacing process? Can I terminate it somehow and maybe replace
ad10 first? Any other hints?
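
For reference, the three numeric columns in the device rows are the read, write and checksum error counters. A small sketch that picks out the rows with non-zero counters from the status output above; the helper names are made up for this example, and treating the K/M/G suffixes as powers of 1024 is an assumption (only whether a counter is non-zero matters here).

```python
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}  # assumed binary units


def parse_count(text: str) -> int:
    """Convert an error counter such as '142' or '7.25M' to an integer."""
    if text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def devices_with_errors(config_text: str) -> dict:
    """Map device name -> (read, write, cksum) for rows with non-zero counters."""
    bad = {}
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # e.g. the spare row, which has no counters
        try:
            counts = tuple(parse_count(f) for f in fields[2:5])
        except ValueError:
            continue  # header row
        if any(counts):
            bad[fields[0]] = counts
    return bad


config = """\
NAME           STATE     READ WRITE CKSUM
tank           DEGRADED     0     0     0
  raidz1       DEGRADED     0     0     0
    ad8        ONLINE       0     0     0  975M resilvered
    ad10       ONLINE       0     0   142  974M resilvered
    replacing  DEGRADED     0 7.25M     0
      ad12     ONLINE       0     0     0
      ad18     REMOVED      0     1     0  79.4M resilvered
  ad14         AVAIL
"""

for name, (read, write, cksum) in devices_with_errors(config).items():
    print(f"{name}: read={read} write={write} cksum={cksum}")
```

This lists ad10 (checksum errors), the replacing vdev (write errors) and ad18 (one write error), which is exactly the set of rows in question.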

Re: ZFS "zpool replace" problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 06:30:21 -0800 Jeremy Chadwick
 wrote about Re: ZFS "zpool replace" problems:

JC> I'm removing the In-Reply-To mail headers for this thread, as you've
JC> now hijacked it for a different purpose.  Please don't do this; start
JC> a new thread altogether.  :-)

Thanks. You're perfectly right, I should have done that.

JC> I'm not sure how the above is supposed to work (I haven't personally
JC> tried it), but:
JC> 1) Why didn't you offline the ad10 disk first?
JC>zpool offline tank ad10

Well, probably because I thought that zfs would simply handle the
situation. I just wanted to replace drive A with drive B, so this was
quite straight-forward for me.

JC> 2) How did you attach ad18?  Did you tell the system about it using
JC>atacontrol?  If so, what commands did you use?

Yes. The drives did not appear automatically (verified with atacontrol
list). Then I first tried reinit ata9, but that did not work out, so I did
a detach/attach for ata9, then the drive was there (with list and also
the device node appeared).

JC> 3) Can you please provide uname -a output, as well as relevant dmesg
JC>output to show what kind of SATA controller you have, what's
JC>attached to what, etc.?

Of course (dmesg is not there anymore, I use pciconf -vl and
atacontrol instead):

ATA channel 0:
Master:  no device present
Slave:  acd0  ATA/ATAPI revision 0
ATA channel 1:
Master:  no device present
Slave:   no device present
ATA channel 2:
Master:  ad4  SATA revision 2.x
Slave:   no device present
ATA channel 3:
Master:  ad6  SATA revision 2.x
Slave:   no device present
ATA channel 4:
Master:  ad8  SATA revision 2.x
Slave:   no device present
ATA channel 5:
Master: ad10  SATA revision 2.x
Slave:   no device present
ATA channel 6:
Master: ad12  SATA revision 2.x
Slave:   no device present
ATA channel 7:
Master: ad14  SATA revision 2.x
Slave:   no device present
ATA channel 8:
Master:  no device present
Slave:   no device present
ATA channel 9:
Master:  no device present
Slave:   no device present

FreeBSD 7.2-STABLE FreeBSD 7.2-STABLE #0:
Mon Sep  7 11:01:56 CEST 2009  amd64

The first six drives (up to ad14) are connected onboard (Supermicro dual
opteron board with mcp55):

atap...@pci0:0:5:0: class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID
atap...@pci0:0:5:1: class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID
atap...@pci0:0:5:2: class=0x010485 card=0x161115d9 chip=0x037f10de rev=0xa3 hdr=0x00
    vendor     = 'Nvidia Corp'
    device     = 'MCP55 SATA/RAID Controller (MCP55S)'
    class      = mass storage
    subclass   = RAID

The other two (ad16 and ad18, the chassis has 8 slots and the last two
were only intended to be used in situations like the one I have now) are
connected to an extra pci card:

atap...@pci0:3:6:0: class=0x010401 card=0x02409005 chip=0x02401095 rev=0x02 hdr=0x00
    vendor     = 'Silicon Image Inc (Was: CMD Technology Inc)'
    device     = 'SATA/Raid controller(2XSATA150) (SIL3112)'
    class      = mass storage
    subclass   = RAID

Meanwhile I took out the ad18 drive again and tried to use a different
drive. But that was listed as "UNAVAIL" with corrupted data by zfs.
Probably it already branded the disk for resilvering and is looking for
exactly this one now. I also put in the disk which caused the problem
above again. The resilvering process started again, but very soon the
drive got detached again resulting in the same situation I described above.

Any help is greatly appreciated.

___ mailing list
To unsubscribe, send any mail to ""

Re: ZFS "zpool replace" problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 08:15:27 -0800 Chuck Swiger  wrote
about Re: ZFS "zpool replace" problems:

CS> > Meanwhile I took out the ad18 drive again and tried to use a
CS> > different drive. But that was listed as "UNAVAIL" with corrupted
CS> > data by zfs.

CS> There's your problem-- the Silicon Image 3112/4 chips are remarkably
CS> buggy and exhibit data corruption:

Hm, sure? I would expect the same behaviour (detaching) as with the first
drive if the controller was the reason in this case.


I already thought about replacing the controller to get rid of the
detach-problem. However, I cannot do this online and I really would prefer
fixing the disk firmware problem first.
I could remove the hotspare drive ad14 and use this slot for putting in a
replacement disk. Is it possible to get ad18 out of zfs' replacing
process? Maybe by detaching the disk from the pool?


Re: ZFS "zpool replace" problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 08:27:37 -0800 Jeremy Chadwick
 wrote about Re: ZFS "zpool replace" problems:

JC> Well, to be fair, we can't be 100% certain he got bit by that bug.
JC> It's possible/likely, but we don't know for certain at this point.  We
JC> also don't know what brand hard disks he had connected to ad16 and/or
JC> ad18.

The same as on the others (WD RE2GP), just with the updated firmware
(02.01B02 that is) to get rid of the lcc problem.

JC> Older Silicon Image controllers are known for... well, just read the
JC> Wikipedia entry for details.

I knew the card is not top of the line, but I didn't know that it
is /that/ bad. When I set up the system 1 or 2 years ago, I just thought
it might be nice to be able to use the two extra slots in case any
drives had to be replaced or so, and the card was just lying around
(well, maybe I have an idea now why nobody else wanted to use it :-).

I guess I will try to offline the hotspare slot (connected to the mcp55 on
the motherboard) and plug the replacement disk in there. Maybe zfs
recognizes it and picks up the resilvering there. Otherwise I'll have to
look into how to get rid of the degraded resilvering process and restart it
with the drive in the other slot.

JC> As others have stated already: Intel could make a fortune off of a
JC> simple PCIe or PCI-X SATA controller card that's ICH9/ICH10-based.

Indeed. I use these 8-channel Supermicro controllers (I think I recommended
them some time ago here) with LSI chipset that work really nicely. But
the bracket does not fit into standard slots and there is no PCI-X version.
I would certainly prefer a regular card by Intel.


Re: ZFS "zpool replace" problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 08:46:19 -0800 Jeremy Chadwick
 wrote about Re: ZFS "zpool replace" problems:

JC> - zpool offline  
JC> - atacontrol detach ataX (where X = channel associated with disk)
JC> - Physically remove bad disk
JC> - Physically insert new disk
JC> - Wait 15 seconds for stuff to settle
JC> - atacontrol attach ataX (where X = previous channel detached)
JC> - zpool replace  
JC> - zpool online  
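The sequence above could be wrapped in a small dry-run script, sketched here. The pool, disk and channel names (tank, ad18, ata9) are placeholders, since the arguments are not shown in the quoted list; the run helper and the DRYRUN switch are made up for illustration as well.

```shell
#!/bin/sh
# Dry-run sketch of the hot-swap sequence quoted above.
# POOL, DISK and CHANNEL are placeholders, not values from the thread.
POOL="${POOL:-tank}"
DISK="${DISK:-ad18}"
CHANNEL="${CHANNEL:-ata9}"
# With DRYRUN=1 (the default) commands are only printed, never executed.
DRYRUN="${DRYRUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRYRUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

hot_swap() {
    run zpool offline "$POOL" "$DISK"
    run atacontrol detach "$CHANNEL"
    echo "Swap the disk now, then press Enter." >&2
    [ "$DRYRUN" = 1 ] || read -r reply
    [ "$DRYRUN" = 1 ] || sleep 15       # wait for things to settle
    run atacontrol attach "$CHANNEL"
    run zpool replace "$POOL" "$DISK"
    run zpool online "$POOL" "$DISK"
}
```

Sourcing the file and calling hot_swap prints the five commands in order; only with DRYRUN=0 are they actually executed.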

JC> "reinit" shouldn't be needed at all -- in fact, I've seen reinit cause
JC> some craziness (even on Intel controllers), including a system
JC> deadlock, but this was back during the RELENG_6 and RELENG_7 days.
JC> Great improvements have been made to ata(4) since then.

Thanks for pointing that out. I would have gone exactly this way if I did
not have the extra slots or if one of the drives was actually faulty. But in
this case I just wanted to replace every drive one-by-one and (at least I
thought) I had extra slots, so I did not want to give up the redundancy
during the replacement (knowing very well that the drives to be replaced
are already beyond the specification of wd due to the load-cycle bug).

JC> If you need me to validate the above procedure (it's been a while since
JC> I've had to hot-swap a disk), I can do so.  I do have a 4-disk
JC> Supermicro SuperServer 5015B-MTB (ICH9-based) sitting on my workbench
JC> which I can test with.

I'm quite sure this will work fine. I just don't know how to get rid of
the degraded replacement zfs sees.

JC> It honestly sounds like hot-swapping is causing some chaos on your
JC> system.  Are all of the controllers involved configured for AHCI?  

I think so. How could I verify this?


Re: ZFS "zpool replace" problems

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 08:59:27 -0800 Chuck Swiger  wrote
about Re: ZFS "zpool replace" problems:

CS> As a general matter of maintaining RAID systems, however, the approach
CS> to upgrading drive firmware on members of a RAID array should be to
CS> take down the entire container and offline the drives, update one
CS> drive, test it (via SMART self-test and read-only checksum comparison
CS> or similar), and then proceed to update all of the drives (preferably
CS> doing the SMART self-test for each, if time allows) before returning
CS> them to the RAID container and onlining them.

Well, I had several spare drives sitting on the shelf. So I updated the
firmware of these spare drives and now want to replace the drives with the
old firmware by new ones one-by-one. Taking the system offline for
longer than a few minutes is not really an option. I'd rather roll in a
new machine to take over the job in that case.

CS> Pulling individual drives from a RAID set while live and updating the
CS> firmware one at a time is not an approach I would take-- running with
CS> mixed firmware versions doesn't thrill me, and I know of multiple
CS> cases where someone made a mistake reconnecting a drive with the wrong
CS> SCSI id or something like that, taking out a second drive while the
CS> RAID was not redundant, resulting in massive data corruption or even
CS> total loss of the RAID contents.

This scenario was exactly the reason why I plugged in the new drive to an
extra slot and asked zfs to replace it with an old one. Well, I did not
know what kind of fiasco the controller for this extra slot would turn out
to be - otherwise I would have used the hot-spare slot for this in the
first place.


Re: immense delayed write to file system (ZFS and UFS2), performance issues

2010-01-26 Thread Gerrit Kühn
On Wed, 27 Jan 2010 03:53:20 +0900 Tommi Lätti  wrote about
Re: immense delayed write to file system (ZFS and UFS2), performance
issues:

TL> Well AFAIK WD certifies that there's no extra risk involved unless you
TL> go over 300.000 park cycles. On the other hand, my 9 month 1.5tb green
TL> drive has over 200.000 cycles.

I think the RE2 drives I have here are certified for 600k cycles.
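To see how far a drive is from such a rating, the raw Load_Cycle_Count (SMART attribute 193) can be pulled out of the smartctl attribute table. This is only a sketch: the function name load_cycles is made up, the rated limit has to be supplied by the caller, and it assumes the usual ten-column attribute table with the raw value in the last column.

```shell
# Read `smartctl -A` output on stdin and print the raw Load_Cycle_Count
# together with its share of a rated limit passed as $1.
# Assumes the raw value is the 10th column of the attribute table.
load_cycles() {
    awk -v limit="$1" '$2 == "Load_Cycle_Count" {
        printf "%d cycles, %d%% of %d\n", $10, ($10 * 100) / limit, limit
    }'
}
```

It would be used as `smartctl -A /dev/ad4 | load_cycles 600000`, where the device name is just an example.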

TL> Maybe check if you can disable the idle timer using WDIDLE3... works
TL> for my drives (although it did some strange things to one out of the 6
TL> drives --> decreased reported sector count and the zfs invalidated the
TL> pool :/ ).

I can only encourage everyone having this problem to report to WD's
support about this. Today I received an update for the firmware of
RE4-drives (which I did not try out yet). IMHO, the more people complain
about these issues, the higher is the chance that WD will do something
about it.


Re: immense delayed write to file system (ZFS and UFS2), performance issues

2010-01-26 Thread Gerrit Kühn
On Tue, 26 Jan 2010 19:12:01 -0500 Damian Gerow 
wrote about Re: immense delayed write to file system (ZFS and UFS2),
performance issues:

DG> Adrian Wontroba wrote:

DG> Having a script kick off and write to a disk will help so long as that
DG> disk is writable; if it's being used as a hot spare in a raidz array,
DG> it's not going to help much.

For my RE2 and RE4 disks I wrote a script that calls smartctl -a on all
disks (one after another) every 5s or so. This also prevents the counter
from increasing in my setup, and you can do it for every disk, no matter
whether it is in a raid compound or not. I think writing to the disks may
also fail to have the desired effect if you have stripes the writes are
spread over (raid 50 or similar zpool setups).
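A minimal sketch of such a polling script could look like the following. It is not the original script: the disk list, the interval variable and the poll_once helper are assumptions chosen for illustration.

```shell
#!/bin/sh
# Keep-alive poller: query SMART data on each disk in turn so that no
# drive sits idle long enough to unload its heads.
# DISKS, INTERVAL and SMARTCTL are illustrative defaults.
DISKS="${DISKS:-ad4 ad6 ad8}"
INTERVAL="${INTERVAL:-5}"
SMARTCTL="${SMARTCTL:-smartctl}"

# Poll every disk once, one after another. The output is discarded,
# only the access to the drive matters.
poll_once() {
    for d in $DISKS; do
        "$SMARTCTL" -a "/dev/$d" >/dev/null 2>&1 || echo "poll failed: $d" >&2
    done
}

# Loop forever only when started with the argument "run", so the file
# can also be sourced without blocking.
if [ "${1:-}" = "run" ]; then
    while :; do
        poll_once
        sleep "$INTERVAL"
    done
fi
```

Started as `sh keepalive.sh run` (the file name is arbitrary), it polls all listed disks every INTERVAL seconds, whether or not they are part of a pool.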

Just my 2¢.


one more load-cycle-count problem

2010-02-07 Thread Gerrit Kühn
Hi all,

After being disturbed by the firmware issues of the wd drives causing
excessive load cycles (see thread "immense delayed write to file system
(ZFS and UFS2), performance issues" in January), I have found some more
problematic drives in the following setup:

4 x 2.5" WDC WD4000BEVT-00ZAT0 in RAIDZ1 configuration attached to a
Supermicro SAS controller:

m...@pci0:2:0:0: class=0x01 card=0xa38015d9 chip=0x00581000 rev=0x08 hdr=0x00
    vendor     = 'LSI Logic (Was: Symbios Logic, NCR)'
    device     = 'SAS 3000 series, 8-port with 1068E -StorPort'
    class      = mass storage
    subclass   = SCSI

luna# camcontrol devlist
at scbus0 target 0 lun 0 (pass0,da0)
at scbus0 target 1 lun 0 (pass1,da1)
at scbus0 target 2 lun 0 (pass2,da2)
at scbus0 target 3 lun 0 (pass3,da3)

The disks appear to load/unload every 10s or so if I do not artificially
keep them busy. Does anyone here have a suggestion how to make this
interval longer or even turn off the unload feature completely?

I tried

luna# camcontrol idle da0 -t 600
(pass0:mpt0:0:0:0): CMD: IDLE: e3 00 00 00 00 40 00 00 00 00 78 00
(pass0:mpt0:0:0:0): CAM Status: CCB request was invalid

luna# camcontrol standby da0 -t 600
(pass0:mpt0:0:0:0): CMD: STANDBY: e2 00 00 00 00 40 00 00 00 00 78 00
(pass0:mpt0:0:0:0): CAM Status: CCB request was invalid

From /usr/share/misc/scsi_modes I gather that page 26 should contain power
control features, but to no avail:

luna# camcontrol modepage da0 -m 26
camcontrol: error sending mode sense command

Any further ideas how to get rid of this "feature"?


Re: one more load-cycle-count problem

2010-02-08 Thread Gerrit Kühn
On Mon, 8 Feb 2010 15:43:46 +0200 Dan Naumov  wrote
about RE: one more load-cycle-count problem:

DN> >Any further ideas how to get rid of this "feature"?

DN> 1) The most "clean" solution is probably using the WDIDLE3 utility on
DN> your drives to disable automatic parking or in cases where its not
DN> possible to complete disable it, you can adjust it to 5 minutes, which
DN> essentially solves the problem. Note that going this route will
DN> probably involve rebuilding your entire array from scratch, because
DN> applying WDIDLE3 to the disk is likely to very slightly affect disk
DN> geometry, but just enough for hardware raid or ZFS or whatever to bark
DN> at you and refuse to continue using the drive in an existing pool (the
DN> affected disk can become very slightly smaller in capacity). Backup
DN> data, apply WDIDLE3 to all disks. Recreate the pool, restore backups.
DN> This will also void your warranty if used on the new WD drives,
DN> although it will still work just fine.

Thanks for the warning. How on earth can a tool to set the idle time
affect the disk geometry?!

DN> 2) A less clean solution would be to setup a script that polls the
DN> SMART data of all disks affected by the problem every 8-9 seconds and
DN> have this script launch on boot. This will keep the affected drives
DN> just busy enough to not park their heads.

That's what I have been doing since yesterday, when I first noticed the
problem on this particular system. Not a pretty solution either. I'm close
to buying Hitachi drives instead (HTE545050B9A300). Does anyone here know
these drives and can confirm that they do not have this kind of problem (I
would expect so because of the 24/7 certification)?


Re: one more load-cycle-count problem

2010-02-08 Thread Gerrit Kühn
On Mon, 8 Feb 2010 06:22:59 -0800 Jeremy Chadwick
 wrote about Re: one more load-cycle-count problem:

JC> The DOS utilities submit custom ATA CMDs or data to all WD disks to
JC> toggle or adjust these features.  If someone could figure out what the
JC> command(s) were, the feature(s) could be implemented into
JC> atacontrol(8). Of course, that would require reverse-engineering of
JC> the EXEs, which would probably induce DMCA-related lawsuits (in the
JC> US).  Sad too, since documentation of said feature(s) would improve
JC> customer satisfaction. But hey, I'm just an engineer, what do I know.

I would really prefer to be able to set this stuff via camcontrol or
atacontrol. Just having to boot DOS on this machine (no floppy, no
cdrom) will be a real pain. And most probably the DOS tool will not be
able to see the disks sitting behind my lsi-driven controller anyway, so I
would have to plug them in elsewhere, too. Great job, WD. :-(


Re: hardware for home use large storage

2010-02-09 Thread Gerrit Kühn

CS> pricey hardware raid cards for compatibility reasons.  There seem to
CS> be no decent add-on SATA cards that play nice with FreeBSD other than
CS> that weird supermicro card that has to be physically hacked about to
CS> fit.

BTW: I recently built some more machines with this card. I can confirm now
that you can use it with "standard" brackets, if you have some spare. The
distance for the two holders is the same as for e.g. 3ware 95/96
controllers and I had some spares in standard height there because I use
the 3wares in low profile setups. The brackets of Intel NICs seem to fit,
too. The only thing that is different with the card now is the side on
which the components are mounted. But this should not be a problem unless
you want to place them next to a graphics card.


Re: hardware for home use large storage

2010-02-09 Thread Gerrit Kühn
On Tue, 09 Feb 2010 17:21:32 +1100 Andrew Snow  wrote
about Re: hardware for home use large storage:


The good thing about this board is that the pineview atoms seem to be
64bit capable, which makes them attractive for zfs. I bought a board with
VIA Nano processor for this reason last year, as I could not find any decent
hardware with a 64bit capable atom.


zpool vdev vs. glabel

2010-02-09 Thread Gerrit Kühn

I have created a raidz2 with disks I labeled with glabel before. Right
after creation this pool looked fine, using devices label/tank[1-6].

I did some tests with replacing/swapping disks and so on. After doing a

zpool offline tank label/tank6
remove disk
camcontrol rescan all
insert disk
camcontrol rescan all
zpool online tank label/tank6

I got the disk back, but not under the requested label, but under the da
device name:

  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Tue Feb  9 14:56:37 2010
config:

tank ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
label/tank1  ONLINE   0 0 0  8.50K resilvered
label/tank2  ONLINE   0 0 0  7.50K resilvered
label/tank3  ONLINE   0 0 0  8.50K resilvered
label/tank4  ONLINE   0 0 0  7.50K resilvered
label/tank5  ONLINE   0 0 0  9K resilvered
da6  ONLINE   0 0 0  13.5K resilvered

errors: No known data errors

Why does this happen? Is there any way to get zfs to use the label again?
After the device is in use, the label in /dev/label disappears. When
taking the device offline again, the label is there, but cannot be used:

pigpen# zpool offline tank da6
pigpen# zpool status
  pool: system
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see:
 scrub: resilver completed after 0h0m with 0 errors on Tue Feb  9 14:49:14 2010
config:

system ONLINE   0 0 0
  mirror   ONLINE   0 0 0
label/system1  ONLINE   3   617 0  126K resilvered
label/system2  ONLINE   0 0 0  41K resilvered

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see:
 scrub: resilver completed after 0h0m with 0 errors on Tue Feb  9 14:56:37 2010
config:

tank DEGRADED 0 0 0
  raidz2 DEGRADED 0 0 0
label/tank1  ONLINE   0 0 0  8.50K resilvered
label/tank2  ONLINE   0 0 0  7.50K resilvered
label/tank3  ONLINE   0 0 0  8.50K resilvered
label/tank4  ONLINE   0 0 0  7.50K resilvered
label/tank5  ONLINE   0 0 0  9K resilvered
da6  OFFLINE  0 38 0  13.5K resilvered

errors: No known data errors
pigpen# ll /dev/label/
total 0
crw-r-----  1 root  operator    0, 104 Feb  9 14:04 lisacrypt1
crw-r-----  1 root  operator    0, 112 Feb  9 14:04 lisacrypt2
crw-r-----  1 root  operator    0, 113 Feb  9 14:04 lisacrypt3
crw-r-----  1 root  operator    0, 134 Feb  9 14:48 system1
crw-r-----  1 root  operator    0, 115 Feb  9 14:04 system2
crw-r-----  1 root  operator    0, 116 Feb  9 14:04 tank1
crw-r-----  1 root  operator    0, 117 Feb  9 14:04 tank2
crw-r-----  1 root  operator    0, 118 Feb  9 14:04 tank3
crw-r-----  1 root  operator    0, 101 Feb  9 14:04 tank4
crw-r-----  1 root  operator    0, 102 Feb  9 14:04 tank5
crw-r-----  1 root  operator    0, 103 Feb  9 15:02 tank6

pigpen# zpool online tank label/tank6
cannot online label/tank6: no such device in pool

In a different thread I found the hint to use zpool replace to get to the
usage of labels, but this seems not possible, either:

pigpen# zpool replace tank label/tank6
invalid vdev specification
use '-f' to override the following errors:
/dev/label/tank6 is part of active pool 'tank'

pigpen# zpool replace -f tank label/tank6
invalid vdev specification
the following errors must be manually repaired:
/dev/label/tank6 is part of active pool 'tank'

pigpen# zpool replace -f tank da6 label/tank6
invalid vdev specification
the following errors must be manually repaired:
/dev/label/tank6 is part of active pool 'tank'

I'm running out of ideas here...


Re: zpool vdev vs. glabel

2010-02-09 Thread Gerrit Kühn
On Tue, 9 Feb 2010 06:26:58 -0800 Jeremy Chadwick
 wrote about Re: zpool vdev vs. glabel:

JC> > I'm running out of ideas here...

JC> Would "zpool export" and "zpool import" be necessary in this case?

I tried that several times, does not change anything.

JC> Also, I'm a little confused as to the use of glabel in this case.  In
JC> what condition do your disk indices (e.g. X of daX) change?  Are you
JC> yanking multiple disks out of a system at the same time and then
JC> shoving them back into different drive bays?  

I just did not want to hard-wire the da devices in the kernel. I have two
lsi controllers, and they do not even come up in the same order every time
I boot (mpt0/mpt1), let alone the disks picking up the same daX every
time. I thought labeling the disks would be a good idea to prevent all
these kinds of problems.

JC> Are you switching
JC> between storage subsystem drivers (ahci(4) vs. ataahci(4), for
JC> example) regularly?

No (not yet at least :-).

JC> I've yet to be convinced glabel is worth bothering with, unless the
JC> system adheres to one of the above situations (which are worthy of
JC> strangulation anyway ;-) ).

I would really like to know how this happened at all... meanwhile I used a
spare disk under a different name to replace everything round-robin back
to normal.

However, I just noticed one more thing:

pigpen# zpool status tank
  pool: tank
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Tue Feb  9 15:50:01 2010
config:

tank ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
label/tank1  ONLINE   0 0 0  11K resilvered
label/tank2  ONLINE   0 0 0  10K resilvered
label/tank3  ONLINE   0 0 0  11K resilvered
label/tank4  ONLINE   0 0 0  10.5K resilvered
label/tank5  ONLINE   0 0 0  11K resilvered
label/tank6  ONLINE   0 0 0  15K resilvered

errors: No known data errors
pigpen# zpool offline tank label/tank5
pigpen# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see:
 scrub: resilver completed after 0h0m with 0 errors on Tue Feb  9 15:50:01 2010
config:

tank DEGRADED 0 0 0
  raidz2 DEGRADED 0 0 0
label/tank1  ONLINE   0 0 0  11K resilvered
label/tank2  ONLINE   0 0 0  10K resilvered
label/tank3  ONLINE   0 0 0  11K resilvered
label/tank4  ONLINE   0 0 0  10.5K resilvered
label/tank5  ONLINE   0 0 0  11K resilvered
label/tank6  OFFLINE  0 39 0  15K resilvered

errors: No known data errors

pigpen# zpool offline tank label/tank5
cannot offline label/tank5: no valid replicas

Why can't I offline a second disk? This is a raidz2 volume, after all?!


Re: zpool vdev vs. glabel

2010-02-10 Thread Gerrit Kühn
On Tue, 9 Feb 2010 13:27:21 -0700 Elliot Finley 
wrote about Re: zpool vdev vs. glabel:

EF> I ran into this same problem.  you need to clean the beginning and end
EF> of your disk off before glabeling and adding it to your pool.  clean
EF> with dd if=/dev/zero...
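For reference, the cleaning step could be sketched like this. The helper only prints the dd commands instead of running them, since they destroy data; the name wipe_cmds and the choice of 1 MiB at each end are assumptions, not something stated in the thread.

```shell
# Print dd commands that zero the first and the last MiB of a disk.
# $1 = device name, $2 = media size in bytes (supplied by the caller).
# Nothing is executed here; running the printed commands destroys data.
wipe_cmds() {
    dev=$1
    size=$2
    mib=1048576
    blocks=$((size / mib))      # number of whole MiB blocks on the disk
    last=$((blocks - 1))        # index of the last whole MiB block
    echo "dd if=/dev/zero of=/dev/$dev bs=1m count=1"
    # Without count, dd writes from the last whole block to the end of
    # the device and then stops with an end-of-device error.
    echo "dd if=/dev/zero of=/dev/$dev bs=1m seek=$last"
}
```

The size could be taken from diskinfo, e.g. `wipe_cmds da6 "$(diskinfo da6 | awk '{print $3}')"`, assuming the third field of its output is the media size in bytes.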

Hm, I think I did that (at least for the beginning part).
Maybe I was not quite clear what I did below: I removed and re-attached
the *same* disk which was labelled with glabel and running fine before.
The label was there when I inserted it back, but zfs went for the da
device node anyway.
If I see this problem again, I will try to wipe the complete disk before
re-inserting it.


Re: zpool vdev vs. glabel

2010-02-10 Thread Gerrit Kühn
On Wed, 10 Feb 2010 10:18:49 +0100 Marius Nünnerich 
wrote about Re: zpool vdev vs. glabel:

MN> It seems there is some kind of race condition with zfs either picking
MN> up the disk itself or the label device for the same disk. I guess it's
MN> which ever it probes first.

This could explain it. However, it seems that zfs sticks to the da device
once it has changed its mind. Meanwhile I discovered one more system where
this obviously has happened (although I cannot say when):

luna# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested

tank  ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
label/disk-1  ONLINE   0 0 0
da0   ONLINE   0 0 0
label/disk-2  ONLINE   0 0 0
label/disk-3  ONLINE   0 0 0

errors: No known data errors

MN> I wrote the GPT part of glabel for using
MN> it in situations like this, I had not a single report of this kind of
MN> problem with the gpt labels. Maybe you can try them too?

Yeah, I just have to look into how gpt labels work. I did not use them at
all up to now.
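As a rough sketch of what trying GPT labels might involve: the helper below prints the gpart commands for one disk. The disk and label names are taken from the pool listing above purely as examples, the function name is made up, and a single freebsd-zfs partition covering the disk is an assumption. Creating a new GPT wipes the existing partitioning, so nothing is executed.

```shell
# Print the gpart commands that would put a GPT on a disk and add one
# labeled freebsd-zfs partition. $1 = disk, $2 = label.
# Dry run only: the commands are printed, not executed.
gpt_label_cmds() {
    echo "gpart create -s gpt $1"
    echo "gpart add -t freebsd-zfs -l $2 $1"
    echo "# the partition should then appear as /dev/gpt/$2"
}
```

Calling `gpt_label_cmds da0 disk-1` prints the two gpart commands; the pool would then be built on gpt/disk-1 rather than label/disk-1.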


bugs in mpt(4) and mptutil(8)

2010-02-10 Thread Gerrit Kühn

I have 2 8port cards with lsi chips installed in one machine that are
driven by mpt(4). I see about the same problem (I think) when disconnecting
disks as described here:

When I simply pull a disk (without offlining it first), zfs does not
notice it (the disk is still listed as "online") and I get lots of

mpt1: mpt_cam_event: 0x16
mpt1: mpt_cam_event: 0x12
mpt1: mpt_cam_event: 0x16
mpt1: mpt_cam_event: 0x16
mpt1: mpt_cam_event: 0x16
mpt1: request 0xff80005e0bf0:2419 timed out for ccb 0xff0005802800 (req->ccb 0xff0005802800)
mpt1: attempting to abort req 0xff80005e0bf0:2419 function 0
mpt1: completing timedout/aborted req 0xff80005e0bf0:2419
mpt1: abort of req 0xff80005e0bf0:0 completed
mpt1: request 0xff80005dc000:2810 timed out for ccb 0xff000fa66800 (req->ccb 0xff000fa66800)
mpt1: request 0xff80005dc3f0:2811 timed out for ccb 0xff0005802800 (req->ccb 0xff0005802800)
mpt1: attempting to abort req 0xff80005dc000:2810 function 0
mpt1: completing timedout/aborted req 0xff80005dc3f0:2811
mpt1: completing timedout/aborted req 0xff80005dc000:2810
[...goes on for ages...]

I don't know if this would ever stop. It ceased when I put the disk back
in. In the thread above it is mentioned that there are some fixes for
mpt(4) in -current to try out. However, I do not want to run -current on
this machine. So, does anyone here know what the chances are that the
mentioned patches are MFCed soon?

One more thing I noticed is that mptutil does not play well with my

pigpen# mptutil show adapter
mpt0 Adapter:
   Board Name: USASLP-L8i
   Board Assembly: USASLP-L8i
Chip Name: C1068E
Chip Revision: B3
  RAID Levels: none
mptutil: Reading config page header failed: Invalid configuration page

I don't know if it terminates because it cannot read the config page or if
it is not able to see the second card. However:

pigpen# mptutil show drives
mpt0 Physical Drives:
 da0 (  466G) ONLINE  SATA bus 0 id 0
 da1 (  466G) ONLINE  SATA bus 0 id 1
 da6 (  466G) ONLINE  SATA bus 0 id 2
da11 (  466G) ONLINE  SATA bus 0 id 3
 da3 (  466G) ONLINE  SATA bus 0 id 0
 da4 (  466G) ONLINE  SATA bus 0 id 1
 da5 (  466G) ONLINE  SATA bus 0 id 2
 da2 (   75G) ONLINE  SATA bus 0 id 3
 da7 (   75G) ONLINE  SATA bus 0 id 4
 da8 (  466G) ONLINE  SATA bus 0 id 5
 da9 (  466G) ONLINE  SATA bus 0 id 6
da10 (  466G) ONLINE  SATA bus 0 id 7
da12 ( 3824M) ONLINE  SCSI-0 bus 0 id 0

This output is definitely wrong, because the drives are split up on mpt0
and mpt1 (and the USB stick is not connected to mpt at all :-) as can be
seen with camcontrol:

pigpen# camcontrol devlist
at scbus0 target 0 lun 0 (da0,pass0)
at scbus0 target 1 lun 0 (da1,pass1)
at scbus0 target 2 lun 0 (pass6,da6)
at scbus0 target 3 lun 0 (pass11,da11)
at scbus1 target 0 lun 0 (da3,pass3)
at scbus1 target 1 lun 0 (da4,pass4)
at scbus1 target 2 lun 0 (da5,pass5)
 at scbus1 target 3 lun 0 (pass2,da2)
 at scbus1 target 4 lun 0 (da7,pass7)
at scbus1 target 5 lun 0 (da8,pass8)
at scbus1 target 6 lun 0 (da9,pass9)
at scbus1 target 7 lun 0 (da10,pass10)
  at scbus2 target 0 lun 0 (pass12,da12)


Re: bugs in mpt(4) and mptutil(8)

2010-02-11 Thread Gerrit Kühn
On Wed, 10 Feb 2010 08:53:18 -0500 John Baldwin  wrote
about Re: bugs in mpt(4) and mptutil(8):

JB> > This output is definitely wrong, because the drives are split up on
JB> > mpt0 and mpt1 (and the USB stick is not connected to mpt at all :-)
JB> > as can be seen with camcontrol:

JB> Hmm, I asked the previous reporter to debug this by examining the
JB> results that CAM returns from the bus scan using gdb, but I haven't
JB> heard back. Unfortunately I do not have access to any hardware with
JB> this sort of setup to debug this.

I will do some debugging work here, if you can tell me what to do.


nss_ldap and multiple group memberships

2010-02-24 Thread Gerrit Kühn
Hi all,

Is anyone here using nss_ldap and can successfully get it to work with
multiple group memberships? I would really like to get this to work here,
but I only get the primary group:

penumbra# id gekueh
uid=1030(gekueh) gid=1012(aei) groups=1012(aei)

getent group comes up with the complete group list. ldapsearch reports
three groups with member:-lines for my user. Somehow nss does not pick this
up. Any ideas?


Re: nss_ldap and multiple group memberships

2010-02-25 Thread Gerrit Kühn
On Thu, 25 Feb 2010 11:17:32 +1100 "Scott, Brian"
 wrote about RE: nss_ldap and multiple group
memberships:

SB> It depends on the type of group. There are at least two types of group
SB> objects that you can use in LDAP but only one of them works. You need
SB> to use posixGroup objects for unix groups. As I remember it, these
SB> have memberUid attributes for the member ids. These are simple unix
SB> identifiers. groupOfNames objects on the other hand have full
SB> distinguished names with 'member' attributes and can't be used by
SB> nss_ldap.

The server is running openldap under SLES and is not under my control.
ldapsearch gives group entries like

# lisa, group,
dn: cn=lisa,ou=group,dc=aei,dc=uni-hannover,dc=de
cn: lisa
displayName: lisa
gidNumber: 1003
member: uid=gekueh,ou=people,dc=aei,dc=uni-hannover,dc=de

So this would be the first case, I guess.

SB> The idea is that posixGroup and posixAccount mimic the unix files so
SB> extraction of the data is fast. If the software used a groupOfNames
SB> object then the returned member names would need to be queried as
SB> additional transactions to find the uid's of those entries that had
SB> posixAccount information. This is because the original authentication
SB> was done by pam_ldap and that just returned a UID to the system. If it
SB> returned the LDAP distinguished name to the system, and if that could
SB> then be passed into nss_ldap it would be possible to do the LDAP query
SB> in a single transaction. But then that all breaks down if you
SB> authenticate with something else like GSSAPI. If that was the case you
SB> would need to first search for the posixAccount object of the
SB> authenticated user (&(objectClass=posixAccount)(uid=1001)) and then
SB> search for all the group of names containing that distinguished name
SB> (&(objectClass=groupOfNames)(member=uid=bscott,ou=People,dc=netlab,dc=albury,dc=tafe)).
SB> That's two transactions and seems unnecessarily wasteful. Mind you, if
SB> it was an option I'd probably turn it on.

Thanks for this fine explanation. I do not use GSS. However, I found the
following configuration option in (nss) ldap.conf that helped me:

nss_map_attribute uniqueMember member

After uncommenting this line, everything seems to work fine:

penumbra# id gekueh
uid=1030(gekueh) gid=1012(aei) groups=1012(aei),1003(lisa)
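A quick way to check whether the mapping is really active (and not still commented out) is sketched below. The function name has_member_map is made up, and the config file path is passed in by the caller because it differs between installations.

```shell
# Succeeds if the given nss_ldap config file contains an active
# (uncommented) "nss_map_attribute uniqueMember member" line.
has_member_map() {
    grep -Eq '^[[:space:]]*nss_map_attribute[[:space:]]+uniqueMember[[:space:]]+member[[:space:]]*$' "$1"
}
```

For example, `has_member_map /path/to/ldap.conf && id gekueh` would only run id once the mapping is in place; the path here is a placeholder.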

Maybe this could be mentioned somewhere in the documentation? I used
 to set up
the client, but the information I got from this article was rather
sparse and led me down the wrong path more than once.


Re: nss_ldap and multiple group memberships

2010-02-25 Thread Gerrit Kühn
On Thu, 25 Feb 2010 15:10:03 +1100 "Scott, Brian"
 wrote about RE: nss_ldap and multiple group
memberships:

SB> It looks like you may need to uncomment the line '#nss_map_attribute
SB> uniqueMember member' in your ldap.conf to then use the correct
SB> attribute name.

Yes, that's exactly the solution here. I got this from reading the config
files of a working Linux client that uses the same nss libraries.

Thank you for your support!


Re: em0 freezes on ZFS server

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 10:34:41 +0100 Willem Jan Withagen 
wrote about Re: em0 freezes on ZFS server:

WJW> Probably the reason why this happened yesterday is that I started
WJW> doing major software builds (over ZFS/NFS/TCP/v3) against data stored
WJW> on this box.

I saw a similar problem this morning and suppose it started when some
automatic backup jobs started last night. An unstable em device is a rather
bad thing; I hope increasing the buffer (mine is at 64000 now) prevents
this from happening again.


Re: em0 freezes on ZFS server

2010-02-26 Thread Gerrit Kühn
On Thu, 25 Feb 2010 14:59:28 -0800 Jack Vogel  wrote
about Re: em0 freezes on ZFS server:

JV> The failure to "setup receive structures" means it did not have
JV> sufficient mbufs
JV> to setup the RX ring and buffer structs. 

I don't know if this is related, but I updated an amd64 zfs machine with
several em cards from 7.2 to 8-stable yesterday. First it worked fine after
booting, but this morning, at least three of the five em interfaces did
not do much anymore. You could revive them for some seconds with ifconfig
down/up, but they always ceased functioning soon after that (within
During debugging (up/down, load/unload if_em etc.) I saw the same error
message as above at some point. I finally gave up and rebooted the
machine. For now, everything appears to be back to normal (but for how

JV> Not sure why this results in a lockup, but try and increase
JV> kern.ipc.nmbclusters.

I just did that, just to make sure.


Re: em0 freezes on ZFS server

2010-02-26 Thread Gerrit Kühn
On Thu, 25 Feb 2010 14:59:28 -0800 Jack Vogel  wrote
about Re: em0 freezes on ZFS server:

JV> The failure to "setup receive structures" means it did not have
JV> sufficient mbufs
JV> to setup the RX ring and buffer structs.

I have been monitoring mbufs since I rebooted my server. Right now (after
2.5 hours or so of operation) the number of total clusters has already
increased to 15k. Is this normal behaviour for a relatively idle server, or
will it inevitably go through the roof in a few more hours?

Every 1s: netstat -m                                Fri Feb 26 13:14:54 2010

15001/2279/17280 mbufs in use (current/cache/total)
13970/1212/15182/64000 mbuf clusters in use (current/cache/total/max)
13970/750 mbuf+clusters out of packet secondary zone in use (current/cache)
0/119/119/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
31690K/3469K/35160K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
3 requests for I/O initiated by sendfile
0 calls to protocol drain routines
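
For anyone who wants to track these numbers automatically rather than
watching the screen, here is a minimal sketch that pulls the cluster counters
out of text in the "netstat -m" format shown above. The function name and the
regular expression are illustrative assumptions; the sample text is copied
from the output above.

```python
import re

# Matches a line such as:
# "13970/1212/15182/64000 mbuf clusters in use (current/cache/total/max)"
CLUSTER_RE = re.compile(
    r"^(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", re.MULTILINE
)


def parse_mbuf_clusters(netstat_output):
    """Return the current/cache/total/max cluster counts, or None if absent."""
    match = CLUSTER_RE.search(netstat_output)
    if match is None:
        return None
    current, cache, total, maximum = (int(g) for g in match.groups())
    return {"current": current, "cache": cache, "total": total, "max": maximum}


sample = """15001/2279/17280 mbufs in use (current/cache/total)
13970/1212/15182/64000 mbuf clusters in use (current/cache/total/max)
13970/750 mbuf+clusters out of packet secondary zone in use (current/cache)
"""

stats = parse_mbuf_clusters(sample)
print(stats)
print("share of max in use: {:.1%}".format(stats["total"] / stats["max"]))
```

Logging the "total" value together with a timestamp once a minute would be
enough to see whether the count keeps climbing.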


Re: em0 freezes on ZFS server

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 04:03:39 -0800 Jeremy Chadwick
 wrote about Re: em0 freezes on ZFS server:

JC> Note how close the "current" value is to that of "total".  I'm not too
JC> surprised you're seeing what you are as a result of this.  What on
JC> earth is this machine doing at all times?

Well, speaking for my machine: serving some nfs dirs from zfs, doing some
file transfers via rsync/scp, serving some web pages (gitweb, redmine).
Really nothing spectacular. I just updated from 7.2 to 8-stable yesterday
and did not have that problem before. From my last email to now (about 15
minutes) mbuf clusters have increased from 15k to 18k. All my other
machines (even another one with 8-stable, but without nfs-services and
without em nics) have only a few k of buffers in use.
Is there any way I could find out what is actually using these buffers?



Re: em0 freezes on ZFS server

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 13:31:38 +0100 Gerrit Kühn
 wrote about Re: em0 freezes on ZFS server:

GK> JC> Note how close the "current" value is to that of "total".  I'm not
GK> JC> too surprised you're seeing what you are as a result of this.
GK> JC> What on earth is this machine doing at all times?

GK> Is there any way I could find out what is actually using these buffers?

Sorry for replying to my own email:
At least in my case I found out what is eating the buffers: nfsd does!
The buffers stop increasing as soon as I stop nfsd. However, they start
increasing as soon as I start nfsd again.
Are there any ideas how to fix this? Downgrading back to 7-stable is not
really an easy task as far as I know, and I need the server to run without
having to reboot it once or twice a day...


Re: em0 freezes on ZFS server

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 15:04:37 +0200 Daniel Braniss 
wrote about Re: em0 freezes on ZFS server :

DB> > At least in my case I found out what is eating the buffers: nfsd
DB> > does! The buffers stop increasing as soon as I stop nfsd. However,
DB> > they start increasing as soon as I start nfsd again.
DB> > Are there any ideas how to fix this? Downgrading back to 7-stable is
DB> > not really an easy task as far as I know, and I need the server to
DB> > run without having to reboot it once or twice a day...

DB> I want to add some spices to this stew: :-)

You're welcome. :-)

DB> A few days later it hung, and it's now hanging every few days.
DB> Most of the hangs are because there is no network, but the NIC is bce,
DB> not em! I doubled kern.ipc.nmbclusters, so let's see what happens ...

Do you have nfsd running and serving clients? If so, we should maybe
change the topic to something like "possible nfs mbuf leakage"...

DB> 23066/6634/29700 mbufs in use (current/cache/total)

My server is at 22k now, and the buffer number is still increasing every
few seconds...
Can you monitor your mbuf usage and report if it grows?


Re: em0 freezes on ZFS server

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 17:07:13 +0200 Daniel Braniss 
wrote about Re: em0 freezes on ZFS server :

DB> its only purpose in life is to be an nfs server.

I thought so, but you did not mention it explicitly.

DB> but I wouldn't exclude zfs from the equation yet.
DB> I have other nfs servers, not doing zfs, and I don't see this.

My machine has zfs, too. I do not have 8-stable with nfs on ufs, so I
cannot crosscheck that.

DB> > My server is at 22k now, and the buffer number is still increasing
DB> > every few seconds...
DB> > Can you monitor your mbuf usage and report if it grows?

DB> I am, and in the last 2 hours it grew by about 300. It does oscillate,
DB> i.e. it grows some, then it goes down, but it seems that the low
DB> always increases.

Mine is at 36k now:

36797/3403/40200 mbufs in use (current/cache/total)
35772/1202/36974/65000 mbuf clusters in use (current/cache/total/max)
35772/836 mbuf+clusters out of packet secondary zone in use (current/cache)

DB> when I have enough data I'll plot it.

I think I'll reboot my machine now and hope that it lives as long as
possible into the weekend. Although at the present rate it will not
survive 24h. :-(
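
A back-of-the-envelope check of that estimate, as a minimal sketch. It
assumes purely linear growth, which nothing in this thread confirms, and uses
only the rough figures mentioned earlier (about 15k clusters rising to about
18k within 15 minutes, with a limit of 65000). The function name is
hypothetical.

```python
def hours_until_exhaustion(first_total, second_total, minutes_between,
                           max_clusters):
    """Linearly extrapolate when the cluster total reaches max_clusters.

    Returns the remaining time in hours, counted from the second sample,
    or None if the count is not growing.
    """
    growth_per_minute = (second_total - first_total) / minutes_between
    if growth_per_minute <= 0:
        return None
    remaining = max_clusters - second_total
    return remaining / growth_per_minute / 60.0


# Rough figures from the thread; the linear model is an assumption.
estimate = hours_until_exhaustion(15000, 18000, 15, 65000)
print("about {:.1f} hours left at this rate".format(estimate))
```

Even if the real growth rate is several times lower than in this sample, the
limit would still be reached well within a day.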


mbuf leakage with nfs/zfs? (was: em0 freezes on ZFS server)

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 17:41:02 +0200 Daniel Braniss 
wrote about Re: em0 freezes on ZFS server :

DB> check:
DB> x is seconds, y is mbufs current.

That does not look as bad as mine. I had 37k when I rebooted the machine
some minutes ago (and it's basically idle, just serving a few nfs clients
that don't do much).
But from the values Jeremy has posted and from my own comparisons here I
would think that something like 5k of mbuf clusters would be normal for my
machine (and probably also for yours).

Some more info from my side:
In the meantime I also tried a different network interface. The
nfe-interface that is onboard causes the same problems, so it is probably
not an em-specific issue.
Furthermore I found this via Google:
I patched and recompiled my kernel with this, just to try it out. Right
now I have

2264/1321/3585 mbufs in use (current/cache/total)
1239/1017/2256/65000 mbuf clusters in use (current/cache/total/max)
1239/809 mbuf+clusters out of packet secondary zone in use (current/cache)

but the uptime is only 12min so far. In some hours I'll know for certain
if this patch has anything to do with the problem.


Re: mbuf leakage with nfs/zfs? (was: em0 freezes on ZFS server)

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 22:09:32 +0200 Daniel Braniss 
wrote about Re: mbuf leakage with nfs/zfs? (was: em0 freezes on ZFS
server) :

DB> > Furthermore I found this via Google:
DB> > 

This did not help, I still see the same problem.

DB> I'll have to do some packet snooping to check if it's TCP or UDP nfs
DB> traffic, since some of the clients are Linux ...

I have Linux clients, too. Some use tcp, some udp.

DB> > 2264/1321/3585 mbufs in use (current/cache/total)
DB> > 1239/1017/2256/65000 mbuf clusters in use (current/cache/total/max)
DB> > 1239/809 mbuf+clusters out of packet secondary zone in use
DB> > (current/cache)

DB> > but the uptime is only 12min so far. In some hours I'll know for
DB> > certain if this patch has anything to do with the problem.

It did not help. In the meantime the values read

20555/1465/22020 mbufs in use (current/cache/total)
19529/1029/20558/65000 mbuf clusters in use (current/cache/total/max)
19529/823 mbuf+clusters out of packet secondary zone in use (current/cache)

I created a little graph here:

The y-axis shows the total mbuf clusters, the x-axis the time in minutes.
The flat part in the upper right corner is a 10min-interval when I had
stopped nfsd.

DB> at the moment there is not much activity, but if you check the latest
DB> you will see that the bottom is slowly increasing, so my bet
DB> is that there must be some leakage!
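
The "bottom is slowly increasing" observation can be checked mechanically.
The following is only a sketch: it splits a series of samples into
consecutive windows and tests whether each window's minimum is higher than
the previous one. The function names, the window size and the sample series
are all made up for illustration; they are not data from this thread.

```python
def window_minima(samples, window):
    """Split samples into consecutive full windows; return each minimum."""
    return [min(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


def floor_is_rising(samples, window):
    """True if every window's minimum exceeds the previous window's minimum."""
    minima = window_minima(samples, window)
    return len(minima) >= 2 and all(b > a for a, b in zip(minima, minima[1:]))


# Made-up series: both oscillate, but only the first one has rising troughs.
leaky = [100, 140, 110, 150, 120, 160, 130, 170, 140]
steady = [100, 140, 100, 150, 100, 160, 100, 170, 100]

print(floor_is_rising(leaky, 3))
print(floor_is_rising(steady, 3))
```

A rising floor over many windows is a stronger hint of a leak than a single
high reading, because normal load also pushes the peaks up temporarily.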

There certainly is. I wonder when this came in and why it has gone
unnoticed so far. Probably not all people serving nfs from zfs see this,
or this would have popped up earlier. Maybe the Linux clients are somehow
triggering the issue? Or did it start with the import of zpool version 14?
Unfortunately I have upgraded my pool, so I cannot easily go back to 8-REL
to test this (otoh, I need a stable server quite urgently).


Re: mbuf leakage with nfs/zfs? (was: em0 freezes on ZFS server)

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 22:09:32 +0200 Daniel Braniss 
wrote about Re: mbuf leakage with nfs/zfs? (was: em0 freezes on ZFS
server) :

DB> at the moment there is not much activity, but if you check the latest
DB> you will see that the bottom is slowly increasing, so my bet
DB> is that there must be some leakage!

BTW: I filed a PR for this:


Re: mbuf leakage with nfs/zfs?

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 23:12:39 +0100 Willem Jan Withagen 
wrote about Re: mbuf leakage with nfs/zfs?:

WJW> Mine are now:
WJW> 41533/2402/43935 mbufs in use (current/cache/total)
WJW> 41454/1572/43026/262144 mbuf clusters in use (current/cache/total/max)
WJW> 39241/823 mbuf+clusters out of packet secondary zone in use
WJW> (current/cache)

81492/2613/84105 mbufs in use (current/cache/total)
80467/2235/82702/128000 mbuf clusters in use (current/cache/total/max)
80458/822 mbuf+clusters out of packet secondary zone in use (current/cache)

If I keep increasing the clusters, maybe I can make it over the
weekend. :-)

WJW> I did set the zpool version to 14 this morning as well, but I think
WJW> that I ran into trouble already when still running version 13.

Ok, so this is possibly ruled out, too. Maybe the Linux clients do
something weird?


Re: mbuf leakage with nfs/zfs?

2010-02-26 Thread Gerrit Kühn
On Fri, 26 Feb 2010 23:12:39 +0100 Willem Jan Withagen 
wrote about Re: mbuf leakage with nfs/zfs?:

WJW> > DB>  I'll have to do some packet snooping to check if it's TCP or
WJW> > DB> UDP nfs traffic, since some of the clients are Linux ...

WJW> > I have Linux clients, too. Some use tcp, some udp.

WJW> I have Linux and FreeBSD clients running. The build system runs on 
WJW> Linux. All Linux's are UDP

Another shot in the dark:
After upgrading the server, all my Linux clients hung with "stale nfs
dir/file handle/whatever". I was not able to umount them (not even
forcefully). I had to use either lazy forceful umount (-fl) or reboot. Some
of these clients are still hanging around, because they are physically
hard to access (clean room installs etc.). Maybe these clients still try to
establish connections that eat up the buffers and never come back?


Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 09:24:10 +0200 Daniel Braniss 
wrote about Re: mbuf leakage with nfs/zfs? :

DB> I doubt it, but here is another shot:
DB> are we all running samba? I'm asking because the lock manager keeps
DB> dying and ...

Nope, no samba on my side. I am running lockd and statd on the server, but
stopping them does not change anything. All clients are using option
nolock anyway.

DB> PS: I dropped Jack from the CC, I think em is innocent :-)

Yes, good idea.


Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 11:14:56 +0200 Daniel Braniss 
wrote about Re: mbuf leakage with nfs/zfs? :

DB> anyways, I am running tests on an 'unused' server, only me using it to
DB> 'make world'
DB> and it's leaking.

Hm, I've got a server with 8-PRE from sometime in Nov09 that is serving
nfs from zfs fine and shows no leakage...


Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 12:26:02 +0200 Daniel Braniss 
wrote about Re: mbuf leakage with nfs/zfs? :

DB> > Hm, I've got a server with 8-PRE from sometime in Nov09 that is
DB> > serving nfs from zfs fine and shows no leakage...

DB> the binary search has started!
DB> sorry, have to go now :-) [really], but should be back in a couple of
DB> hours, let me know if you managed to pin it down, else I can continue.

Sorry, but I cannot do much over the weekend. Both the machine with leakage
and the one without are in production (and about 40km apart from each
other and away from my home :-).
I still wonder if there are more circumstances needed to provoke this
problem. I really doubt that this would have gone unnoticed for weeks or
even months if it only takes some nfs-server serving from zfs storage and
some client to see it.
What does the client in your test setup look like?


Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 15:15:52 +0100 Willem Jan Withagen 
wrote about Re: mbuf leakage with nfs/zfs?:

WJW> > 81492/2613/84105 mbufs in use (current/cache/total)
WJW> > 80467/2235/82702/128000 mbuf clusters in use (current/cache/total/max)
WJW> > 80458/822 mbuf+clusters out of packet secondary zone in use (current/cache)

WJW> Over the night I only had rsync and FreeBSD nfs traffic.
WJW> 45337/2828/48165 mbufs in use (current/cache/total)
WJW> 44708/1902/46610/262144 mbuf clusters in use (current/cache/total/max)
WJW> 44040/888 mbuf+clusters out of packet secondary zone in use
WJW> (current/cache)

After about 24h I now have

128320/2630/130950 mbufs in use (current/cache/total)
127294/1200/128494/512000 mbuf clusters in use (current/cache/total/max)
127294/834 mbuf+clusters out of packet secondary zone in use (current/cache)

WJW> I only have one Linux box running Kubuntu 8.10, mounted UDP:
WJW> (rw,udp,nolock,rsize=32768,wsize=32768,intr)

Hm, are you able to narrow this down? Does a single Linux client with tcp
mount cause the same trouble? Or a FreeBSD client with udp?
If it was only Linux clients with udp mounts or something like this, I
could understand why it took some time to pop up.

WJW> But running something like 'find openembedded | xargs cat > /dev/null'
WJW> Shows a steadily growing number of mbufs, and letting the system sit
WJW> for 5 min. doesn't decrease the used mbufs

I still have several udp and tcp mounts by Linux clients on my Server,
though most of them are probably stale now after the upgrade; and my
buffers keep draining...

WJW> Doing this on another FreeBSD 7.2 client runs the mbufs up (max inc
WJW> about 2000 mbuf), but within a few secs after the last file was
WJW> fetched, the mbuf tab runs down to around what it was before the
WJW> command.

FreeBSD client with udp mount? Then it is either Linux client with udp or
all Linux clients triggering this leakage. I doubt that this is the case
with all Linux clients, this would have caused more trouble earlier.

WJW> Not sure where to go from here? I'm certainly not fluent enough in
WJW> NFS to start interpreting a wireshark trace.

Neither am I.
I already wrote Rick Macklem an email on Friday, but so far only got back
an automated reply stating he is on "permanent vacation". I guess we need
him or one of the other nfs guys to get this fixed.
Could you try a single Linux client with tcp mount in the meantime? This
would tell us if Linux clients as such are causing the issue, or if it is
only Linux with udp mount.


P.S.: I cc'ed freebsd-fs because my PR went there.

Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 12:26:02 +0200 Daniel Braniss 
wrote about Re: mbuf leakage with nfs/zfs? :

DB> > Hm, I've got a server with 8-PRE from sometime in Nov09 that is
DB> > serving nfs from zfs fine and shows no leakage...

DB> the binary search has started!

After considering the last email from Willem: My 8-PRE server does not
have udp Linux clients, only Linux with tcp. If indeed Linux with udp is
causing the problem, it may very well even be in 8-PRE, and I just did not
see it so far.


Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 21:32:39 +0100 Eirik Øverby  wrote
about Re: mbuf leakage with nfs/zfs?:

E> I've had a discussion with some folks on this for a while. I can easily
E> reproduce this situation by mounting a FreeBSD ZFS filesystem via
E> NFS-UDP from an OpenBSD machine. Telling the OpenBSD machine to use TCP
E> instead of UDP makes the problem go away.

So we see this problem with udp clients from OpenBSD and Linux.

E> Other FreeBSD systems mounting the same share, either using UDP or TCP,
E> do not cause the problem to show up.

As Daniel reported he saw the problem with FBSD 8-stable: Which version
was the FBSD-client that worked for you with udp?

E> A patch was suggested by Rick Macklem, but that did not solve the issue:

Yeah, I also found and tried this on Friday - unfortunately without any
success, the leakage is still there.


Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 21:36:47 +0200 Daniel Braniss 
wrote about Re: mbuf leakage with nfs/zfs? :

DB> I have been running for the last few hours, 8-rel, and the only client
DB> is another
DB> 8-stable, furthermore, no ZFS, just plain UFS, and the leak is there!

Mounted via udp, not tcp, I guess...?!

DB> I am now trying 8-rc2 but will check in the morning, it is after all
DB> saturday night :-)

Same here. :-)


Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 11:38:19 -0800 Jeremy Chadwick
 wrote about Re: mbuf leakage with nfs/zfs?:

JC> I should point out that the NFS+ZFS-based filer doesn't actually do its
JC> backups using NFS; it uses rsnapshot (rsync) over SSH.  There is
JC> intense network I/O during backup time though, depending on how much
JC> data there is to back up.  The NFS mounts (on the clients) are only
JC> used to provide a way for people to get access to their nightly
JC> backups in a convenient way; it isn't used very heavily.

That's rather similar to my situation, I would say. Most traffic goes via
rsync, nfs only gives access to home dirs, which are not intensively used.

JC> I can do something NFS-intensive on any of the above clients if people
JC> want me to kind of testing.  Possibly an rsync with a source of the NFS
JC> mount and a destination of the local disk would be a good test?  Let me
JC> know if anyone's interested in me testing that.

From the last emails I would say we get the most out of it by comparing tcp
and udp clients to make sure this happens only with udp (and it is still
not quite clear to me if it also happens with a FBSD client using udp).

OTOH it would be great if someone with the ability to actually fix
something in the nfs code could get in this discussion to guide us to do
the debugging needed to do so.


Re: mbuf leakage with nfs/zfs?

2010-02-27 Thread Gerrit Kühn
On Sat, 27 Feb 2010 22:40:43 +0100 Eirik Øverby  wrote
about Re: mbuf leakage with nfs/zfs?:

E> > So we see this problem with udp clients from OpenBSD and Linux.

E> I have not had the opportunity to test with Linux or anything else.

I guess all others who reported so far (including me) had Linux on the
client side.

E> Could try from Windows, but not sure I want to get my hands THAT dirty.


E> > As Daniel reported he saw the problem with FBSD 8-stable: Which
E> > version was the FBSD-client that worked for you with udp?

E> 7.1, 7.2, 8.0-RCsomething and 8.0-RELEASE - no problems with either.

Daniel, are you sure you had the leakage with 8-stable? Eirik, do you have
the opportunity to try 8-stable with udp?


Re: mbuf leakage with nfs

2010-02-28 Thread Gerrit Kühn
On Sun, 28 Feb 2010 12:21:28 + "Robert N. M. Watson"
 wrote about Re: mbuf leakage with nfs/zfs? :

RNMW> It's almost certainly one or a small number of very specific RPCs
RNMW> that are triggering it -- maybe OpenBSD does an extra lookup, or
RNMW> stat, or something, on a name that may not exist anymore, or does it
RNMW> sooner than the other clients. Hard to say, other than to wave hands
RNMW> at the possibilities.
RNMW> And it may well be we're looking at two bugs: Danny may see one bug,
RNMW> perhaps triggered by a race condition, but it may be different from
RNMW> the OpenBSD client-triggered bug (to be clear: it's definitely a
RNMW> FreeBSD bug, although we might only see it when an OpenBSD client is
RNMW> used because perhaps OpenBSD also has a bug or feature).

In my case it is the Linux client causing the problems (I cannot tell yet if
it is only with udp, but I would think so). If I understand Daniel
correctly, his latest tests were performed with a FreeBSD client and udp. So
it may very well be a general issue with udp?! Would this help in narrowing
down the problem?


Re: mbuf leakage with nfs/udp (was: mbuf leakage with nfs/zfs)

2010-02-28 Thread Gerrit Kühn
On Sun, 28 Feb 2010 16:52:44 +0200 Daniel Braniss 
wrote about Re: mbuf leakage with nfs/udp (was: mbuf leakage with nfs/zfs):

DB> well, I have further reduced the problem, it happens with NFS/UDP
DB> writes. i'll try the wireshark road, but i'm very rusty with RPC, the
DB> other road is to check the changes, my oldest is from late october
DB> (RC2) where it's happening, while
DB> Gerrit tried 8-pre from November and worked, so it will be fun
DB> trying to nail it down :-)

I already withdrew from this position yesterday, because the 8-PRE server
I have does not have udp clients, only tcp. So I cannot tell (yet) whether
it is affected by the leakage or not.


Re: mbuf leakage with nfs

2010-03-01 Thread Gerrit Kühn
On Mon, 01 Mar 2010 12:52:32 +0100 Willem Jan Withagen 
wrote about Re: mbuf leakage with nfs:

WJW> > In my case it is the Linux client causing the problems (cannot tell
WJW> > yet if it is only with udp, but I would think so). If I understand
WJW> > Daniel correctly his latest tests were performed with FreeBSD
WJW> > client and udp. So it may very well be a general issue with udp?!
WJW> > Would this help narrowing down the problem?
WJW> I'm off 'till thursday.
WJW> At which time I'm willing to run more tests. Got plenty of boxes here.
WJW> Both FreeBSD and Linux. And otherwise will boot more in VirtualBox.

I finally took an axe and restarted nfsd without "-u". Now my mbuf usage is
flat as it should be. I guess some people using computers with udp
mounts will complain, but this can be fixed easily by converting their
connections to tcp.
However, I am still interested in having the issue fixed, so I will be
following the thread and will contribute if possible.
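
Converting a client usually comes down to changing one mount option. As a
small sketch, this helper rewrites an option string such as the one Willem
quoted earlier in the thread. The function name is hypothetical, and it only
handles the plain "udp" and "proto=udp" spellings; other transport options
would need their own cases.

```python
def switch_to_tcp(options):
    """Rewrite a comma-separated NFS mount option string to use TCP."""
    rewritten = []
    for opt in options.split(","):
        if opt == "udp":
            rewritten.append("tcp")
        elif opt == "proto=udp":
            rewritten.append("proto=tcp")
        else:
            # Every other option is passed through unchanged.
            rewritten.append(opt)
    return ",".join(rewritten)


print(switch_to_tcp("rw,udp,nolock,rsize=32768,wsize=32768,intr"))
```

The rewritten string would go into the client's fstab or mount command; the
share has to be unmounted and mounted again for the change to take effect.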


T7200 CPU not detected by est

2008-01-21 Thread Gerrit Kühn
Hi folks,

I have several systems using T7200 mobile CPUs running under 7-stable.
However, EST does not recognize the cpus. When loading cpufreq I get:

Jan 18 23:18:14 comet kernel: est1:  on cpu1
Jan 18 23:18:14 comet kernel: est: CPU supports Enhanced Speedstep, but is not recognized.
Jan 18 23:18:14 comet kernel: est: cpu_vendor GenuineIntel, msr
Jan 18 23:18:14 comet kernel: device_attach: est1 attach returned 6
Jan 18 23:18:14 comet kernel: p4tcc0:  on
Jan 18 23:18:14 comet kernel: est1:  on cpu1
Jan 18 23:18:14 comet kernel: est: CPU supports Enhanced Speedstep, but is not recognized.
Jan 18 23:18:14 comet kernel: est: cpu_vendor GenuineIntel, msr 6130c2906000c29
Jan 18 23:18:14 comet kernel: device_attach: est1 attach returned 6
Jan 18 23:18:14 comet kernel: p4tcc1:  on
Jan 18 23:18:14 comet kernel: est1:  on cpu1
Jan 18 23:18:14 comet kernel: est: CPU supports Enhanced Speedstep, but is not recognized.
Jan 18 23:18:14 comet kernel: est: cpu_vendor GenuineIntel, msr
Jan 18 23:18:14 comet kernel: device_attach: est1 attach returned 6
Jan 18 23:18:14 comet kernel: est1:  on cpu1
Jan 18 23:18:14 comet kernel: est: CPU supports Enhanced Speedstep, but is not recognized.
Jan 18 23:18:14 comet kernel: est: cpu_vendor GenuineIntel, msr 6130c2906000c29
Jan 18 23:18:14 comet kernel: device_attach: est1 attach returned 6
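
As an aside, the msr value in the log can be taken apart by hand. The sketch
below ASSUMES the layout commonly described for Core 2 performance-state
values: 16-bit ids made of an 8-bit frequency id (FID) and an 8-bit voltage
id (VID), with the current id in bits 15:0, the highest setting in bits 47:32
and the lowest in bits 63:48. Nothing in this thread confirms that layout,
and the 166.67 MHz bus clock is likewise an assumption for a T7200, so treat
the result as a plausibility check only.

```python
def decode_perf_status(msr):
    """Split a 64-bit value into (fid, vid) pairs under the assumed layout."""
    def fid_vid(id16):
        return (id16 >> 8) & 0xFF, id16 & 0xFF

    upper = msr >> 32
    return {
        "current": fid_vid(msr & 0xFFFF),
        "high": fid_vid(upper & 0xFFFF),
        "low": fid_vid((upper >> 16) & 0xFFFF),
    }


ASSUMED_BUS_MHZ = 166.67  # assumption, not taken from the log

decoded = decode_perf_status(int("6130c2906000c29", 16))
for name, (fid, vid) in decoded.items():
    print("{}: fid={} vid={:#x} -> about {:.0f} MHz".format(
        name, fid, vid, fid * ASSUMED_BUS_MHZ))
```

Under these assumptions the "high" entry works out to roughly the 2.00GHz
that dmesg reports for this CPU, which suggests the value itself is sane and
the driver simply lacks a matching table entry.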

Here is some (hopefully useful :-) excerpt from my dmesg:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-BETA4 #0: Fri Dec 14 21:02:47 CET 2007
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/COMET.7
can't re-use a leaf (siots)!
can't re-use a leaf (conspeed)!
can't re-use a leaf (gdbspeed)!
can't re-use a leaf (conrclk)!
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 CPU T7200  @ 2.00GHz (1999.00-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6f6  Stepping = 6
  AMD Features=0x2010
  AMD Features2=0x1
  Cores per package: 2
real memory  = 2137915392 (2038 MB)
avail memory = 2086670336 (1990 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
pnpbios: Bad PnP BIOS data checksum
ioapic0: Changing APIC ID to 2
ioapic0  irqs 0-23 on motherboard
acpi0:  on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a (3) failed
acpi0: reservation of 10, 7f5e (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0:  on acpi0
acpi_perf0:  on cpu0
cpu1:  on acpi0
acpi_button0:  on acpi0
SMP: AP CPU #1 Launched!

When I start powerd with cpufreq loaded like this, the machines
typically crash and reboot. I searched the web for a while to find a
solution for this, but without any success. Does anybody here have some
hints how to get speedstepping & co. to work properly?


Re: T7200 CPU not detected by est

2008-01-22 Thread Gerrit Kühn
On Mon, 21 Jan 2008 09:01:02 -0800 Jeremy Chadwick <[EMAIL PROTECTED]>
wrote about Re: T7200 CPU not detected by est:

JC> > Jan 18 23:18:14 comet kernel: est1:  on cpu1
JC> > Jan 18 23:18:14 comet kernel: est: CPU supports Enhanced Speedstep, but is not recognized.
JC> > Jan 18 23:18:14 comet kernel: est: cpu_vendor GenuineIntel, msr 6130c2906000c29
JC> > Jan 18 23:18:14 comet kernel: device_attach: est1 attach returned 6

JC> I see identical behaviour on our Supermicro PDSMI+ systems, using E6420
JC> CPUs, so I don't believe the problem is specific to your motherboard or
JC> certain Intel CPU models:

It is definitely not bound to certain mainboards, because I see it on
several different ones. However, I have only T-series CPUs to test.

JC> CPU: Intel(R) Core(TM)2 CPU  6420  @ 2.13GHz (2128.01-MHz 686-class CPU)
JC> acpi0:  on motherboard
JC> acpi0: [ITHREAD]
JC> acpi0: Power Button (fixed)
JC> cpu0:  on acpi0
JC> est0:  on cpu0
JC> est: CPU supports Enhanced Speedstep, but is not recognized.
JC> est: cpu_vendor GenuineIntel, msr 82a082a0600082a
JC> device_attach: est0 attach returned 6
JC> cpu1:  on acpi0
JC> est1:  on cpu1
JC> est: CPU supports Enhanced Speedstep, but is not recognized.
JC> est: cpu_vendor GenuineIntel, msr 82a082a0600082a
JC> device_attach: est1 attach returned 6

Ok, so it's probably specific neither to CPU models nor to mainboards;
however, up to now all CPUs with this problem are Core2 CPUs.

JC> In the case of our servers, we usually turn EIST off (this one
JC> particular box has it enabled) because of the above problem -- but I'd
JC> much rather have it turned on to help save power.  For a laptop or
JC> workstation, however, I can see this being an incredibly important
JC> feature.

I run low power workstations here which are intended to be used in a lab
environment, and I would very much like to have working power-saving
features (otherwise the whole setup is quite useless).

Can I somehow help debugging this, should I file a PR or are there any
further recommended things to do?


Re: T7200 CPU not detected by est

2008-01-22 Thread Gerrit Kühn
On Tue, 22 Jan 2008 05:47:25 -0800 Jeremy Chadwick <[EMAIL PROTECTED]>
wrote about Re: T7200 CPU not detected by est:

JC> And I can tell the system is significantly "slower" when idle, which is
JC> normal.  :-)

JC> So give that a try...

First of all, thank you very much for your work and your mail.
Surprisingly (at least to me :-) everything seems to work as you said with
the system on my desk here, which has a T7200 and FreeBSD 6.3-RC2.
However, I know I already tried this with my system at home with a T7200
and FreeBSD 7.0-Beta4, and it crashed and rebooted when starting powerd.
Just to make sure, I will try again and report about it when I am back at
Meanwhile, I also opened a PR under #119895.


src.conf, WITHOUT-knobs and nanobsd

2008-02-01 Thread Gerrit Kühn
Hi folks,

I have just noticed that the NO-knobs were renamed into WITHOUT-knobs and
moved from make.conf into src.conf.
Can anyone tell me if (and how) this interacts with nanobsd, which uses
the following variables:

# Options to put in make.conf during buildworld only

# Options to put in make.conf during installworld only

# Options to put in make.conf during both build- & installworld.

Can I put the new WITHOUT-knobs there like I used to do with the NO-knobs
before, or is this broken, because the definitions will still go into
make.conf instead of the new src.conf?


Re: broken re(4)

2008-05-28 Thread Gerrit Kühn
On Wed, 28 May 2008 09:28:23 +0900 Pyun YongHyeon <[EMAIL PROTECTED]> wrote
about Re: broken re(4):

PY>  > Any hints what I should do next to find the culprit?

PY> There were similar reports on this issue. It seems that it's very
PY> hard to make re(4) work with so many RTL8168/8169/8111 revisions without
PY> documentation, as different revisions require different workarounds.

I know. However, in this case I think I have identical hardware, but two
boards work, and one doesn't (which seems very strange to me).

PY> Anyway, would you try this one? The patch was generated against HEAD
PY> but it would apply to STABLE too.

Thanks. I applied the patch and a new nanobsd image is being built right
now. I will report later today about the results.


Re: broken re(4)

2008-05-28 Thread Gerrit Kühn
On Wed, 28 May 2008 09:28:23 +0900 Pyun YongHyeon <[EMAIL PROTECTED]> wrote
about Re: broken re(4):

PY>  > Any hints what I should do next to find the culprit?

PY> There were similar reports on this issue. It seems that it's very
PY> hard to make re(4) work with so many RTL8168/8169/8111 revisions
PY> without documentation, as different revisions require different
PY> workarounds.
PY> Anyway, would you try this one? The patch was generated against HEAD
PY> but it would apply to STABLE too.

Well, I tested this one with some success:
After booting the patched system, my re0 interface was working fine; at
least I could transmit GBs of data without getting the problems mentioned
before. :-)
On the other hand, re1 did not work at all. :-( The interface was up and
running, had an IP address and everything, but I could not get a single
packet to or from it.
After trying everything I could think of, I rebooted the machine. Then
both interfaces were working again, but the errors I described earlier
when transmitting large amounts of data were back as well. :-(

Somehow the two interfaces seem to interfere with each other. Can I
provide any further information to help fix this?


Re: broken re(4)

2008-05-29 Thread Gerrit Kühn
On Tue, 27 May 2008 11:45:19 -0400 Michael Proto <[EMAIL PROTECTED]>
wrote about Re: broken re(4):

MP> > Any hints what I should do next to find the culprit?

MP> I'm running 6.3 on the exact same Jetway board at home, and while I
MP> haven't been bitten by the DOWN/UP issue I have seen the occasional
MP> "corrupted MAC on input" error when doing an ssh/scp. Seems to have
MP> simmered-down since moving from 6.3-RELEASE to 6.3-STABLE (last
MP> supped/rebuilt on 5/6/08).

MP> Note this is using only one of the 2 on-board NICs. I disabled the 2nd
MP> one in the BIOS as I don't need it at the moment.

After my experiences with the patch Pyun provided, it seems to me that
running one or two NICs somehow makes a difference.
I am still wondering why I have two boards with the same hardware that
work flawlessly.
However, comparing everything, I found some minor differences:
The working boards are a few months older, and their CPU shows up like
this:

CPU: VIA C7 Esther+RNG+AES+AES-CTR+SHA1+SHA256+RSA (1500.01-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x6a9  Stepping = 9

The one that doesn't work is newer, and dmesg identifies its CPU as:

CPU: VIA C7 Processor 1500MHz (1500.01-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x6d0  Stepping = 0
  VIA Padlock Features=0xffcc

This is the only difference I can make out so far. Which board exactly do
you have?


Re: broken re(4)

2008-05-29 Thread Gerrit Kühn
On Wed, 28 May 2008 17:56:10 +0200 Gerrit Kühn
<[EMAIL PROTECTED]> wrote about Re: broken re(4):


GK> Somehow the two interfaces seem to interfere with each other. Can I
GK> provide further information for fixing this?

Meanwhile I booted the machine with the patch several times. I get either
the same status as without the patch (both interfaces basically working,
but with hangs and checksum errors under load) or I get one interface
working fine and one not working at all.
I tried turning one of them off in the BIOS, but this didn't change the
situation: the remaining one either works with problems or not at all.

Can I do anything else? Is the newer patch (from yesterday) in your
directory above worth giving a try?

