Re: Using iscsi with multiple targets

2008-07-15 Thread Sven Willenberger
On Mon, 2008-07-14 at 11:29 +0300, Danny Braniss wrote:
> > FreeBSD 7.0
> > 
> > I have 2 machines with identical configurations/hardware, let's call them A 
> > (master) 
> > and B (slave). I have installed iscsi-target from ports and have set up 3 
> > targets 
> > representing the 3 drives I wish to be connected to from A.
> > 
> > The Targets file:
> > # extents   file        start   length
> > extent0     /dev/da1    0       465GB
> > extent1     /dev/da2    0       465GB
> > extent2     /dev/da3    0       465GB
> > 
> > # target    flags   storage     netmask
> > target0     rw      extent0     192.168.0.1/24
> > target1     rw      extent1     192.168.0.1/24
> > target2     rw      extent2     192.168.0.1/24
> > 
> > I then start up iscsi_target and all is good.
> > 
> > Now on A I have set up my /etc/iscsi.conf file as follows:
> > 
> > # cat /etc/iscsi.conf
> > data1 {
> >  targetaddress=192.168.0.252
> >  targetname=iqn.1994-04.org.netbsd.iscsi-target:target0
> >  initiatorname=iqn.2005-01.il.ac.huji.cs::BSD-2-1.sven.local
> > }
> > data2 {
> >  targetaddress=192.168.0.252
> >  targetname=iqn.1994-04.org.netbsd.iscsi-target:target1
> >  initiatorname=iqn.2005-01.il.ac.huji.cs::BSD-2-1.sven.local
> > }
> > data3 {
> >  targetaddress=192.168.0.252
> >  targetname=iqn.1994-04.org.netbsd.iscsi-target:target2
> >  initiatorname=iqn.2005-01.il.ac.huji.cs::BSD-2-1.sven.local
> > }
> > 
> > So far so good, now come the issues. First of all, it would appear that
> > with iscontrol one can only start one "named" session at a time; for
> > example:
> > /sbin/iscontrol -n data1
> > /sbin/iscontrol -n data2
> > /sbin/iscontrol -n data3
> > 
> > I guess that is ok, except that each invocation of iscontrol resets the
> > other sessions. Here is the camcontrol and dmesg output from running the
> > above 3 commands.
> > 
> > # camcontrol devlist
> > at scbus0 target 0 lun 0 (pass0,da0)
> > at scbus0 target 1 lun 0 (pass1,da1)
> > at scbus0 target 2 lun 0 (pass2,da2)
> > at scbus0 target 3 lun 0 (pass3,da3)
> > at scbus1 target 0 lun 0 (da5,pass5)
> > at scbus1 target 1 lun 0 (da6,pass6)
> > at scbus1 target 2 lun 0 (da4,pass4)
> > 
> > 
> > [ /sbin/iscontrol -n data1 ]
> > da4 at iscsi0 bus 0 target 0 lun 0
> > da4:  Fixed Direct Access SCSI-3 device
> > 
> > [ /sbin/iscontrol -n data2 ]
> > (da4:iscsi0:0:0:0): lost device
> > (da4:iscsi0:0:0:0): removing device entry
> > da4 at iscsi0 bus 0 target 0 lun 0
> > da4:  Fixed Direct Access SCSI-3 device
> > da5 at iscsi0 bus 0 target 1 lun 0
> > da5:  Fixed Direct Access SCSI-3 device
> > 
> > [ /sbin/iscontrol -n data3 ]
> > (da4:iscsi0:0:0:0): lost device
> > (da4:iscsi0:0:0:0): removing device entry
> > (da5:iscsi0:0:1:0): lost device
> > (da5:iscsi0:0:1:0): removing device entry
> > da4 at iscsi0 bus 0 target 2 lun 0
> > da4:  Fixed Direct Access SCSI-3 device
> > da5 at iscsi0 bus 0 target 0 lun 0
> > da5:  Fixed Direct Access SCSI-3 device
> > da6 at iscsi0 bus 0 target 1 lun 0
> > da6:  Fixed Direct Access SCSI-3 device
> > 
> > 
> > It would appear that rather than appending the new device to the end of
> > the "da" devices, it starts to do some type of naming queue after the
> > second device. If I am to use these devices in any type of automated
> > setup, how can I make sure that after these commands, "da6" will always
> > be target 1 (i.e. /dev/da2 on the slave machine)?
> > 
> > Next, there is no "startup" script for iscontrol - would that simply
> > have to be added to the system, or is there a way it could be done with
> > sysctl? The plan here is to use gmirror such that /dev/da1 on A is
> > mirrored with /dev/da1 on B using iscsi.
> 
> Hi Sven,
>   I just tried it here, and it seems that at the end all is ok :-)
> I think the lost/removing/found has something to do with iscontrol calling
> camcontrol rescan - I will check this later, but the end result is that
> you should have all /dev/da's.
>   I don't see any reasonably safe way to tie down a scsi# (/dev/dan),
> except to label (see glabel) the disk.
>   The startup script is, at the moment, not trivial, but I'm attaching
> it, so someone can suggest improvements :-)
> #!/bin/sh
> 
> # PROVIDE: iscsi
> # REQUIRE: NETWORKING
> # BEFORE:  DAEMON
> # KEYWORD: nojail shutdown
> 
> #
> # Add the following lines to /etc/rc.conf to enable iscsi:
> #
> # iscsi_enable="YES"
> # iscsi_fstab="/etc/fstab.iscsi"
> 
> . /etc/rc.subr
> . /cs/share/etc/rc.subr
> 
> name=iscsi
> rcvar=`set_rcvar`
> 
> command=/sbin/iscontrol
> 
> iscsi_enable=${iscsi_enable:-"NO"}
> iscsi_fstab=${iscsi_fstab:-"/etc/fstab.iscsi"}
> iscsi_exports=${iscsi_exports:-"/etc/exports.iscsi"}
> iscsi_debug=${iscsi_debug:-0}
> start_cmd="iscsi_start"
> faststop_cmp
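
For the glabel suggestion above, a minimal sketch (the label name "data1"
and the disk da5 are placeholders):

glabel label data1 /dev/da5    # writes label metadata to the disk's last sector
newfs /dev/label/data1         # use the stable /dev/label/data1 name from here on

The /dev/label/data1 node then follows the disk regardless of which da
number it probes as after a reconnect or reboot.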

Multi-machine mirroring choices

2008-07-15 Thread Sven Willenberger
With the introduction of zfs in FreeBSD 7.0, a door has opened for more
mirroring options, so I would like to get some opinions on what direction
I should take for the following scenario.

Basically I have 2 machines that are "clones" of each other (master and
slave) wherein one will be serving up samba shares. Each server has one
disk to hold the OS (not mirrored) and then 3 disks, each of which will
be its own mountpoint and samba share. The idea is to create a mirror of
each of these disks on the slave machine so that in the event the master
goes down, the slave can pick up serving the samba shares (I am using
CARP for the samba server IP address).

My initial thought was to have the slave set up as an iscsi target and
then have the master connect to each drive, then create a gmirror or
zpool mirror using local_data1:iscsi_data1, local_data2:iscsi_data2, and
local_data3:iscsi_data3. After some feedback (from P. French, for example),
it appears iscsi may not be the way to go for this: it locks up when the
target goes down, and even though I may be able to remove the target from
the mirror, that process may fail because the "disk" remains in "D" state.

So that leaves me with the following options:
1) ggated/ggatec + gmirror
2) ggated/ggatec + zfs (zpool mirror)
3) zfs send/recv incremental snapshots (ssh)

1) I have been using ggated/ggatec on a set of 6.2-REL boxes and find
that ggated tends to fail after some time, leaving me rebuilding the
mirror periodically (and gmirror resilvering takes quite some time). Have
ggated/ggatec performance and stability improved in 7.0? This combination
does work (the basic wiring is sketched below), but it is high maintenance,
and automating it is a bit painful (in terms of re-establishing the
gmirror, rebuilding, and making sure the master machine is the one being
read from).
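
For reference, the basic ggate + gmirror wiring looks roughly like this
(IPs, device names, and the mirror name are placeholders):

On the slave (exporting side):
echo "192.168.0.1 RW /dev/da1" > /etc/gg.exports
ggated

On the master:
ggatec create -o rw 192.168.0.2 /dev/da1    # device shows up as, e.g., /dev/ggate0
gmirror label -v data1 /dev/da1 /dev/ggate0
newfs /dev/mirror/data1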

2) Noting the issues with ggated/ggatec in (1), would a zpool be better
at rebuilding the mirror? I understand that it can determine which drive
of the mirror is out of sync better than gmirror can, so a lot of the
"insert"/"rebuild" manipulation used with gmirror would not be needed
here.

3) The send/recv feature of zfs was something I had not even considered
until very recently. My understanding is that this would work by a)
taking a snapshot of master_data1, b) zfs sending that snapshot, c) via
ssh on a pipe, receiving that snapshot on slave_data1, and then d) doing
incremental snapshots, sending, and receiving as in (a)(b)(c); a sketch
follows below. How time/cpu intensive is the snapshot generation, and
just how granular could this be done? I would imagine for systems with
little traffic/changes this could be practical, but what about systems
that may see a lot of files added to, modified on, or deleted from the
filesystem(s)?
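
The cycle I have in mind would look something like this (filesystem names
and the hostname "slave" are placeholders):

zfs snapshot master_data1@t1
zfs send master_data1@t1 | ssh slave zfs recv slave_data1

# later, ship only the changes since t1:
zfs snapshot master_data1@t2
zfs send -i master_data1@t1 master_data1@t2 | ssh slave zfs recv slave_data1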

I would be interested to hear anyone's experience with any (or all) of
these methods and caveats of each. I am leaning towards ggate(dc) +
zpool at the moment assuming that zfs can "smartly" rebuild the mirror
after the slave's ggated processes bug out.

Sven





Re: Multi-machine mirroring choices

2008-07-15 Thread Sven Willenberger
On Tue, 2008-07-15 at 07:54 -0700, Jeremy Chadwick wrote:
> On Tue, Jul 15, 2008 at 10:07:14AM -0400, Sven Willenberger wrote:
> > 3) The send/recv feature of zfs was something I had not even considered
> > until very recently. My understanding is that this would work by a)
> > taking a snapshot of master_data1 b) zfs sending that snapshot to
> > slave_data1 c) via ssh on pipe, receiving that snapshot on slave_data1
> > and then d) doing incremental snapshots, sending, receiving as in
> > (a)(b)(c). How time/cpu intensive is the snapshot generation and just
> > how granular could this be done? I would imagine for systems with litle
> > traffic/changes this could be practical but what about systems that may
> > see a lot of files added, modified, deleted to the filesystem(s)?
> 
> I can speak a bit on ZFS snapshots, because I've used them in the past
> with good results.
> 
> Compared to UFS2 snapshots (e.g. dump -L or mksnap_ffs), ZFS snapshots
> are fantastic.  The two main positives for me were:
> 
> 1) ZFS snapshots take significantly less time to create; I'm talking
> seconds or minutes vs. 30-45 minutes.  I also remember receiving mail
> from someone (on -hackers?  I can't remember -- let me know and I can
> dig through my mail archives for the specific mail/details) stating
> something along the lines of "over time, yes, UFS2 snapshots take
> longer and longer, it's a known design problem".
> 
> 2) ZFS snapshots, when created, do not cause the system to more or less
> deadlock until the snapshot is generated; you can continue to use the
> system during the time the snapshot is being generated.  While with
> UFS2, dump -L and mksnap_ffs will surely disappoint you.
> 
> We moved all of our production systems off of using dump/restore solely
> because of these aspects.  We didn't move to ZFS though; we went with
> rsync, which is great, except for the fact that it modifies file atimes
> (hope you use Maildir and not classic mbox/mail spools...).
> 
> ZFS's send/recv capability (over a network) is something I didn't have
> time to experiment with, but it looked *very* promising.  The method is
> documented in the manpage as "Example 12", and is very simple -- as it
> should be.  You don't have to use SSH either, by the way[1].

The examples do list ssh as the way of initiating the receiving end; I
am curious as to what the alternative would be (short of installing
openssh-portable and using cipher=no). One possibility is sketched below.
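
For instance, piping the stream over nc(1) would avoid the ssh cipher
overhead entirely, at the cost of sending the data unencrypted (port and
names are placeholders):

nc -l 3333 | zfs recv slave_data1           # on the receiving box
zfs send master_data1@t1 | nc slave 3333    # on the sending box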

> One of the "annoyances" to ZFS snapshots, however, was that I had to
> write my own script to do snapshot rotations (think incremental dump(8)
> but using ZFS snapshots).

That is what I was afraid of. Using snapshots would seem to involve a
bit of housekeeping. Furthermore, it sounds more suited to a system that
needs periodic rather than constant backing up (syncing).


> > I would be interested to hear anyone's experience with any (or all) of
> > these methods and caveats of each. I am leaning towards ggate(dc) +
> > zpool at the moment assuming that zfs can "smartly" rebuild the mirror
> > after the slave's ggated processes bug out.
> 
> I don't have any experience with GEOM gate, so I can't comment on it.
> But I would highly recommend you discuss the shortcomings with pjd@,
> because he definitely listens.
> 
> However, I must ask you this: why are you doing things the way you are?
> Why are you using the equivalent of RAID 1 but for entire computers?  Is
> there some reason you aren't using a filer (e.g. NetApp) for your data,
> thus keeping it centralised?  There has been recent discussion of using
> FreeBSD with ZFS as such, over on freebsd-fs.  If you want a link to the
> thread, I can point you to it.

Basically I am trying to eliminate the "single point of failure". The
project prior to this suffered exactly such a failure, one that even a
raid5 setup could not recover from. It was determined at that point that
a single-machine storage solution would no longer suffice. What I am
trying to achieve is
having a slave machine that could take over as the file server in the
event the master machine goes down. This could be something as simple as
the master's network connection going down (CARP to the rescue on the
slave) to a complete failure of the master.

While zfs send/recv sounds like a good option for periodic backups, I
don't think it will fit my purpose. Zpool or gmirror will be a better
fit. I see in posts following my initial post that there is reference to
improvements in ggate[cd] and/or tcp since 6.2 (and I have moved to 7.0
now) so that bodes well. The question then becomes a matter of which
system would be easier to manage in terms of a) the master rebuilding
the mirr

Re: Multi-machine mirroring choices

2008-07-17 Thread Sven Willenberger
On Tue, 2008-07-15 at 16:20 +0100, Pete French wrote:
> > However, I must ask you this: why are you doing things the way you are?
> > Why are you using the equivalent of RAID 1 but for entire computers?  Is
> > there some reason you aren't using a filer (e.g. NetApp) for your data,
> > thus keeping it centralised?
> 
> I am not the original poster, but I am doing something very similar and
> can answer that question for you. Some people get paranoid about the
> whole "single point of failure" thing. I originally suggested that we buy
> a filer and have identical servers so if one breaks we connect the other
> to the filer, but the response I got was "what if the filer breaks?". So
> in the end I had to show we have duplicate independent machines, with the
> data kept symmetrical on them at all times.
> 
> It does actually work quite nicely - I have an "active" database
> machine, and a "passive". The passive is only used if the active fails,
> and the drives are run as a gmirror pair with the remote one being mounted
> using ggated. It also means I can flip from active to passive when I want
> to do an OS upgrade on the active machine. Switching takes a few seconds,
> and this is fine for our setup.
> 
> So the answer is that the decision was taken out of my hands - but this
> is not uncommon, and as a roll-your-own cluster it works very nicely.
> 
> -pete.

I have for now gone with using ggate[cd] along with zpool, and so far
it's not bad. I can fail the master and stop ggated on the slave, at which
point geom reads the glabeled disks. From there I can zpool import to an
alternate root. When the master comes back up I can zpool export and
then, on the master, zpool import, at which point zfs handles the
resilvering. (The sequence is sketched below.)
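
The takeover, roughly (the pool name "tank" and the alternate root are
placeholders):

# on the slave, after the master has died:
pkill ggated
zpool import -R /failover tank    # import against an alternate root

# once the master is back up:
zpool export tank                 # on the slave
zpool import tank                 # on the master; zfs resilvers the mirror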

The *big* issue I have right now is dealing with the slave machine going
down. Once the master no longer has a connection to the ggated devices,
all processes trying to use the device hang in D status. I have tried
pkill'ing ggatec to no avail and ggatec destroy returns a message of
gctl being busy. Trying to ggatec destroy -f panics the machine.

Does anyone know how to successfully time out a failed ggatec connection
so that I can zpool detach or somehow have zfs remove the unavailable
drive?

Sven




CARP state changes and devd.conf

2008-07-24 Thread Sven Willenberger
I see mention of CARP as a device-type in the devd.conf documentation
but for the life of me cannot manage to get devd to recognize *any*
changes in the CARP interface.

I have set
sysctl net.inet.carp.log=2
and I see messages in /var/log/messages when the interface goes
INIT->BACKUP and BACKUP->MASTER, but cannot for the life of me get
devd to "see" these changes.

I have tried something even as simple as:
notify 100 {
action "logger -p kern.notice '$device-name interface has
changed'";
};

and then bringing the CARP interfaces up and down on either box to
change INIT and BACKUP/MASTER states, but *nothing* is noted. Does CARP
simply not work that way with devd (i.e. only the creation of the CARP
device, not any subsequent state change, generates an event)?

Sven




Re: filesystem full error with inumber

2006-07-26 Thread Sven Willenberger


Feargal Reilly presumably uttered the following on 07/24/06 11:48:
> On Mon, 24 Jul 2006 17:14:27 +0200 (CEST)
> Oliver Fromme <[EMAIL PROTECTED]> wrote:
> 
>> Nobody else has answered so far, so I try to give it a shot ...
>>
>> The "filesystem full" error can happen in three cases:
>> 1.  The file system is running out of data space.
>> 2.  The file system is running out of inodes.
>> 3.  The file system is running out of non-fragmented blocks.
>>
>> The third case can only happen on extremely fragmented
>> file systems which happens very rarely, but maybe it's
>> a possible cause of your problem.
> 
> I rebooted that server, and df then reported that disk at 108%,
> so it appears that df was reporting incorrect figures prior to
> the reboot. Having cleaned up, it appears by my best
> calculations to be showing correct figures now.
> 
>>  > kern.maxfiles: 2
>>  > kern.openfiles: 3582
>>
>> Those have nothing to do with "filesystem full".
>>
> 
> Yeah, that's what I figured.
> 
>>  > Looking again at dumpfs, it appears to say that this is
>>  > formatted with a block size of 8K, and a fragment size of
>>  > 2K, but tuning(7) says:  [...]
>>  > Reading this makes me think that when this server was
>>  > installed, the block size was dropped from the 16K default
>>  > to 8K for performance reasons, but the fragment size was
>>  > not modified accordingly.
>>  > 
>>  > Would this be the root of my problem?
>>
>> I think a bsize/fsize ratio of 4/1 _should_ work, but it's
>> not widely used, so there might be bugs hidden somewhere.
>>
> 
> Such as df not reporting the actual data usage, which is now my
> best working theory. I don't know what df bases its figures on;
> perhaps it either slowly got out of sync, or more likely, got
> things wrong once the disk filled up.
> 
> I'll monitor it to see if this happens again, but hopefully
> won't keep that configuration around for too much longer anyway.
> 
> Thanks,
> -fr.
> 

One of my machines that I recently upgraded to 6.1 (6.1-RELEASE-p3) is
also exhibiting df reporting wrong data usage numbers. Notice the
negative "Used" numbers below:

> df -h
Filesystem      Size    Used   Avail Capacity  Mounted on
/dev/da0s1a     496M     63M    393M     14%   /
devfs           1.0K    1.0K      0B    100%   /dev
/dev/da0s1e     989M   -132M    1.0G    -14%   /tmp
/dev/da0s1f      15G    478M     14G      3%   /usr
/dev/da0s1d      15G   -1.0G     14G     -8%   /var
/dev/md0        496M    228K    456M      0%   /var/spool/MIMEDefang
devfs           1.0K    1.0K      0B    100%   /var/named/dev

Sven


Re: filesystem full error with inumber

2006-07-26 Thread Sven Willenberger


Peter Jeremy presumably uttered the following on 07/26/06 15:00:
> On Wed, 2006-Jul-26 13:07:19 -0400, Sven Willenberger wrote:
>> One of my machines that I recently upgraded to 6.1 (6.1-RELEASE-p3) is also
>> exhibiting df reporting wrong data usage numbers.
> 
> What did you upgrade from?
> Is this UFS1 or UFS2?
> Does a full fsck fix the problem?
> 

This was an upgrade from a 5.x system (UFS2); a full fsck did in fact fix the
problem (for now).

Thanks,

Sven


Megacli fails to find SAS adapter

2006-10-10 Thread Sven Willenberger
FreeBSD 6.2-PRERELEASE #3: Tue Oct 10 13:58:29 EDT 2006
LSI 8480E SAS RAID card

mount:
linprocfs on /compat/linux/proc (linprocfs, local)
linsysfs on /compat/linux/sys (linsysfs, local)
/dev/mfid0s1d on /usr/local/pgsql (ufs, local, noatime)

dmesg:
mfi0: 2025 - PCI 0x041000 0x04411 0x041000 0x041002: Firmware initialization started (PCI ID 0411/1000/1002/1000)
mfi0: 2026 - Type 18: Firmware version 1.00.00-0074
mfi0: 2027 - Battery temperature is normal
mfi0: 2028 - Battery Present
mfi0: 2029 - PD 39(e1/s255) event: Enclosure (SES) discovered on PD 27(e1/s255)
mfi0: 2030 - PD 56(e2/s255) event: Enclosure (SES) discovered on PD 38(e2/s255)
mfi0: 2031 - PD 39(e1/s255) event: Inserted: PD 27(e1/s255)
mfi0: 2032 - Type 29: Inserted: PD 27(e1/s255) Info: enclPd=27, scsiType=d, portMap=10, sasAddr=50015b2180001839,
mfi0: 2033 - PD 56(e2/s255) event: Inserted: PD 38(e2/s255)

pkg_info:
linux_base-fc-4_9

I have downloaded MegaCli and, using rpm2cpio, extracted
MegaCli-1.01.09-0.i386.rpm into my home directory.
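
The extraction step itself, for reference (rpm2cpio just feeds a cpio
archive down a pipe):

rpm2cpio MegaCli-1.01.09-0.i386.rpm | cpio -idv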

That leaves the binary at ~/usr/sbin/MegaCli:

brandelf -t Linux usr/sbin/MegaCli
cd usr/sbin

# ./MegaCli -EncInfo -aALL

ERROR:Could not detect controller.
# ./MegaCli -CfgDsply -aALL

ERROR:Could not detect controller.

Do I actually need to set up the links in /compat/linux/sys for the SAS
raid card? Or should this rpm be installed into the /compat/linux
directory? I need to upgrade the firmware on this card as, for some
reason, the WebBIOS will not let me configure a Raid10 array, and the
only way I can see to upgrade the firmware is to use the megacli utility.

Thanks,

Sven




Re: Megacli fails to find SAS adapter

2006-10-11 Thread Sven Willenberger
On Tue, 2006-10-10 at 22:11 -0700, Doug Ambrisko wrote:
> Sven Willenberger writes:
> | FreeBSD 6.2-PRERELEASE #3: Tue Oct 10 13:58:29 EDT 2006
> | LSi 8480e SAS Raid card
> | 
> | mount:
> | linprocfs on /compat/linux/proc (linprocfs, local)
> | linsysfs on /compat/linux/sys (linsysfs, local)
> | /dev/mfid0s1d on /usr/local/pgsql (ufs, local, noatime)
> | 
> | dmesg:
> | mfi0: 2025 - PCI 0x041000 0x04411 0x041000 0x041002: Firmware 
> initialization started (PCI ID 0411/1000/1002/1000)
> | mfi0: 2026 - Type 18: Firmware version 1.00.00-0074
> | mfi0: 2027 - Battery temperature is normal
> | mfi0: 2028 - Battery Present
> | mfi0: 2029 - PD 39(e1/s255) event: Enclosure (SES) discovered on PD 
> 27(e1/s255)
> | mfi0: 2030 - PD 56(e2/s255) event: Enclosure (SES) discovered on PD 
> 38(e2/s255)
> | mfi0: 2031 - PD 39(e1/s255) event: Inserted: PD 27(e1/s255)
> | mfi0: 2032 - Type 29: Inserted: PD 27(e1/s255) Info: enclPd=27, scsiType=d, 
> portMap=10, sasAddr=50015b2180001839,
> | mfi0: 2033 - PD 56(e2/s255) event: Inserted: PD 38(e2/s255)
> | 
> | pkg_info:
> | linux_base-fc-4_9
> | 
> | I have downloaded the Megacli and, using rpm2cpio extracted
> | MegaCli-1.01.09-0.i386.rpm into my home directory.
> | 
> | ~/usr/sbin/MegaCli
> | brandelf -t Linux usr/sbin/MegaCli
> | 
> | cd usr/sbin
> | 
> | # ./MegaCli -EncInfo -aALL
> | 
> | ERROR:Could not detect controller.
> | # ./MegaCli -CfgDsply -aALL
> | 
> | ERROR:Could not detect controller.
> | 
> | Do I actually need to set up the links in /compat/linux/sys for the SAS
> | raid card? or should this rpm be installed into the /compat/linux
> | directory? I need to upgrade the firmware on this card as for some
> | reason the webbios will not let me configure a Raid10 array and the only
> | way I can see to upgrade the fw is to use the megacli utility.
> 
> Make sure you have the Linux ioctl module loaded before linsysfs so it
> can register the hooks.  kldstat/kernel config will help.  One sanity
> check is to do:
>   dhcp194:ambrisko 11] cat /compat/linux/sys/class/scsi_host/host*/proc_name
>   megaraid_sas
>   (null)
>   dhcp194:ambrisko 12] 
> 
> If you don't see megaraid_sas then it isn't going to work and is
> missing the linux mfi module.  Also
> you need to set:
>   sysctl compat.linux.osrelease=2.6.12
> or things won't work well.  This will probably break your fc-4_9 Linux
> install until the updates to Linux emulation are merged (maybe they
> have been, but I don't think so).  Since it is a static binary we don't
> have linux base installed.
> 
> Doug A.

Adding mfi_linux_load="YES" to /boot/loader.conf did the trick of
having the device added to the system:

# cat /compat/linux/sys/class/scsi_host/host*/proc_name
(null)
megaraid_sas
(null)

# sysctl compat.linux
compat.linux.oss_version: 198144
compat.linux.osrelease: 2.6.12
compat.linux.osname: Linux

Although the MegaCli utility no longer complains about not finding a
controller, it sadly does nothing else either (except dump core on
certain commands):

# ./MegaCli -AdpAllinfo -a0

# ./MegaCli -AdpGetProp SpinupDriveCount -a0

Segmentation fault (core dumped)
# ./MegaCli -LDGetNum -a0

Failed to get VD count on adapter -9993.
# ./MegaCli -CfgFreeSpaceinfo -a0


Failed to initialize RM


and so on ... I am guessing this is an issue with the MegaCli software
now; needless to say I certainly doubt that this will allow me to flash
the card bios (and even if it *could*, I would be leery of the process).



Re: [HEADS UP] perl symlinks in /usr/bin will be gone

2005-01-30 Thread Sven Willenberger

Anton Berezin wrote:
> In order to keep pkg-install simple, no old symlink chasing and removal
> will be done, although the detailed instructions will be posted in
> ports/UPDATING and in pkg-message for the ports.

How about leaving it up to the installer? Much like the minicom port
prompts the user if they would like to symlink a /dev/modem device, why
not ask (post-install) "Would you like to make a symlink in /usr/bin to
your new installation?" or, as someone else has suggested, add a make flag
(make ADD_SYMLINK=yes).

Those who wish to keep /usr/bin unpolluted can opt not to create the
symlink; those who want compatibility with the majority of scripts
already written can have the link created.

Just a thought,
Sven


Re: need ISO-image for a new machine install

2005-03-17 Thread Sven Willenberger


On Thu, 2005-03-17 at 20:13 -0500, Mikhail Teterin wrote:
> Hello!
> 
> Is there a place, from where I can download a reasonably fresh 5.4-PRERELEASE 
> install (or mini-install) .iso image for amd64?
> 
> Thanks!
> 
>   -mi

I saw a post here a little while ago that pointed to:
ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/Feb_2005/5.3-STABLE-SNAP001-amd64-miniinst.iso
I used this on a dual opteron system with 8GB of RAM with no problem
(i.e. the >4G RAM issue was resolved on this snapshot). Upgrading to 5.4
PRE is straightforward from there.

Sven




Creating a striped set of mirrors using gvinum

2005-03-19 Thread Sven Willenberger
I am hoping someone has found a way to create this type of raid set 
using [g]vinum. I see that it is a trivial matter to create a mirror of 
2 striped sets but I have not seen a way to create a stripe set out of 
multiple mirrored sets (e.g. stripe across 3 sets of mirrors). Has 
anyone managed to implement this and, if so, what does your 
configuration file look like? If not, could this be added as a feature 
request for gvinum?

Sven Willenberger


Re: kern/79035: gvinum unable to create a striped set of mirrored sets/plexes

2005-04-08 Thread Sven Willenberger
On Sun, 2005-03-20 at 15:51 +1030, Greg 'groggy' Lehey wrote:
> On Saturday, 19 March 2005 at 23:43:00 -0500, Sven Willenberger wrote:
> > Greg 'groggy' Lehey presumably uttered the following on 03/19/05 22:11:
> >> On Sunday, 20 March 2005 at  2:04:34 +, Sven Willenberger wrote:
> >>
> >>> Under the current implementation of gvinum it is possible to create
> >>> a mirrored set of striped plexes but not a striped set of mirrored
> >>> plexes. For purposes of resiliency the latter configuration is
> >>> preferred as illustrated by the following example:
> >>>
> >>> Use 6 disks to create one of 2 different scenarios.
> >>>
> >>> 1) Using the current abilities of gvinum create 2 striped sets using
> >>> 3 disks each: A1 A2 A3 and B1 B2 B3 then create a mirror of those 2
> >>> sets such that A(123) mirrors B(123). In this situation if any drive
> >>> in Set A fails, one still has a working set with Set B. If any drive
> >>> now fails in Set B, the system is shot.
> >>
> >> No, this is not correct.  The plex ("set") only fails when all drives
> >> in it fail.
> >
> > I hope the following diagrams better illustrate what I was trying to
> > point out. Data striped across all the A's and that is mirrored to the B
> > Stripes:
> >
> > ...
> >
> > If A1 fails, then the A Stripe set cannot function (much like in Raid 0,
> > one disk fails the set) meaning that B now is the array:
> 
> No, this is not correct.
> 
> >>> Thus the striping of mirrors (rather than a mirror of striped sets)
> >>> is a more resilient and fault-tolerant setup of a multi-disk array.
> >>
> >> No, you're misunderstanding the current implementation.
> >
> > Perhaps I am ... but unless gvinum somehow reconstructs a 3 disk stripe
> > into a 2 disk stripe in the event one disk fails, I am not sure how.
> 
> Well, you have the source code.  It's not quite the way you look at
> it.  It doesn't have stripes: it has plexes.  And they can be
> incomplete.  If a read to a plex hits a "hole", it automatically
> retries via (possibly all) the other plexes.  Only when all plexes
> have a hole in the same place does the transfer fail.
> 
> You might like to (re)read http://www.vinumvm.org/vinum/intro.html.
> 

I was really hoping that the "holes in the plex" functioning was going
to work but my tests have shown otherwise. I created a gvinum array
consisting of (A striped B) mirror (C striped D) which is the only such
mirror/stripe combination allowed by gvinum for four drives. We have:

 _________
| A     B |---+
|_________|   |
              | Mirror
 _________    |
| C     D |---+
|_________|

Based on what the "plex hole" theory states, Drive A and Drive D could
both fail and the system would read through the holes and pick up data
from B and C (or the converse if B and C failed), functionally
equivalent to a stripe of mirrors. To fail a drive I rebooted
single-user, dd'd /dev/zero over the beginning of the disk, and then ran
fdisk (roughly as sketched below).
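
The drive-failing recipe, roughly (da2 stands in for whichever drive was
being failed):

dd if=/dev/zero of=/dev/da2 bs=512 count=1024    # clobber the label/metadata
fdisk -BI da2                                    # re-initialize the slice table

The gvinum configuration under test was: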

drive d device /dev/da4s1h
drive c device /dev/da3s1h
drive b device /dev/da2s1h
drive a device /dev/da1s1h
volume home
plex name home.p1 org striped 960s vol home
plex name home.p0 org striped 960s vol home
sd name home.p1.s1 drive d len 71681280s driveoffset 265s plex home.p1
plexoffset 960s
sd name home.p1.s0 drive c len 71681280s driveoffset 265s plex home.p1
plexoffset 0s
sd name home.p0.s1 drive b len 71681280s driveoffset 265s plex home.p0
plexoffset 960s
sd name home.p0.s0 drive a len 71681280s driveoffset 265s plex home.p0
plexoffset 0s

In my case:             Fail B      Fail B and C
A = /dev/da1s1h         up          up
B = /dev/da2s1h         down        down
C = /dev/da3s1h         up          down
D = /dev/da4s1h         up          up

1 Volume
V home2                 up          down (!)

2 Plexes
P home.p0 (A and B)     down        down
P home.p1 (C and D)     up          down

4 Subdisks
S home.p0.s0 (A)        up          up
S home.p0.s1 (B)        down        down
S home.p1.s0 (C)        up          down
S home.p1.s1 (D)        up          up

Based on this, failing the one drive did in fact fail the plex (home.p0).
Although at that point I realized that failing either drive on the other
plex would also fail that plex and with it the volume, I went ahead and
failed drive C as well. The result was a failed volume.

With the failed B drive, once I bsdlabeled the disk to include the vinum
slice, I got the message that the plex was now stale (instead
of down). A simple gvinum start home2 changed the state to degraded and
the system rebuilt the array. When both drives failed I had to work a
bit of a kludge in. I gvinum setstate -f up home.p1.s0, the

Re: 5.3-S (Mar 6) softdep stack backtrace from getdirtybuf()... problem?

2005-04-10 Thread Sven Willenberger

Brandon S. Allbery KF8NH presumably uttered the following on 04/10/05 15:16:
I have twice so far had the kernel syslog a stack backtrace with no
other information.  Inspection of the kernel source, to the best of my
limited understanding, suggests that getdirtybuf() was handed a buffer
without an associated vnode.  Kernel config file and make.conf attached.
Should I be concerned?
Note that this system is an older 600MHz Athlon with only 256MB RAM, and
both times this triggered it was thrashing quite a bit (that's more or
less its usual state...).
KDB: stack backtrace:
kdb_backtrace(c06fbf78,2,c63ca26c,0,22) at kdb_backtrace+0x2e
getdirtybuf(d3196bac,0,1,c63ca26c,1) at getdirtybuf+0x2b
flush_deplist(c1a8544c,1,d3196bd4,d3196bd8,0) at flush_deplist+0x49
flush_inodedep_deps(c11eb800,5858f,c1ea723c,d3196c34,c052952f) at 
flush_inodedep_deps+0x9e
softdep_sync_metadata(d3196ca4,c1ea7210,50,c06c9a19,0) at 
softdep_sync_metadata+0x9d
ffs_fsync(d3196ca4,0,0,0,0) at ffs_fsync+0x487
fsync(c1b367d0,d3196d14,4,c10f9700,0) at fsync+0x196
syscall(2f,2f,2f,8327600,5e) at syscall+0x300
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (95, FreeBSD ELF32, fsync), eip = 0x29152d6f, esp = 0xbf5a8d5c, ebp 
= 0xbf5a8d78 ---
FreeBSD rushlight.kf8nh.com 5.3-STABLE FreeBSD 5.3-STABLE #0: Sun Mar  6 
02:56:16 EST 2005 [EMAIL PROTECTED]:/usr/src/sys/i386/compile/RUSHLIGHT  
i386

I used to see this on a regular basis on several machines I had running 
early 5 through 5.2 releases and it seemed to have gone away (for me) 
with the 5.3 release(s). I never did hear of a definitive resolution for 
this issue; your backtrace is alarmingly similar to the one that I had seen.

http://lists.freebsd.org/pipermail/freebsd-current/2004-July/031576.html
Sven


Re: SuperMicro X5DP8-G2MB/(2)XEON 2.4/1GB RAM 5.4-S Freeze

2005-04-11 Thread Sven Willenberger

Aaron Summers presumably uttered the following on 04/11/05 22:12:
Greetings,
We have a SuperMicro X5DP8-G2 Motherboard, 2xXEON 2.4, 1GB RAM server
running 5.4-STABLE that keeps freezing up.  We have replaced RAM, HD,
SCSI controller, etc.  To no avail.  We are running SMP GENERIC
Kernel.  I cannot get the system to panic, leave a core dump, etc.  It
just always freezes.  The server functions as a web server in a
HSphere Cluster.  I am about out of options besides loading 4.11
(since our 4 series servers never die).  Any help, feedback, clues,
similar experiences, etc would be greatly appreciated.
On SCSI:  The onboard Adaptec 7902 gives a dump on bootup but appears
to work.  I read the archived post about this issue.  The system still
locked up with an Adaptec 7982B that did not give this message.
DMESG:

da2 at ahd0 bus 0 target 4 lun 0
da2:  Fixed Direct Access SCSI-3 device 
da2: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
Queueing Enabled
da2: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C)
da0 at ahd0 bus 0 target 0 lun 0
da0:  Fixed Direct Access SCSI-3 device 
da0: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
Queueing Enabled
da0: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C)
da1 at ahd0 bus 0 target 2 lun 0
da1:  Fixed Direct Access SCSI-3 device 
da1: 320.000MB/s transfers (160.000MHz, offset 63, 16bit), Tagged
Queueing Enabled
da1: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C)
Mounting root from ufs:/dev/da0s1a

We had many issues with Seagate drives and S/M boards with the
onboard Adaptec scsi controllers. Seagate offered no help other than to
suggest putting in a network card (in lieu of the onboard) and/or
disabling SMP; neither solution was acceptable, so we switched to
IBM/Hitachi drives and the problems disappeared. By the way, the
problems manifested themselves in those servers where we had more than
just one hard drive installed. This was even after updating to the
latest firmware etc; Seagate insists there is no problem with their
drives although other drives work perfectly well. YMMV.

I do see you say you tried other hard drives ... which ones did you use?
Sven


Re: scsi card recommendation

2005-04-13 Thread Sven Willenberger
On Wed, 2005-04-13 at 09:58 +0700, Dikshie wrote:
> dear all,
> I would like to buy SCSI card which must: 
> - support Ultra 320
> - support RAID 0,1,5, and 1/0
> any recommendation for FreeBSD-5.x ?
> 
> 
> 
> 
> thanks !
> 
> 
> -dikshie-  

We find the LSI MegaRaid 320-2x series works great (using it on a dual
opteron system), especially with the battery-backed cache ... can be
picked up for just under $1k

Sven



Re: panic in nfsd on 6.2-RC1

2006-12-15 Thread Sven Willenberger
On Tue, 2006-12-05 at 12:38 +0900, Hiroki Sato wrote:
> Kostik Belousov <[EMAIL PROTECTED]> wrote
>   in <[EMAIL PROTECTED]>:
> 
> ko> What version of sys/nfsserver/nfs_serv.c do you use ? If it is older than
> ko> 1.156.2.7, please, update the system.
> 
>  Thanks, I updated it just now and see how it works.
> 
> --
> | Hiroki SATO

I was/am having the same issue. Updating world (6.2-stable) to include
the above update sadly did not fix the problem for me. This is an amd64
box with only one client connecting to it via nfs. Reading further, it
may be an issue with rpc.statd and/or rpc.lockd. As I only have
one client connecting and it is being used as mail storage (i.e. the
client pops/imaps the storage), would it be safe to not forward fcntl
locks over the wire? Is this same issue present in 6.1-RELENG? I am
really at my wits' end at this point and for the first time am actually
considering moving to another OS (solaris more than likely) as I cannot
have these types of issues interrupting services every couple of days.

What other information (specifically) can I provide to help the devs
figure out what is going on? What can I do in the meantime to have some
semblance of stability? I assume downgrading to 5.5-RELENG is out of the
question but perhaps disabling SMP?

Sven



Re: panic in nfsd on 6.2-RC1

2006-12-15 Thread Sven Willenberger
On Fri, 2006-12-15 at 13:15 -0500, Kris Kennaway wrote:
> On Fri, Dec 15, 2006 at 10:01:19AM -0500, Sven Willenberger wrote:
> > On Tue, 2006-12-05 at 12:38 +0900, Hiroki Sato wrote:
> > > Kostik Belousov <[EMAIL PROTECTED]> wrote
> > >   in <[EMAIL PROTECTED]>:
> > > 
> > > ko> What version of sys/nfsserver/nfs_serv.c do you use ? If it is older 
> > > than
> > > ko> 1.156.2.7, please, update the system.
> > > 
> > >  Thanks, I updated it just now and see how it works.
> > > 
> > > --
> > > | Hiroki SATO
> > 
> > I was/am having the same issue. Updating world (6.2-stable) to include
> > the above update sadly did not fix the problem for me. This is an amd64
> > box with only one client connecting to it via nfs. Reading further, it
> > may be an issue with rpc.statd and/or rpc.lockd. As I only have
> > one client connecting and it is being used as mail storage (i.e. the
> > client pops/imaps the storage), would it be safe to not forward fcntl
> > locks over the wire? Is this same issue present in 6.1-RELENG? I am
> > really at my wits' end at this point and for the first time am actually
> > considering moving to another OS (solaris more than likely) as I cannot
> > have these types of issues interrupting services every couple of days.
> > 
> > What other information (specifically) can I provide to help the devs
> > figure out what is going on? What can I do in the meantime to have some
> > semblance of stability? I assume downgrading to 5.5-RELENG is out of the
> > question but perhaps disabling SMP?
> 
> Just to confirm, can you please post the panic backtrace you are
> seeing?  And can you explain what you mean by "may seem to be an issue
> with rpc.statd and/or rpc.lockd"?
> 
> Sometimes people think they're seeing the same problem as someone else
> when really it's a completely different problem in the same subsystem,
> so I'd like to rule that out here.
> 
> Kris

Well I have now added kdb and invariants/witness support to the kernel,
so I should be able to get a backtrace the next time it happens.
Currently the system just locks up and no error is displayed on the
console or in /var/log/messages; sorry I cannot be of immediate help there.
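
For reference, the debugging additions to the kernel config were along
these lines:

options KDB
options DDB
options WITNESS
options INVARIANTS
options INVARIANT_SUPPORT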

Regarding the rpc issue, I just ran across mention of those in sshfs/nfs
threads appearing here, and in particular a link referenced within one of
them
(http://docs.freebsd.org/cgi/getmsg.cgi?fetch=1362611+0+archive/2006/freebsd-stable/20060702.freebsd-stable)
- it is more than likely not at all related but I am grasping at straws
here trying to solve this.

FWIW, I do see the following appearing in /var/log/messages:
ufs_rename: fvp == tvp (can't happen)
about once or twice a day, but cannot correlate those to the lockups. Now
that I have enabled the options mentioned above in the kernel, I am
seeing some LOR issues:

kernel: lock order reversal:
kernel: 1st 0xff00c3bab200 kqueue (kqueue) @ 
/usr/src/sys/kern/kern_event.c:1547
kernel: 2nd 0xff0005bb6078 struct mount mtx (struct mount mtx) @ 
/usr/src/sys/ufs/ufs/ufs_vnops.c:138





Re: Not panic in nfsd (Re: panic in nfsd on 6.2-RC1)

2006-12-18 Thread Sven Willenberger
On Fri, 2006-12-15 at 23:20 +0200, Kostik Belousov wrote:
> On Fri, Dec 15, 2006 at 02:29:58PM -0500, Kris Kennaway wrote:

<>

> >  
> > > FWIW, I do see the following appearing in the /var/log/messages:
> > > ufs_rename: fvp == tvp (can't happen) 
> > > about once or twice a day, but cannot correlate those to lockup. Now
> > > that I have enabled the options mentioned above in the kernel, I am
> > > seeing some LOR issues:
> > > 
> > > kernel: lock order reversal:
> > > kernel: 1st 0xff00c3bab200 kqueue (kqueue) @ 
> > > /usr/src/sys/kern/kern_event.c:1547
> > > kernel: 2nd 0xff0005bb6078 struct mount mtx (struct mount mtx) @ 
> > > /usr/src/sys/ufs/ufs/ufs_vnops.c:138
> > 
> > OK, this is interesting, so let's proceed from here.
> > 
> > Kris
> 
> Try this.
> 
> Index: ufs/ufs/ufs_vnops.c
> ===
> RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
> retrieving revision 1.283
> diff -u -r1.283 ufs_vnops.c
> --- ufs/ufs/ufs_vnops.c   6 Nov 2006 13:42:09 -   1.283
> +++ ufs/ufs/ufs_vnops.c   15 Dec 2006 21:19:51 -
> @@ -133,19 +133,15 @@
>  {
>   struct inode *ip;
>   struct timespec ts;
> - int mnt_locked;
>  
>   ip = VTOI(vp);
> - mnt_locked = 0;
> - if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
> - VI_LOCK(vp);
> + VI_LOCK(vp);
> + if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
>   goto out;
> + if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
> + VI_UNLOCK(vp);
> + return;
>   }
> - MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */
> - mnt_locked = 1;
> - VI_LOCK(vp);
> - if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
> - goto out_unl;
>  
>   if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
>   ip->i_flag |= IN_LAZYMOD;
> @@ -172,10 +168,7 @@
>  
>   out:
>   ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
> - out_unl:
>   VI_UNLOCK(vp);
> - if (mnt_locked)
> - MNT_IUNLOCK(vp->v_mount);
>  }
>  
>  /*


Patch applied cleanly (offset 6 lines), make buildworld, make kernel,
reboot, make installworld, etc.

kernel: lock order reversal:
kernel: 1st 0xff00b9181800 kqueue (kqueue) @ 
/usr/src/sys/kern/kern_event.c:1547
kernel: 2nd 0xff00c16030d0 vnode interlock (vnode interlock) @ 
/usr/src/sys/ufs/ufs/ufs_vnops.c:132





Migrating vinum to gvinum

2006-12-27 Thread Sven Willenberger
I have a 5.2.1 box that I want to upgrade to 5.5-RELENG and in doing so
need to upgrade/migrate the current vinum setup to gvinum. It is a
simple vinum mirror (just 2 drives with one vinum slice each). Having
done some googling on the matter I really haven't found a definitive
"best approach" to doing this. The choices would be:

1) make buildworld and make kernel. Remove the vinum-specific
entries from rc.conf and add geom_vinum_load="YES"
to /boot/loader.conf. Reboot and [optionally] run gvinum
saveconfig (a sketch of this route follows below).

or 

2) clear the current vinum config (which would leave the data intact on
each half of the mirror?). Make buildworld, make kernel, and add the
loader.conf line. Then reboot, install world, and rebuild the
gvinum device by creating a mirror with one disk and then adding the
second disk.
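
Option 1 would boil down to something like this (assuming the old setup
was started via start_vinum="YES" in rc.conf):

make buildworld && make kernel
# remove start_vinum="YES" from /etc/rc.conf, then:
echo 'geom_vinum_load="YES"' >> /boot/loader.conf
shutdown -r now
# after the reboot, finish with installworld and, optionally:
gvinum saveconfig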

Anyone have experience in this migration process? Alternatively has
anyone converted a (g)vinum mirror into a gmirror setup?

Thanks

Sven



Re: Not panic in nfsd (Re: panic in nfsd on 6.2-RC1)

2006-12-30 Thread Sven Willenberger


Sven Willenberger presumably uttered the following on 12/18/06 12:33:
> On Fri, 2006-12-15 at 23:20 +0200, Kostik Belousov wrote:
>> On Fri, Dec 15, 2006 at 02:29:58PM -0500, Kris Kennaway wrote:
> 
> <>
> 
>>>  
>>>> FWIW, I do see the following appearing in the /var/log/messages:
>>>> ufs_rename: fvp == tvp (can't happen) 
>>>> about once or twice a day, but cannot correlate those to lockup. Now
>>>> that I have enabled the options mentioned above in the kernel, I am
>>>> seeing some LOR issues:
>>>>
>>>> kernel: lock order reversal:
>>>> kernel: 1st 0xff00c3bab200 kqueue (kqueue) @ 
>>>> /usr/src/sys/kern/kern_event.c:1547
>>>> kernel: 2nd 0xff0005bb6078 struct mount mtx (struct mount mtx) @ 
>>>> /usr/src/sys/ufs/ufs/ufs_vnops.c:138
>>> OK, this is interesting, so let's proceed from here.
>>>
>>> Kris
>> Try this.
>>
>> Index: ufs/ufs/ufs_vnops.c
>> ===
>> RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
>> retrieving revision 1.283
>> diff -u -r1.283 ufs_vnops.c
>> --- ufs/ufs/ufs_vnops.c  6 Nov 2006 13:42:09 -   1.283
>> +++ ufs/ufs/ufs_vnops.c  15 Dec 2006 21:19:51 -
>> @@ -133,19 +133,15 @@
>>  {
>>  struct inode *ip;
>>  struct timespec ts;
>> -int mnt_locked;
>>  
>>  ip = VTOI(vp);
>> -mnt_locked = 0;
>> -if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
>> -VI_LOCK(vp);
>> +VI_LOCK(vp);
>> +if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
>>  goto out;
>> +if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
>> +VI_UNLOCK(vp);
>> +return;
>>  }
>> -MNT_ILOCK(vp->v_mount); /* For reading of mnt_kern_flags. */
>> -mnt_locked = 1;
>> -VI_LOCK(vp);
>> -if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
>> -goto out_unl;
>>  
>>  if ((vp->v_type == VBLK || vp->v_type == VCHR) && !DOINGSOFTDEP(vp))
>>  ip->i_flag |= IN_LAZYMOD;
>> @@ -172,10 +168,7 @@
>>  
>>   out:
>>  ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
>> - out_unl:
>>  VI_UNLOCK(vp);
>> -if (mnt_locked)
>> -MNT_IUNLOCK(vp->v_mount);
>>  }
>>  
>>  /*
> 
> 
> Patch applied cleanly (offset 6 lines), make buildworld, make kernel,
> reboot, make installworld, etc.
> 
> kernel: lock order reversal:
> kernel: 1st 0xff00b9181800 kqueue (kqueue) @ 
> /usr/src/sys/kern/kern_event.c:1547
> kernel: 2nd 0xff00c16030d0 vnode interlock (vnode interlock) @ 
> /usr/src/sys/ufs/ufs/ufs_vnops.c:132
> 
> 
> 

Having enabled witness and ddb, etc., I cannot get this LOR to trigger
anymore, but the machine is still locking up. I finally managed to get a
piece of what was appearing on the console, which is the following (copied
by hand by an onsite tech, so there may be a typo here and there):

cut--

bge_intr() at bge_intr+0x84a
ithread_loop() at ithread_loop+0x14c
fork_exit() at fork_exit+0xbb
fork_trampoline() at fork_trampoline+0xee
--- trap 0, rip = 0, rsp = 0xb371ad00, rbp = 0 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x28
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0x801dae1a
stack pointer           = 0x10:0xb371ab70
frame pointer           = 0x10:0xb371abd0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1

processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 28 (irq 24:bge0)
trap number             = 12
panic: page fault
cpuid = 1

Uptime - 4d10h52m36s
Dumping 4031MB (2 chunks)
chunk0: 1MB (156 pages)... ok
chunk1: 4031MB (1031920)

--cut-

For some reason, by the time it reboots, there is no dump file available
(even though it is enabled in rc.conf and there is more than enough room
in /var/crash to hold it).
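
For reference, the rc.conf knobs in question (savecore also needs a dump
device, typically swap, at least as large as RAM to recover a full dump):

dumpdev="AUTO"
dumpdir="/var/crash"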

Sven


Verifying serial setup for KDB_UNATTENDED

2007-01-03 Thread Sven Willenberger
We have a RELENG_6 amd64 box that has been experiencing lockups/panics
every 4 days or so. The box is at a remote location and trying to get a
full trace has been nigh impossible given the staffing constraints. As
such, I would like to set up a serial console (using another FreeBSD box
and minicom or cu). I would like to verify that the following will work
to a) allow the other FreeBSD box to have a terminal session via COM1,
b) work regardless of whether a keyboard and/or monitor is plugged into
the target box, and c) still allow redirection to the internal or serial
console if a keyboard is attached:

/boot/loader.conf
hint.sio.0.flags="0x30"
console="comconsole,vidconsole"
boot_multicons="YES"
boot_console="YES"
comconsole_speed="19200"

/etc/ttys
ttyd0 "/usr/libexec/getty std.19200" vt100 on secure
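
From the second box the session would then be opened with something like
this (assuming the null-modem cable lands on that box's first serial
port, /dev/cuad0 on 6.x):

cu -l /dev/cuad0 -s 19200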

As this is basically a 6.2 system, I assume I don't need to do anything
re: boot blocks or /etc/make.conf or the like? Kernel has already been
built with options DDB, options KDB_UNATTENDED, options KDB, options
KDB_TRACE.

Would the modifications to the 2 files listed above be sufficient to
meet my wishes and allow me to see the panic on the terminal when the
system does panic (and even to trace, etc., via the kdb debugger)?

Thanks,

Sven



Panic in 6.2-PRERELEASE with bge on amd64

2007-01-07 Thread Sven Willenberger
I am starting a new thread on this as what I had assumed was a panic in
nfsd turns out to be an issue with the bge driver. This is an amd64 box,
dual processor (SMP kernel) that happens to be running nfsd. About every
3-5 days the kernel panics and I have finally managed to get a core
dump. 
The system: FreeBSD 6.2-PRERELEASE #8: Tue Jan  2 10:57:39 EST 2007

The short and dirty of the dump:

# kgdb /usr/obj/usr/src/sys/MSPOOL/kernel.debug /var/crash/vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd".

Unread portion of the kernel message buffer:
lock order reversal: (sleepable after non-sleepable)
 1st 0x8836b010 bge0 (network driver) @ 
/usr/src/sys/dev/bge/if_bge.c:2675
 2nd 0x805f26b0 user map (user map) @ /usr/src/sys/vm/vm_map.c:3074
KDB: stack backtrace:
witness_checkorder() at witness_checkorder+0x4da
_sx_xlock() at _sx_xlock+0x51
vm_map_lookup() at vm_map_lookup+0x44
vm_fault() at vm_fault+0xba
trap_pfault() at trap_pfault+0x13c
trap() at trap+0x1f9
calltrap() at calltrap+0x5
--- trap 0xc, rip = 0x801d5f17, rsp = 0xb371ab50, rbp = 
0xb371aba0 ---
bge_rxeof() at bge_rxeof+0x3b7
bge_intr() at bge_intr+0x1c8
ithread_loop() at ithread_loop+0x14c
fork_exit() at fork_exit+0xbb
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xb371ad00, rbp = 0 ---


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x28
fault code  = supervisor write, page not present
instruction pointer = 0x8:0x801d5f17
stack pointer   = 0x10:0xb371ab50
frame pointer   = 0x10:0xb371aba0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 28 (irq24: bge0)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 3d4h18m42s

#0  doadump () at pcpu.h:172
172 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:172
#1  0x802771b9 in boot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:409
#2  0x80276c4b in panic (fmt=0x8044160c "%s") at 
/usr/src/sys/kern/kern_shutdown.c:565
#3  0x803ebba6 in trap_fatal (frame=0xc, eva=18446742978291675136) at 
/usr/src/sys/amd64/amd64/trap.c:660
#4  0x803ebee3 in trap_pfault (frame=0xb371aaa0, usermode=0) at 
/usr/src/sys/amd64/amd64/trap.c:573
#5  0x803ec0f9 in trap (frame=
  {tf_rdi = 0, tf_rsi = 0, tf_rdx = 1, tf_rcx = 499, tf_r8 = 2521427970, 
tf_r9 = -1099500152320, tf_rax = 0, tf_rbx = -1263948192, tf_rbp = -1284396128, 
tf_r10 = 0, tf_r11 = 0, tf_r12 = -2009681920, tf_r13 = 0, tf_r14 = 0, tf_r15 = 
-1099499984896, tf_trapno = 12, tf_addr = 40, tf_flags = -1263948192, tf_err = 
2, tf_rip = -2145558761, tf_cs = 8, tf_rflags = 66071, tf_rsp = -1284396192, 
tf_ss = 16})
at /usr/src/sys/amd64/amd64/trap.c:352
#6  0x803d779b in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:168
#7  0x801d5f17 in bge_rxeof (sc=0x8836b000) at 
/usr/src/sys/dev/bge/if_bge.c:2528
#8  0x801db818 in bge_intr (xsc=0x0) at 
/usr/src/sys/dev/bge/if_bge.c:2707
#9  0x8025f2bc in ithread_loop (arg=0xffb1b320) at 
/usr/src/sys/kern/kern_intr.c:682
#10 0x8025e00b in fork_exit (callout=0x8025f170 , 
arg=0xffb1b320, frame=0xb371ac50)
at /usr/src/sys/kern/kern_fork.c:821
#11 0x803d7afe in fork_trampoline () at 
/usr/src/sys/amd64/amd64/exception.S:394

If more information is needed (disassemble, etc.) please let me know. In
the interim I may switch to either using the 10/100 ethernet port (fxp)
or turning off SMP.

Sven



Re: Panic in 6.2-PRERELEASE with bge on amd64

2007-01-08 Thread Sven Willenberger
On Mon, 2007-01-08 at 16:06 +1100, Bruce Evans wrote:
> On Sun, 7 Jan 2007, Sven Willenberger wrote:
> 
> > I am starting a new thread on this as what I had assumed was a panic in
> > nfsd turns out to be an issue with the bge driver. This is an amd64 box,
> > dual processor (SMP kernel) that happens to be running nfsd. About every
> > 3-5 days the kernel panics and I have finally managed to get a core
> > dump.
> > The system: FreeBSD 6.2-PRERELEASE #8: Tue Jan  2 10:57:39 EST 2007
> 
> Like most NIC drivers, bge unlocks and re-locks around its call to
> ether_input() in its interrupt handler.  This isn't very safe, and it
> certainly causes panics for bge.  I often see it panic when bringing
> the interface down and up while input is arriving, on a non-SMP non-amd64
> (actually i386) non-6.x (actually -current) system.  Bringing the
> interface down is probably the worst case.  It creates a null pointer
> for bge_intr() to follow.
> 
> > The short and dirty of the dump:
> > ...
> > --- trap 0xc, rip = 0x801d5f17, rsp = 0xb371ab50, rbp = 
> > 0xb371aba0 ---
> > bge_rxeof() at bge_rxeof+0x3b7
> 
> What is the instruction here?

I will do my best to ferret out the information you need. For the
bge_rxeof() at bge_rxeof+0x3b7 line, the instruction is:

0x801d5f17 <bge_rxeof+951>: mov %r15,0x28(%r14)

For the bge_intr() at bge_intr+0x1c8 line, the instruction is:

0x801db818 <bge_intr+456>:  mov %rbx,%rdi

> 
> > bge_intr() at bge_intr+0x1c8
> > ithread_loop() at ithread_loop+0x14c
> > fork_exit() at fork_exit+0xbb
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xb371ad00, rbp = 0 ---
> 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 1; apic id = 01
> > fault virtual address   = 0x28
> 
> Looks like a null pointer panic anyway.  I guess the instruction is
> movl to/from 0x28(%reg) where %reg is a null pointer.
> 

from the above lines, apparently %r14 is null then.

> > ...
> > #8  0x801db818 in bge_intr (xsc=0x0) at 
> > /usr/src/sys/dev/bge/if_bge.c:2707
> 
> What is the statement here?  It presumably follows a null pointer and only
> the expression for the pointer is interesting.  xsc is already null but
> that is probably a bug in gdb, or the result of excessive optimization.
> Compiling kernels with -O2 has little effect except to break debugging.
> 

the block of code from if_bge.c:

   2705 if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
   2706 /* Check RX return ring producer/consumer. */
   2707 bge_rxeof(sc);
   2708
   2709 /* Check TX ring producer/consumer. */
   2710 bge_txeof(sc);
   2711 }

By default -O2 is passed to CC (I don't use any custom make flags other
than defining CPUTYPE in my /etc/make.conf).

> I rarely use gdb on kernels and haven't looked closely enough using ddb
> to see where the null pointer for the panic on down/up came from.
> 
> BTW, the sbdrop panic in -current isn't bge-only or SMP-only.  I saw
> it once for sk on a non-SMP system.  It rarely happens for non-SMP
> (much more rarely than the panic in bge_intr()).  Under -current, on
> an SMP amd64 system with bge, It happens almost every time on close
> of the socket for a ttcp server if input is arriving at the time of
> the close.  I haven't seen it for 6.x.
> 
> Bruce

The short of it is that this interface sees pretty much non-stop traffic,
as this is a mailserver (final destination) that is constantly being
delivered to (direct disk access) and from which mail is being retrieved
(remote machine(s) with NFS-mounted mail spools). If a momentary down of
the interface is enough to completely panic the driver and then the
kernel, this hardly seems "robust" if, in fact, this is what is
happening. So the question arises as to what would be causing the
down/up of the interface; I could start looking at the cable and the
switch it's connected to ... any other ideas? (I don't have a watchdog
enabled or anything like that, for example.)

Sven



Re: Panic in 6.2-PRERELEASE with bge on amd64

2007-01-09 Thread Sven Willenberger
On Tue, 2007-01-09 at 12:50 +1100, Bruce Evans wrote:
> On Mon, 8 Jan 2007, Sven Willenberger wrote:
> 
> > On Mon, 2007-01-08 at 16:06 +1100, Bruce Evans wrote:
> >> On Sun, 7 Jan 2007, Sven Willenberger wrote:
> 
> >>> The short and dirty of the dump:
> >>> ...
> >>> --- trap 0xc, rip = 0x801d5f17, rsp = 0xb371ab50, rbp = 
> >>> 0xb371aba0 ---
> >>> bge_rxeof() at bge_rxeof+0x3b7
> >>
> >> What is the instruction here?
> >
> > I will do my best to ferret out the information you need. For the
> > bge_rxeof() at bge_rxeof+0x3b7 line, the instruction is:
> >
> > 0x801d5f17 <bge_rxeof+951>: mov %r15,0x28(%r14)
> > ...
> >> Looks like a null pointer panic anyway.  I guess the instruction is
> >> movl to/from 0x28(%reg) where %reg is a null pointer.
> >>
> >
> > from the above lines, apparently %r14 is null then.
> 
> Yes.  It's a bit surprising that the access is a write.
> 
> >>> ...
> >>> #8  0x801db818 in bge_intr (xsc=0x0) at 
> >>> /usr/src/sys/dev/bge/if_bge.c:2707
> >>
> >> What is the statement here?  It presumably follows a null pointer and only
> >> the expression for the pointer is interesting.  xsc is already null but
> >> that is probably a bug in gdb, or the result of excessive optimization.
> >> Compiling kernels with -O2 has little effect except to break debugging.
> >
> > the block of code from if_bge.c:
> >
> >   2705 if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
> >   2706 /* Check RX return ring producer/consumer. */
> >   2707 bge_rxeof(sc);
> >   2708
> >   2709 /* Check TX ring producer/consumer. */
> >   2710 bge_txeof(sc);
> >   2711 }
> 
> Oops.  I should have asked for the statement in bge_rxeof().

#7  0x801d5f17 in bge_rxeof (sc=0x8836b000) at 
/usr/src/sys/dev/bge/if_bge.c:2528
2528    m->m_pkthdr.len = m->m_len = cur_rx->bge_len -
ETHER_CRC_LEN;

(where m is defined as:
2449 struct mbuf *m = NULL;
)


> 
> > By default -O2 is passed to CC (I don't use any custom make flags other
> > than defining CPUTYPE in my /etc/make.conf).
> 
> -O2 is unfortunately the default for COPTFLAGS for most arches in
> sys/conf/kern.pre.mk.  All of my machines and most FreeBSD cluster
> machines override this default in /etc/make.conf.
> 
> With the override overridden for RELENG_6 amd64, gcc inlines bge_rxeof(),
> so your environment must be a little different to get even the above
> info.  I think gdb can show the correct line numbers but not the call
> frames (since there is no call).  ddb and the kernel stack trace can
> only show the call frames for actual calls.
> 
> With -O1, I couldn't find any instruction similar to the mov to the
> null pointer + 28.  28 is a popular offset in mbufs.

If you have a suggestion for an /etc/make.conf line, I can recompile the
kernel accordingly, assuming it still panics or locks up after the
change of interface noted below.
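
For example, if the idea is just to drop to a lower optimization level
for debugging, I am guessing something like this in /etc/make.conf is
what you mean (untested on my part, so please correct the flags if I
have it wrong):

CPUTYPE?=p4
COPTFLAGS= -O -pipe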

> 
> > The short of it is that this interface sees pretty much non-stop traffic
> > as this is a mailserver (final destination) and is constantly being
> > delivered to (direct disk access) and mail being retrieved (remote
> > machine(s) with NFS-mounted mail spools). If a momentary down of the
> > interface is enough to completely panic the driver and then the kernel,
> > this hardly seems "robust" if, in fact, this is what is happening. So
> > the question arises as to what would be causing the down/up of the
> > interface; I could start looking at the cable, the switch it's connected
> > to and ... any other ideas? (I don't have watchdog enabled or anything
> > like that, for example).
> 
> I don't think down/up can occur in normal operation, since it takes ioctls
> or a watchdog timeout to do it.  Maybe some ioctls other than a full
> down/up can cause problems... bge_init() is called for the following
> ioctls:
> - mtu changes
> - some near down/up (possibly only these)
> Suspend/resume and of course detach/attach do much the same things as
> down/up.
> 
> BTW, I added some sysctls and found it annoying to have to do down/up
> to make the sysctls take effect.  Sysctls in several other NIC drivers
> require the same, since doing a full reinitialization is easiest.
> Since I am tuning using sysctls, I got used to doing down/up too much.
> 
> Similarly for the mtu ioctl.  I think a full reinitialization is used
> for mtu 

Re: Panic in 6.2-PRERELEASE with bge on amd64

2007-01-09 Thread Sven Willenberger
On Tue, 2007-01-09 at 11:50 -0500, John Baldwin wrote:
> On Tuesday 09 January 2007 09:37, Sven Willenberger wrote:
> > On Tue, 2007-01-09 at 12:50 +1100, Bruce Evans wrote:
> > > On Mon, 8 Jan 2007, Sven Willenberger wrote:
> > > 
> > > > On Mon, 2007-01-08 at 16:06 +1100, Bruce Evans wrote:
> > > >> On Sun, 7 Jan 2007, Sven Willenberger wrote:
> > > 
> > > >>> The short and dirty of the dump:
> > > >>> ...
> > > >>> --- trap 0xc, rip = 0x801d5f17, rsp = 0xb371ab50, rbp 
> = 0xb371aba0 ---
> > > >>> bge_rxeof() at bge_rxeof+0x3b7
> > > >>
> > > >> What is the instruction here?
> > > >
> > > > I will do my best to ferret out the information you need. For the
> > > > bge_rxeof() at bge_rxeof+0x3b7 line, the instruction is:
> > > >
> > > > 0x801d5f17 <bge_rxeof+951>: mov %r15,0x28(%r14)
> > > > ...
> > > >> Looks like a null pointer panic anyway.  I guess the instruction is
> > > >> movl to/from 0x28(%reg) where %reg is a null pointer.
> > > >>
> > > >
> > > > from the above lines, apparently %r14 is null then.
> > > 
> > > Yes.  It's a bit surprising that the access is a write.
> > > 
> > > >>> ...
> > > >>> #8  0x801db818 in bge_intr (xsc=0x0) 
> at /usr/src/sys/dev/bge/if_bge.c:2707
> > > >>
> > > >> What is the statement here?  It presumably follows a null pointer and 
> only
> > > >> the expression for the pointer is interesting.  xsc is already null but
> > > >> that is probably a bug in gdb, or the result of excessive optimization.
> > > >> Compiling kernels with -O2 has little effect except to break debugging.
> > > >
> > > > the block of code from if_bge.c:
> > > >
> > > >   2705 if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
> > > >   2706 /* Check RX return ring producer/consumer. */
> > > >   2707 bge_rxeof(sc);
> > > >   2708
> > > >   2709 /* Check TX ring producer/consumer. */
> > > >   2710 bge_txeof(sc);
> > > >   2711 }
> > > 
> > > Oops.  I should have asked for the statement in bge_rxeof().
> > 
> > #7  0x801d5f17 in bge_rxeof (sc=0x8836b000) 
> at /usr/src/sys/dev/bge/if_bge.c:2528
> > 2528    m->m_pkthdr.len = m->m_len = cur_rx->bge_len - 
> ETHER_CRC_LEN;
> > 
> > (where m is defined as:
> > 2449 struct mbuf *m = NULL;
> > )
> 
> It's assigned earlier in between those two places.  Can you 'p rxidx' as well 
> as 'p sc->bge_cdata.bge_rx_std_chain[rxidx]' and 'p 
> sc->bge_cdata.bge_rx_jumbo_chain[rxidx]'?  Also, are you using jumbo frames 
> at all? 
> 

(kgdb) p rxidx
$1 = 499
(kgdb) p sc->bge_cdata.bge_rx_std_chain[rxidx]
$2 = (struct mbuf *) 0xff0097a27900
(kgdb) p sc->bge_cdata.bge_rx_jumbo_chain[rxidx]
$3 = (struct mbuf *) 0x0

And no, I am not using jumbo frames:
bge0: flags=8843 mtu 1500
options=1b

Sven



Re: Panic in 6.2-PRERELEASE with bge on amd64

2007-01-09 Thread Sven Willenberger
On Tue, 2007-01-09 at 14:09 -0500, John Baldwin wrote:
> On Tuesday 09 January 2007 12:53, Sven Willenberger wrote:
> > On Tue, 2007-01-09 at 11:50 -0500, John Baldwin wrote:
> > > On Tuesday 09 January 2007 09:37, Sven Willenberger wrote:
> > > > On Tue, 2007-01-09 at 12:50 +1100, Bruce Evans wrote:
> > > > > On Mon, 8 Jan 2007, Sven Willenberger wrote:
> > > > > 
> > > > > > On Mon, 2007-01-08 at 16:06 +1100, Bruce Evans wrote:
> > > > > >> On Sun, 7 Jan 2007, Sven Willenberger wrote:
> > > > > 
> > > > > >>> The short and dirty of the dump:
> > > > > >>> ...
> > > > > >>> --- trap 0xc, rip = 0x801d5f17, rsp = 0xb371ab50, 
> rbp 
> > > = 0xb371aba0 ---
> > > > > >>> bge_rxeof() at bge_rxeof+0x3b7
> > > > > >>
> > > > > >> What is the instruction here?
> > > > > >
> > > > > > I will do my best to ferret out the information you need. For the
> > > > > > bge_rxeof() at bge_rxeof+0x3b7 line, the instruction is:
> > > > > >
> > > > > > 0x801d5f17 <bge_rxeof+951>: mov %r15,0x28(%r14)
> > > > > > ...
> > > > > >> Looks like a null pointer panic anyway.  I guess the instruction is
> > > > > >> movl to/from 0x28(%reg) where %reg is a null pointer.
> > > > > >>
> > > > > >
> > > > > > from the above lines, apparently %r14 is null then.
> > > > > 
> > > > > Yes.  It's a bit surprising that the access is a write.
> > > > > 
> > > > > >>> ...
> > > > > >>> #8  0x801db818 in bge_intr (xsc=0x0) 
> > > at /usr/src/sys/dev/bge/if_bge.c:2707
> > > > > >>
> > > > > >> What is the statement here?  It presumably follows a null pointer 
> and 
> > > only
> > > > > >> the expression for the pointer is interesting.  xsc is already null 
> but
> > > > > >> that is probably a bug in gdb, or the result of excessive 
> optimization.
> > > > > >> Compiling kernels with -O2 has little effect except to break 
> debugging.
> > > > > >
> > > > > > the block of code from if_bge.c:
> > > > > >
> > > > > >   2705 if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
> > > > > >   2706 /* Check RX return ring producer/consumer. */
> > > > > >   2707 bge_rxeof(sc);
> > > > > >   2708
> > > > > >   2709 /* Check TX ring producer/consumer. */
> > > > > >   2710 bge_txeof(sc);
> > > > > >   2711 }
> > > > > 
> > > > > Oops.  I should have asked for the statement in bge_rxeof().
> > > > 
> > > > #7  0x801d5f17 in bge_rxeof (sc=0x8836b000) 
> > > at /usr/src/sys/dev/bge/if_bge.c:2528
> > > > 2528    m->m_pkthdr.len = m->m_len = cur_rx->bge_len - 
> > > ETHER_CRC_LEN;
> > > > 
> > > > (where m is defined as:
> > > > 2449 struct mbuf *m = NULL;
> > > > )
> > > 
> > > It's assigned earlier in between those two places.  Can you 'p rxidx' as 
> well 
> > > as 'p sc->bge_cdata.bge_rx_std_chain[rxidx]' and 'p 
> > > sc->bge_cdata.bge_rx_jumbo_chain[rxidx]'?  Also, are you using jumbo 
> frames 
> > > at all? 
> > > 
> > 
> > (kgdb) p rxidx
> > $1 = 499
> > (kgdb) p sc->bge_cdata.bge_rx_std_chain[rxidx]
> > $2 = (struct mbuf *) 0xff0097a27900
> > (kgdb) p sc->bge_cdata.bge_rx_jumbo_chain[rxidx]
> > $3 = (struct mbuf *) 0x0
> > 
> > And no, I am not using jumbo frames:
> > bge0: flags=8843 mtu 1500
> > options=1b
> 
> Did you do a 'p m' to verify that m is NULL?  If you can reproduce this, I'd 
> add some KASSERT's where it fetches the mbuf out of the descriptor data to 
> see if m is NULL.
> 
At this spot, m is null:
(kgdb) p m
$3 = (struct mbuf *) 0x0

As far as adding some KASSERTs ... that goes beyond my rudimentary
knowledge of how to apply them.
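
If it is as simple as sanity-checking the mbuf right where it is fetched
out of the descriptor data, something like this is my (untested) guess
at what you mean; as I understand it, it would also need a kernel built
with options INVARIANTS:

m = sc->bge_cdata.bge_rx_std_chain[rxidx];
KASSERT(m != NULL, ("bge_rxeof: NULL mbuf at std rxidx %d", rxidx));

If that looks right, I can build a kernel with it and report back.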



Re: Panic in 6.2-PRERELEASE with bge on amd64

2007-01-10 Thread Sven Willenberger


Bruce Evans presumably uttered the following on 01/09/07 21:42:
> On Tue, 9 Jan 2007, John Baldwin wrote:
> 
>> On Tuesday 09 January 2007 09:37, Sven Willenberger wrote:
>>> On Tue, 2007-01-09 at 12:50 +1100, Bruce Evans wrote:
>>>> Oops.  I should have asked for the statement in bge_rxeof().
>>>
>>> #7  0x801d5f17 in bge_rxeof (sc=0x8836b000)
>> at /usr/src/sys/dev/bge/if_bge.c:2528
>>> 2528    m->m_pkthdr.len = m->m_len = cur_rx->bge_len -
>> ETHER_CRC_LEN;
>>>
>>> (where m is defined as:
>>> 2449 struct mbuf *m = NULL;
>>> )
>>
>> It's assigned earlier in between those two places.
> 
> Its initialization here is just a style bug.
> 
>> Can you 'p rxidx' as well
>> as 'p sc->bge_cdata.bge_rx_std_chain[rxidx]' and 'p
>> sc->bge_cdata.bge_rx_jumbo_chain[rxidx]'?  Also, are you using jumbo
>> frames
>> at all?
> 
> Also look at nearby chain entries (especially at ((rxidx - 1) mod 512)).
> I think the previous 255 entries and the rxidx one should be
> non-NULL since we should have refilled them as we used them (so the
> one at rxidx is least interesting since we certainly just refilled
> it), and the next 256 entries should be NULL since we bogusly only use
> half of the entries.  If the problem is uninitialization, then I expect
> all 512 entries except the one just refilled at rxidx to be NULL.
> 
> Bruce

(kgdb) p sc->bge_cdata.bge_rx_std_chain[rxidx]
$1 = (struct mbuf *) 0xff0097a27900
(kgdb) p rxidx
$2 = 499

since rxidx = 499, I assume you are most interested in 498:
(kgdb) p sc->bge_cdata.bge_rx_std_chain[498]
$3 = (struct mbuf *) 0xff00cf1b3100

for the sake of argument, 500 is null:
(kgdb) p sc->bge_cdata.bge_rx_std_chain[500]
$13 = (struct mbuf *) 0x0

the indexes with values basically are 243 through 499:
(kgdb) p sc->bge_cdata.bge_rx_std_chain[241]
$30 = (struct mbuf *) 0x0
(kgdb) p sc->bge_cdata.bge_rx_std_chain[242]
$31 = (struct mbuf *) 0x0
(kgdb) p sc->bge_cdata.bge_rx_std_chain[243]
$32 = (struct mbuf *) 0xff005d4ab700
(kgdb) p sc->bge_cdata.bge_rx_std_chain[244]
$33 = (struct mbuf *) 0xff004f644b00

so it does not seem to be a problem with "uninitialization".


Re: Not panic in nfsd (Re: panic in nfsd on 6.2-RC1)

2007-01-15 Thread Sven Willenberger
On Sat, 2007-01-13 at 15:11 -0500, Kris Kennaway wrote:
> On Sat, Dec 30, 2006 at 06:04:13PM -0500, Sven Willenberger wrote:
> > 
> > 
> > Sven Willenberger presumably uttered the following on 12/18/06 12:33:
> > > On Fri, 2006-12-15 at 23:20 +0200, Kostik Belousov wrote:
> > >> On Fri, Dec 15, 2006 at 02:29:58PM -0500, Kris Kennaway wrote:
> > > 
> > > <>
> > > 
> > >>>  
> > >>>> FWIW, I do see the following appearing in the /var/log/messages:
> > >>>> ufs_rename: fvp == tvp (can't happen) 
> > >>>> about once or twice a day, but cannot correlate those to lockup. Now
> > >>>> that I have enabled the options mentioned above in the kernel, I am
> > >>>> seeing some LOR issues:
> > >>>>
> > >>>> kernel: lock order reversal:
> > >>>> kernel: 1st 0xff00c3bab200 kqueue (kqueue) @ 
> > >>>> /usr/src/sys/kern/kern_event.c:1547
> > >>>> kernel: 2nd 0xff0005bb6078 struct mount mtx (struct mount mtx) @ 
> > >>>> /usr/src/sys/ufs/ufs/ufs_vnops.c:138
> > >>> OK, this is interesting, so let's proceed from here.
> > >>>
> > >>> Kris
> > >> Try this.
> > >>
> > >> Index: ufs/ufs/ufs_vnops.c
> > >> ===
> > >> RCS file: /usr/local/arch/ncvs/src/sys/ufs/ufs/ufs_vnops.c,v
> > >> retrieving revision 1.283
> > >> diff -u -r1.283 ufs_vnops.c
> > >> --- ufs/ufs/ufs_vnops.c  6 Nov 2006 13:42:09 -   1.283
> > >> +++ ufs/ufs/ufs_vnops.c  15 Dec 2006 21:19:51 -
> > >> @@ -133,19 +133,15 @@
> > >>  {
> > >>  struct inode *ip;
> > >>  struct timespec ts;
> > >> -int mnt_locked;
> > >>  
> > >>  ip = VTOI(vp);
> > >> -mnt_locked = 0;
> > >> -if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0) {
> > >> -VI_LOCK(vp);
> > >> +VI_LOCK(vp);
> > >> +if ((vp->v_mount->mnt_flag & MNT_RDONLY) != 0)
> > >>  goto out;
> > >> +if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0) {
> > >> +VI_UNLOCK(vp);
> > >> +return;
> > >>  }
> > >> -MNT_ILOCK(vp->v_mount); /* For reading of 
> > >> mnt_kern_flags. */
> > >> -mnt_locked = 1;
> > >> -VI_LOCK(vp);
> > >> -if ((ip->i_flag & (IN_ACCESS | IN_CHANGE | IN_UPDATE)) == 0)
> > >> -goto out_unl;
> > >>  
> > >>  if ((vp->v_type == VBLK || vp->v_type == VCHR) && 
> > >> !DOINGSOFTDEP(vp))
> > >>  ip->i_flag |= IN_LAZYMOD;
> > >> @@ -172,10 +168,7 @@
> > >>  
> > >>   out:
> > >>  ip->i_flag &= ~(IN_ACCESS | IN_CHANGE | IN_UPDATE);
> > >> - out_unl:
> > >>  VI_UNLOCK(vp);
> > >> -if (mnt_locked)
> > >> -MNT_IUNLOCK(vp->v_mount);
> > >>  }
> > >>  
> > >>  /*
> > > 
> > > 
> > > Patch applied cleanly (offset 6 lines), make buildworld, make kernel,
> > > reboot, make installworld, etc.
> > > 
> > > kernel: lock order reversal:
> > > kernel: 1st 0xff00b9181800 kqueue (kqueue) @ 
> > > /usr/src/sys/kern/kern_event.c:1547
> > > kernel: 2nd 0xff00c16030d0 vnode interlock (vnode interlock) @ 
> > > /usr/src/sys/ufs/ufs/ufs_vnops.c:132
> > > 
> > > 
> > > 
> > 
> > Having enabled witness and ddb, etc I cannot get this LOR to trigger 
> > anymore, but
> > the machine is still locking up. I finally managed to get a piece of what 
> > was
> > appearing on the console which is the following (copied by hand by an 
> > onsite tech so
> > there may be a typo here and there):
> > 
> > cut--
> > 
> > bge_intr() at bge_intr+0x84a
> > ithread_loop() at ithread_loop+0x14c
> > fork_exit() at fork_exit+0xbb
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xb371ad00, rbp = 0 ---
> > 
> > 

Re: bge panic (Re: Not panic in nfsd (Re: panic in nfsd on 6.2-RC1))

2007-01-15 Thread Sven Willenberger
On Mon, 2007-01-15 at 13:22 -0500, Kris Kennaway wrote:
> On Mon, Jan 15, 2007 at 11:33:33AM -0500, Sven Willenberger wrote:
> 
> > > This is indicating a problem either with your bge hardware or the driver.
> > > 
> > > Kris
> > 
I suspect the driver: this same hardware setup was being used as a
database server with FreeBSD 5.4. I had been using the bge driver but set
at 100baseTX without any issue at all. It was when I did a clean install
of 6.2-PRERELEASE, setting bge to use the full gigE speed (via
autonegotiate), that these issues cropped up.
> 
> Be careful before you start blaming FreeBSD - since you did not test
> the failing hardware configuration in the older version of FreeBSD you
> cannot yet determine that it is a driver regression.
> 
> Kris

I will freely admit that this may be circumstantial, i.e., that the
hardware failed at the same time I upgraded to the newer version of
FreeBSD. It could also be that there is an issue with the bge driver
being used at gigE (1000baseTX) speed instead of the Fast Ethernet speed
at which I used it with the 5.4 release (same hardware). Unfortunately,
now that the fxp connection seems stable (for the moment), I am going to
take advantage of the uptime and will have to leave
troubleshooting/debugging/etc. to what I have provided in the other
responses I have sent.



Re: bge panic (Re: Not panic in nfsd (Re: panic in nfsd on 6.2-RC1))

2007-01-15 Thread Sven Willenberger
On Mon, 2007-01-15 at 15:24 -0500, Kris Kennaway wrote:
> On Mon, Jan 15, 2007 at 02:03:28PM -0500, Sven Willenberger wrote:
> > On Mon, 2007-01-15 at 13:22 -0500, Kris Kennaway wrote:
> > > On Mon, Jan 15, 2007 at 11:33:33AM -0500, Sven Willenberger wrote:
> > > 
> > > > > This is indicating a problem either with your bge hardware or the 
> > > > > driver.
> > > > > 
> > > > > Kris
> > > > 
> > > > I suspect the driver: this same hardware setup was being used as a
> > > > database server with FreeBSD 5.4. I had been using the bge driver but set
> > > > at 100baseTX without any issue at all. It was when I did a clean install
> > > > of 6.2-PRERELEASE, setting bge to use the full gigE speed (via
> > > > autonegotiate), that these issues cropped up.
> > > 
> > > Be careful before you start blaming FreeBSD - since you did not test
> > > the failing hardware configuration in the older version of FreeBSD you
> > > cannot yet determine that it is a driver regression.
> > > 
> > > Kris
> > 
> > I will freely admit that this may be circumstantial, that the hardware
> > failed at the same time I upgraded to the newer version of FreeBSD. It
> > could also be that there is an issue with the bge driver being used with
> > 1000 (gigE) speeds instead of at fastE speeds as I used it with the 5.4
> > release (same hardware).
> 
> The latter is what I am referring to.  Your hardware may never have
> worked in gige mode due to your hardware being broken (yes, this
> happens), or it could be a freebsd driver issue either introduced in
> 6.x or present in 5.4 too.  You just haven't ruled these cases out.
> 
> Anyway, since you're happy with your present workaround we'll have to
> just drop the issue for now.
> 
> Kris

As the box in question is now fully in production I can no longer
"guinea pig" it. However, I will attempt to set up a test bed with
similar hardware and try to push as much traffic through the bge
interface at gigE speeds as I can, in an effort to duplicate this issue.
If it does crop up, this box should allow me to provide debugging
information more effectively since it will not be a production unit.
Although the current workaround is satisfactory for now (and to that
extent I am "happy"), I would much rather have the available headroom of
full gigE traffic to this server, so I would like to see if I can
reproduce the error, or at least find out whether it is a hardware issue
(if nothing else, for my own edification).

Sven



ggate + gmirror write performance woes

2007-04-05 Thread Sven Willenberger
I am trying to set up a HA type system involving two identical boxes and
have gone through the following to set up the systems:

Slave server: 
ggated -R 196608 -S 196608
(exporting /dev/amrd1 )
net.inet.tcp.sendspace: 65536
net.inet.tcp.recvspace: 131072



Master server:
ggatec create -u 0 -R 196608 -S 196608 -o rw [slaveip] /dev/amrd1
net.inet.tcp.sendspace: 131072
net.inet.tcp.recvspace: 65536


# gmirror status
      Name    Status  Components
mirror/gm0  COMPLETE  amrd1s1
                      ggate0s1

The two servers are connected to each other via their secondary physical
gigE interfaces using CAT6 crossover cable. (Netperf shows 890 Mbps at
95% confidence.)

Softupdates are enabled on gm0 (though this does not affect the results).

The results:
/usr/bin/time -h cp testfile64M /data1
28.62s real 0.00s user  0.16s sys

and this is very consistent ... about 3 MB/s over repeated runs 

dd if=/dev/zero of=/data1/testfile32M2 bs=32k count=1024
1024+0 records in
1024+0 records out
33554432 bytes transferred in 16.122641 secs (2081199 bytes/sec)

What else can I tune here to make this functional? If I increase
recvspace and sendspace much beyond those numbers, ggated will not
start, claiming not to have enough buffer space.

Sven



Re: ggate + gmirror write performance woes

2007-04-05 Thread Sven Willenberger
On Thu, 2007-04-05 at 17:38 +0100, Tom Judge wrote:
> Dmitriy Kirhlarov wrote:
> > On Thu, Apr 05, 2007 at 10:58:56AM -0400, Sven Willenberger wrote:
> >> I am trying to set up a HA type system involving two identical boxes and
> >> have gone through the following to set up the systems:
> >>
> >> Slave server: 
> >> ggated -R 196608 -S 196608
> >> (exporting /dev/amrd1 )
> >> net.inet.tcp.sendspace: 65536
> >> net.inet.tcp.recvspace: 131072
> > 
> > Try
> > net.local.stream.recvspace=65535
> > net.local.stream.sendspace=65535
> > 
> > Also, try increase this sysctls with
> > net.inet.tcp.rfc1323=1
> > 
> > I use it on FreeBSD 5.x with:
> > net.inet.tcp.sendspace=131072
> > net.inet.tcp.recvspace=131072
> > net.local.stream.recvspace=65535
> > net.local.stream.sendspace=65535
> > 
> > ggated -R 1048576 -S 1048576
> > ggatec -R 1048576 -S 1048576
> > 
> > WBR.
> > Dmitriy
> 
> 
> I have seen sustained writes of 30Mb/s using the following configuration:
> 
> cat /boot/loader.conf
> kern.ipc.nmbclusters="32768"
> 
> cat /etc/sysctl.conf
> net.inet.tcp.sendspace=1048576
> net.inet.tcp.recvspace=1048576
> 
> Server:
> /sbin/ggated -S 1310720 -R 1310720 -a 172.31.0.18 /etc/gg.exports
> 
> Client:
> /sbin/ggatec create -q 2048 -t 5 -S 1310720 -R 1310720 172.31.0.18 
> /dev/amrd0s2
> 
> The raid array is a RAID 1 volume on a dell PERC4 (Dell PE1850) with 
> adaptive read ahead and write back caching.
> 
> Tom

I have tried both sets of settings suggested above but I cannot even
get out of the gate with those. Setting net.inet.tcp.{send,recv}space to
anything higher than 131072 results in ggated bailing with the error:
# ggated -v -a 10.10.0.19
info: Reading exports file (/etc/gg.exports).
debug: Added 10.10.0.0/24 /dev/amrd1 RW to exports list.
debug: Added 10.10.0.0/24 /dev/amrd3 RW to exports list.
info: Exporting 2 object(s).
error: Cannot open stream socket: No buffer space available.
error: Exiting.

Setting net.inet.tcp.{send,recv}space to 131072 allows me to start
ggated with the default R and S values of 131072; anything higher
results in "no buffer space" errors. At 131072 ggated starts but then I
cannot even open a new connection (like ssh) to the server as the ssh
client bails with "no buffer space available".

more information:
# netstat -m
514/641/1155 mbufs in use (current/cache/total)
512/284/796/32768 mbuf clusters in use (current/cache/total/max)
512/256 mbuf+clusters out of packet secondary zone in use
(current/cache)
0/0/0/0 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/0 9k jumbo clusters in use (current/cache/total/max)
0/0/0/0 16k jumbo clusters in use (current/cache/total/max)
1152K/728K/1880K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/4/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

This is on a FreeBSD 6.2-RELENG box (i386, SMP) using the amr driver
(SATA RAID using an LSI MegaRAID).

The odd thing is that even after I set the send and recvspace down to
values like 65536, I continue to get the no buffer error when trying to
connect to it remotely again.

Sven



Re: ggate + gmirror write performance woes

2007-04-09 Thread Sven Willenberger
On Fri, 2007-04-06 at 16:18 +0300, Nikolay Pavlov wrote:
> On Thursday,  5 April 2007 at 16:15:35 -0400, Sven Willenberger wrote:
> > On Thu, 2007-04-05 at 17:38 +0100, Tom Judge wrote:
> > > Dmitriy Kirhlarov wrote:
> > > > On Thu, Apr 05, 2007 at 10:58:56AM -0400, Sven Willenberger wrote:
> > > >> I am trying to set up a HA type system involving two identical boxes 
> > > >> and
> > > >> have gone through the following to set up the systems:
> > > >>
> > > >> Slave server: 
> > > >> ggated -R 196608 -S 196608
> > > >> (exporting /dev/amrd1 )
> > > >> net.inet.tcp.sendspace: 65536
> > > >> net.inet.tcp.recvspace: 131072
> > > > 
> > > > Try
> > > > net.local.stream.recvspace=65535
> > > > net.local.stream.sendspace=65535
> > > > 
> > > > Also, try increase this sysctls with
> > > > net.inet.tcp.rfc1323=1
> > > > 
> > > > I use it on FreeBSD 5.x with:
> > > > net.inet.tcp.sendspace=131072
> > > > net.inet.tcp.recvspace=131072
> > > > net.local.stream.recvspace=65535
> > > > net.local.stream.sendspace=65535
> > > > 
> > > > ggated -R 1048576 -S 1048576
> > > > ggatec -R 1048576 -S 1048576
> > > > 
> > > > WBR.
> > > > Dmitriy
> > > 
> > > 
> > > I have seen sustained writes of 30Mb/s using the following configuration:
> > > 
> > > cat /boot/loader.conf
> > > kern.ipc.nmbclusters="32768"
> > > 
> > > cat /etc/sysctl.conf
> > > net.inet.tcp.sendspace=1048576
> > > net.inet.tcp.recvspace=1048576
> > > 
> > > Server:
> > > /sbin/ggated -S 1310720 -R 1310720 -a 172.31.0.18 /etc/gg.exports
> > > 
> > > Client:
> > > /sbin/ggatec create -q 2048 -t 5 -S 1310720 -R 1310720 172.31.0.18 
> > > /dev/amrd0s2
> > > 
> > > The raid array is a RAID 1 volume on a dell PERC4 (Dell PE1850) with 
> > > adaptive read ahead and write back caching.
> > > 
> > > Tom
> > 
> > I have tried both sets of settings suggested above but I cannot even
> > get out of the gate with those. Setting net.inet.tcp.{send,recv}space to
> > anything higher than 131072 results in ggated bailing with the error:
> > # ggated -v -a 10.10.0.19
> > info: Reading exports file (/etc/gg.exports).
> > debug: Added 10.10.0.0/24 /dev/amrd1 RW to exports list.
> > debug: Added 10.10.0.0/24 /dev/amrd3 RW to exports list.
> > info: Exporting 2 object(s).
> > error: Cannot open stream socket: No buffer space available.
> > error: Exiting.
> 
> For values of net.inet.tcp.{send,recv}space more than
> 524288 you also need to adjust kern.ipc.maxsockbuf
> 
> Try this configuration for example:
> kern.ipc.maxsockbuf=2049152
> net.inet.tcp.recvspace=1024576
> net.inet.tcp.sendspace=1024576
> 

kern.ipc.maxsockbuf was the issue here; I increased its value and now I
no longer get the buffer space error. Furthermore, the write speed
problem was also tied to a hardware RAID controller issue. After fixing
that, and with just the following:

kern.ipc.maxsockbuf=1048576
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072

I can start ggated with -R 262144 -S 262144 as well as the ggatec and
see write speeds of 60+MB/s. I may play around with the settings more
(and see if any further speed improvements occur), but this is quite
acceptable at this point. (For the record nmbclusters is set to 32768).
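
For the archives, the full working combination on both boxes looks
roughly like this (addresses are the crossover IPs from my earlier
posts; exact values will likely need tuning on other hardware):

/boot/loader.conf:
kern.ipc.nmbclusters="32768"

/etc/sysctl.conf:
kern.ipc.maxsockbuf=1048576
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072

slave:  ggated -R 262144 -S 262144 -a 10.10.0.19 /etc/gg.exports
master: ggatec create -u 0 -R 262144 -S 262144 -o rw 10.10.0.19 /dev/amrd1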

The next part of the project will be writing the freevrrpd failover
scripts to deal with the I/O locking issues that will happen if the
ggated server fails, etc.



CARP and em0 timeout watchdog

2007-04-18 Thread Sven Willenberger
I currently have a FreeBSD 6.2-RELEASE-p3 SMP box with dual Intel
PRO/1000 PM NICs configured as follows:

em0: flags=8943 mtu 1500
options=b
inet 192.168.0.18 netmask 0xffffff00 broadcast 192.168.0.255
ether 00:30:48:8d:5c:0a
media: Ethernet autoselect (1000baseTX )
status: active
em1: flags=8843 mtu 4096
options=b
inet 10.10.0.18 netmask 0xfffffff8 broadcast 10.10.0.23
ether 00:30:48:8d:5c:0b
media: Ethernet autoselect (1000baseTX )
status: active

the em0 interface connects to the LAN while the em1 interface is
connected to an identical box via CAT6 crossover cable (for
ggate/gmirror).

Now, I have also configured a carp interface:

carp0: flags=49 mtu 1500
inet 192.168.0.20 netmask 0xffffffff
carp: MASTER vhid 1 advbase 1 advskew 0
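
For reference, the carp interface is brought up at boot with rc.conf
entries along these lines (password elided; the exact syntax is my
reading of carp(4), so treat it as a sketch):

cloned_interfaces="carp0"
ifconfig_carp0="vhid 1 advbase 1 advskew 0 pass XXXX 192.168.0.20/32"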

There are twin boxes here and I am running Samba. The problem is that
with transfers across the carp IP (192.168.0.20) I end up with em0
resetting after a watchdog timeout error. This occurs whether I transfer
files from a Windows box using a share (Samba) or via ftp. This problem
does *not* occur if I ftp to the 192.168.0.19 interface (non-virtual). I
suspected cabling at first so had all the cabling in question replaced
with fresh CAT6 to no avail. Several gigs of data can be transferred to
the real interface (em0) without any issue at all; a max of maybe 1-2 GB
can be transferred over the carp'ed IP before em0 resets.
Any ideas here?

Sven



Re: CARP and em0 timeout watchdog

2007-04-20 Thread Sven Willenberger
On Wed, 2007-04-18 at 11:50 -0400, Sven Willenberger wrote:
> I currently have a FreeBSD 6.2-RELEASE-p3 SMP with dual intel PRO/1000PM
> nics configured as follows:
> 
> em0: flags=8943 mtu 1500
> options=b
> inet 192.168.0.18 netmask 0xffffff00 broadcast 192.168.0.255
> ether 00:30:48:8d:5c:0a
> media: Ethernet autoselect (1000baseTX )
> status: active
> em1: flags=8843 mtu 4096
> options=b
> inet 10.10.0.18 netmask 0xfffffff8 broadcast 10.10.0.23
> ether 00:30:48:8d:5c:0b
> media: Ethernet autoselect (1000baseTX )
> status: active
> 
> the em0 interface connects to the LAN while the em1 interface is
> connected to an identical box via CAT6 crossover cable (for
> ggate/gmirror).
> 
> Now, I have also configured a carp interface:
> 
> carp0: flags=49 mtu 1500
> inet 192.168.0.20 netmask 0xffffffff
> carp: MASTER vhid 1 advbase 1 advskew 0
> 
> There are twin boxes here and I am running Samba. The problem is that
> with transfers across the carp IP (192.168.0.20) I end up with em0
> resetting after a watchdog timeout error. This occurs whether I transfer
> files from a Windows box using a share (Samba) or via ftp. This problem
> does *not* occur if I ftp to the 192.168.0.19 interface (non-virtual). I
> suspected cabling at first so had all the cabling in question replaced
> with fresh CAT6 to no avail. Several gigs of data can be transferred to
> the real interface (em0) without any issue at all; a max of maybe 1-2
> GB can be transferred over the carp'ed IP before em0 resets.
> Any ideas here?
> 
> Sven
> 

Having done more diagnostics I have found out it is not CARP related at
all. It turns out that the same timeouts will happen when ftp'ing to the
physical address IPs as well. There is also an odd situation here
depending on which protocol I use. The two boxes are connected to a Dell
Powerconnect 2616 gig switch with CAT6. If I scp files from the
192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without a
hiccup (I used dd to create various-sized testfiles from 32M to 1G in
size and just scp testfile* to the other box). On the other hand, if I
connect to 192.168.0.19 using ftp (either active or passive) where ftp
is being run through inetd, the interface resets (watchdog) within
seconds (a few MBs) of traffic. Enabling polling does nothing, nor does
changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing
such behavioral differences between scp and ftp?

Sven



Re: CARP and em0 timeout watchdog

2007-04-20 Thread Sven Willenberger
On Fri, 2007-04-20 at 09:04 -0700, Jeremy Chadwick wrote:
> On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> > Having done more diagnostics I have found out it is not CARP related at
> > all. It turns out that the same timeouts will happen when ftp'ing to the
> > physical address IPs as well. There is also an odd situation here
> > depending on which protocol I use. The two boxes are connected to a Dell
> > Powerconnect 2616 gig switch with CAT6. If I scp files from the
> > 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without a
> > hiccup (I used dd to create various sized testfiles from 32M to 1G in
> > size and just scp testfile* to the other box). On the other hand, if I
> > connect to 192.168.0.19 using ftp (either active or passive) where ftp
> > is being run through inetd, the interface resets (watchdog) within
> > seconds (a few MBs) of traffic. Enabling polling does nothing, nor does
> > changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing
> > such behavioral differences between scp and ftp?
> 
> You'll get a much higher throughput rate with FTP than you will with
> SSH, simply because encryption overhead is quite high (even with the
> Blowfish cipher).  With a very fast processor and on a gigE network
> you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP.
> That's the only difference I can think of.
> 
> The watchdog resets I can't explain; Jack Vogel should be able to assist
> with that.  But it sounds like the resets only happen under very high
> throughput conditions (which is why you'd see it with FTP but not SSH).
> 

I guess it is possible that the traffic from ftp (or smb) is overloading
the interface; FWIW, if I increase the {recv,send}space to 131072 I can
achieve 32MB+/s using scp (and ftp shows similar values). The real
question is how to avoid these watchdog timeouts during heavy traffic;
the whole point here was to replace Windows-based fileshare servers with
FreeBSD for the local network, but at the moment it is proving
ineffectual as any Samba file transfers stall (much like ftp). I see no
other error messages in the logfiles other than the watchdog timeouts
plus interface down/up messages.

Sven



Re: CARP and em0 timeout watchdog

2007-04-20 Thread Sven Willenberger
On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
> On 4/20/07, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
> > On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> > > Having done more diagnostics I have found out it is not CARP related at
> > > all. It turns out that the same timeouts will happen when ftp'ing to the
> > > physical address IPs as well. There is also an odd situation here
> > > depending on which protocol I use. The two boxes are connected to a Dell
> > > Powerconnect 2616 gig switch with CAT6. If I scp files from the
> > > 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without a
> > > hiccup (I used dd to create various sized testfiles from 32M to 1G in
> > > size and just scp testfile* to the other box). On the other hand, if I
> > > connect to 192.168.0.19 using ftp (either active or passive) where ftp
> > > is being run through inetd, the interface resets (watchdog) within
> > > seconds (a few MBs) of traffic. Enabling polling does nothing, nor does
> > > changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing
> > > such behavioral differences between scp and ftp?
> >
> > You'll get a much higher throughput rate with FTP than you will with
> > SSH, simply because encryption overhead is quite high (even with the
> > Blowfish cipher).  With a very fast processor and on a gigE network
> > you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP.
> > That's the only difference I can think of.
> >
> > The watchdog resets I can't explain; Jack Vogel should be able to assist
> > with that.  But it sounds like the resets only happen under very high
> > throughput conditions (which is why you'd see it with FTP but not SSH).
> 
> What kind of hardware is this interface? Watchdogs mean TX cleanup
> isn't happening in a reasonable time; without further data it's hard to
> know what might be going on.
> 
> Jack

from pciconf:

em0@pci13:0:0:  class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03
hdr=0x00
vendor   = 'Intel Corporation'
device   = 'PRO/1000 PM'
class    = network
subclass = ethernet
em1@pci14:0:0:  class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00
hdr=0x00
vendor   = 'Intel Corporation'
class    = network
subclass = ethernet

em0 is the interface in question.

from dmesg:

em0:  port
0x4000-0x401f mem 0xe030-0xe031 irq 16 at device 0.0 on pci13

em1:  port
0x5000-0x501f mem 0xe040-0xe041 irq 17 at device 0.0 on pci14

Sven



Re: CARP and em0 timeout watchdog

2007-04-20 Thread Sven Willenberger
On Fri, 2007-04-20 at 18:46 +0200, Clayton Milos wrote:
> - Original Message - 
> From: "Sven Willenberger" <[EMAIL PROTECTED]>
> To: "Jeremy Chadwick" <[EMAIL PROTECTED]>
> Cc: 
> Sent: Friday, April 20, 2007 6:25 PM
> Subject: Re: CARP and em0 timeout watchdog
> 
> 
> > On Fri, 2007-04-20 at 09:04 -0700, Jeremy Chadwick wrote:
> >> On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> >> > Having done more diagnostics I have found out it is not CARP related at
> >> > all. It turns out that the same timeouts will happen when ftp'ing to 
> >> > the
> >> > physical address IPs as well. There is also an odd situation here
> >> > depending on which protocol I use. The two boxes are connected to a 
> >> > Dell
> >> > Powerconnect 2616 gig switch with CAT6. If I scp files from the
> >> > 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth without 
> >> > a
> >> > hiccup (I used dd to create various sized testfiles from 32M to 1G in
> >> > size and just scp testfile* to the other box). On the other hand, if I
> >> > connect to 192.168.0.19 using ftp (either active or passive) where ftp
> >> > is being run through inetd, the interface resets (watchdog) within
> >> > seconds (a few MBs) of traffic. Enabling polling does nothing, nor does
> >> > changing net.inet.tcp.{recv,send}space. Any ideas why I would be seeing
> >> > such behavioral differences between scp and ftp?
> >>
> >> You'll get a much higher throughput rate with FTP than you will with
> >> SSH, simply because encryption overhead is quite high (even with the
> >> Blowfish cipher).  With a very fast processor and on a gigE network
> >> you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP.
> >> That's the only difference I can think of.
> >>
> >> The watchdog resets I can't explain; Jack Vogel should be able to assist
> >> with that.  But it sounds like the resets only happen under very high
> >> throughput conditions (which is why you'd see it with FTP but not SSH).
> >>
> >
> > I guess it is possible that the traffic from ftp (or smb) is overloading
> > the interface; fwiw, if I increase the {recv,send}space to 131072 I can
> > achieve 32MB+/s using scp (and ftp shows similar values). The real
> > question is how to avoid these watchdog timeouts during heavy traffic;
> > the whole point here was to replace windows-based fileshare servers with
> > FreeBSD for the local network but at the moment it is proving
> > ineffectual as any samba file transfers stall (much like ftp). I see no
> > other error messages in the logfiles other than the watchdog timeouts
> > plus interface down/up messages.
> >
> > Sven
> >
> 
> Sorry for jumping on a thread here. I've had issues with em NICs as well, 
> especially with heavy loads. What helped for me was turning on polling. I 
> recompiled the kernel with polling and turned it on in rc.conf and my 
> problems disappeared.
> 
> Are you running with polling on?
> 

At first I did not have polling compiled in, so no. Then I compiled in
polling (and used options HZ=2000) but it didn't change anything.
Whether I have polling enabled or disabled on the interface, the outcome
is the same.
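
For completeness, the polling setup I tried looks like this (going from
polling(4) and ifconfig(8), so treat the exact lines as a sketch):

# kernel config
options DEVICE_POLLING
options HZ=2000

# toggle per interface at runtime
ifconfig em0 polling
ifconfig em0 -polling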

Sven



Re: CARP and em0 timeout watchdog

2007-04-20 Thread Sven Willenberger
On Fri, 2007-04-20 at 11:27 -0700, Jack Vogel wrote:
> On 4/20/07, Sven Willenberger <[EMAIL PROTECTED]> wrote:
> > On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
> > > On 4/20/07, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
> > > > On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> > > > > Having done more diagnostics I have found out it is not CARP related 
> > > > > at
> > > > > all. It turns out that the same timeouts will happen when ftp'ing to 
> > > > > the
> > > > > physical address IPs as well. There is also an odd situation here
> > > > > depending on which protocol I use. The two boxes are connected to a 
> > > > > Dell
> > > > > Powerconnect 2616 gig switch with CAT6. If I scp files from the
> > > > > 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth 
> > > > > without a
> > > > > hiccup (I used dd to create various sized testfiles from 32M to 1G in
> > > > > size and just scp testfile* to the other box). On the other hand, if I
> > > > > connect to 192.168.0.19 using ftp (either active or passive) where ftp
> > > > > is being run through inetd, the interface resets (watchdog) within
> > > > > seconds (a few MBs) of traffic. Enabling polling does nothing, nor 
> > > > > does
> > > > > changing net.inet.tcp.{recv,send}space. Any ideas why I would be 
> > > > > seeing
> > > > > such behavioral differences between scp and ftp?
> > > >
> > > > You'll get a much higher throughput rate with FTP than you will with
> > > > SSH, simply because encryption overhead is quite high (even with the
> > > > Blowfish cipher).  With a very fast processor and on a gigE network
> > > > you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP.
> > > > That's the only difference I can think of.
> > > >
> > > > The watchdog resets I can't explain; Jack Vogel should be able to assist
> > > > with that.  But it sounds like the resets only happen under very high
> > > > throughput conditions (which is why you'd see it with FTP but not SSH).
> > >
> > > What kind of hardware is this interface? Watchdogs mean TX cleanup
> > > isn't happening in a reasonable time; without further data it's hard to
> > > know what might be going on.
> > >
> > > Jack
> >
> > from pciconf:
> >
> > em0@pci13:0:0:  class=0x02 card=0x108c15d9 chip=0x108c8086 
> > rev=0x03
> > hdr=0x00
> > vendor   = 'Intel Corporation'
> > device   = 'PRO/1000 PM'
> > class= network
> > subclass = ethernet
> > em1@pci14:0:0:  class=0x02 card=0x109a15d9 chip=0x109a8086 
> > rev=0x00
> > hdr=0x00
> > vendor   = 'Intel Corporation'
> > class= network
> > subclass = ethernet
> >
> > em0 is the interface in question.
> >
> > from dmesg:
> >
> > em0:  port
> > 0x4000-0x401f mem 0xe030-0xe031 irq 16 at device 0.0 on pci13
> >
> > em1:  port
> > 0x5000-0x501f mem 0xe040-0xe041 irq 17 at device 0.0 on pci14
> 
> OH, this is an 82573, and I've posted a firmware patcher a couple
> different times; there is a bit in the MANC register that is incorrectly
> programmed in some vendors' systems. Can you search email for
> that patcher? It needs to run from DOS. If you are unable to find
> it let me know and I'll resend you a copy.
> 
> Jack

If you are referring to the dcgdis.ThisIsZip attachment, I found it in
earlier threads, thanks. Will work on patching the NICs and will keep
the list updated.

Thanks again.

Sven



Re: CARP and em0 timeout watchdog

2007-04-27 Thread Sven Willenberger
On Fri, 2007-04-20 at 14:44 -0400, Sven Willenberger wrote:
> On Fri, 2007-04-20 at 11:27 -0700, Jack Vogel wrote:
> > On 4/20/07, Sven Willenberger <[EMAIL PROTECTED]> wrote:
> > > On Fri, 2007-04-20 at 10:17 -0700, Jack Vogel wrote:
> > > > On 4/20/07, Jeremy Chadwick <[EMAIL PROTECTED]> wrote:
> > > > > On Fri, Apr 20, 2007 at 11:51:56AM -0400, Sven Willenberger wrote:
> > > > > > Having done more diagnostics I have found out it is not CARP 
> > > > > > related at
> > > > > > all. It turns out that the same timeouts will happen when ftp'ing 
> > > > > > to the
> > > > > > physical address IPs as well. There is also an odd situation here
> > > > > > depending on which protocol I use. The two boxes are connected to a 
> > > > > > Dell
> > > > > > Powerconnect 2616 gig switch with CAT6. If I scp files from the
> > > > > > 192.168.0.18 to the 192.168.0.19 box I can transfer gigs worth 
> > > > > > without a
> > > > > > hiccup (I used dd to create various sized testfiles from 32M to 1G 
> > > > > > in
> > > > > > size and just scp testfile* to the other box). On the other hand, 
> > > > > > if I
> > > > > > connect to 192.168.0.19 using ftp (either active or passive) where 
> > > > > > ftp
> > > > > > is being run through inetd, the interface resets (watchdog) within
> > > > > > seconds (a few MBs) of traffic. Enabling polling does nothing, nor 
> > > > > > does
> > > > > > changing net.inet.tcp.{recv,send}space. Any ideas why I would be 
> > > > > > seeing
> > > > > > such behavioral differences between scp and ftp?
> > > > >
> > > > > You'll get a much higher throughput rate with FTP than you will with
> > > > > SSH, simply because encryption overhead is quite high (even with the
> > > > > Blowfish cipher).  With a very fast processor and on a gigE network
> > > > > you'll probably see 8-9MByte/sec via SSH while 60-70MByte/sec via FTP.
> > > > > That's the only difference I can think of.
> > > > >
> > > > > The watchdog resets I can't explain; Jack Vogel should be able to 
> > > > > assist
> > > > > with that.  But it sounds like the resets only happen under very high
> > > > > throughput conditions (which is why you'd see it with FTP but not 
> > > > > SSH).
> > > >
> > > > What kind of hardware is this interface? Watchdogs mean TX cleanup
> > > > isn't happening in a reasonable time; without further data it's hard to
> > > > know what might be going on.
> > > >
> > > > Jack
> > >
> > > from pciconf:
> > >
> > > em0@pci13:0:0:  class=0x02 card=0x108c15d9 chip=0x108c8086 
> > > rev=0x03
> > > hdr=0x00
> > > vendor   = 'Intel Corporation'
> > > device   = 'PRO/1000 PM'
> > > class= network
> > > subclass = ethernet
> > > em1@pci14:0:0:  class=0x02 card=0x109a15d9 chip=0x109a8086 
> > > rev=0x00
> > > hdr=0x00
> > > vendor   = 'Intel Corporation'
> > > class= network
> > > subclass = ethernet
> > >
> > > em0 is the interface in question.
> > >
> > > from dmesg:
> > >
> > > em0:  port
> > > 0x4000-0x401f mem 0xe030-0xe031 irq 16 at device 0.0 on pci13
> > >
> > > em1:  port
> > > 0x5000-0x501f mem 0xe040-0xe041 irq 17 at device 0.0 on pci14
> > 
> > OH, this is an 82573, and I've posted a firmware patcher a couple
> > different times; there is a bit in the MANC register that is incorrectly
> > programmed in some vendors' systems. Can you search email for
> > that patcher? It needs to run from DOS. If you are unable to find
> > it let me know and I'll resend you a copy.
> > 
> > Jack
> 
> If you are referring to the dcgdis.ThisIsZip attachment, I found it in
> earlier threads, thanks. Will work on patching the nics and will keep
> the list updated.
> 
> Thanks again.
> 
> Sven
> 
I am happy to report that the firmware patch seems to have fixed the
issue and I can transfer data across the gigE network without the
watchdog timeouts and lockups. Thanks again!!

Sven



Re: Another em0 watchdog timeout

2007-05-03 Thread Sven Willenberger
On Tue, 2007-05-01 at 11:05 -0700, Michael Collette wrote:
> I realize there is a previous thread discussing this, but my symptoms
> seem to be a little bit different.  Here's the stats...
> 
> FreeBSD 6.2-STABLE #1: Fri Apr 27 17:28:22 PDT 2007
> 
> em0@pci13:0:0:  class=0x02 card=0x108c15d9 chip=0x108c8086 
> rev=0x03 hdr=0x00
> vendor = 'Intel Corporation'
> device = 'PRO/1000 PM'
> class  = network
> subclass   = ethernet
> em1@pci14:0:0:  class=0x02 card=0x109a15d9 chip=0x109a8086 
> rev=0x00 hdr=0x00
> vendor = 'Intel Corporation'
> class  = network
> subclass   = ethernet
> 
> em0:  port
> 0x5000-0x501f mem 0xea30-0xea31 irq 16 at device 0.0 on pci13
> em0: Ethernet address: 00:30:48:5c:cc:84
> em1:  port
> 0x6000-0x601f mem 0xea40-0xea41 irq 17 at device 0.0 on pci14
> em1: Ethernet address: 00:30:48:5c:cc:85
> 
> I'm seeing the following entries in my messages log pop up about 2-4
> times a day...
> 
> May 1 08:29:38 alpha kernel: em0: watchdog timeout -- resetting
> May 1 08:29:38 alpha kernel: em0: link state changed to DOWN
> May 1 08:29:41 alpha kernel: em0: link state changed to UP
> 
> I've gone and added the DEVICE_POLLING option in the kernel, but this
> doesn't seem to help.  The problem only seems to happen during the
> hours that my users would be hitting this box, so it really gets
> noticed when those 3 seconds go by.  And yes, it's almost always a 3
> second drop on the interface.
> 
> Is there anything I can do to prevent this from happening?  I saw
> mention of a firmware update I might try, but haven't been able to
> locate the file in question.
> 
> Thanks,

Search the list for a post by Jack Vogel that contains an attachment
named "dcgdis.ThisIsZip". That firmware patch solved my em watchdog
timeout issues.



LSI Megaraid (amr) performance woes

2006-02-23 Thread Sven Willenberger
I am having some issues getting any (write) performance out of an LSI
MegaRAID (320-1) SCSI RAID card (using the amr driver). The system is an
i386 (P4 Xeon) with on-board Adaptec SCSI controllers and a SUPER GEM318
SAF-TE backplane with six 146GB U320 10k RPM Hitachi drives.

dmesg highlights at message end.

The main problem I am having is getting anywhere near a decent write
performance using the card. I compared having the backplane connected to
the on-board Adaptec controller to having it connected to the LSI
controller.

I tried 3 methods of benchmarking. "adaptec connected" involved using
the on-board Adaptec SCSI controller; "lsi singles" involved using the
LSI controller as a simple controller, with each drive its own logical
RAID0 drive. "lsi write-through" and "lsi write-back" involved using the
LSI controller to set up 2 single RAID0 drives as their own logical
units and a "spanned" mirror of 4 drives (RAID10) as a logical unit
(write-back and write-through referring to the cache write policy used).

In the case of the "adaptec connected" and "lsi singles" setups, I
created a RAID10 configuration with 4 of the drives as follows (shown
are the commands for the Adaptec case; for LSI I simply used amrd2,
amrd3, etc. for the drives).

gmirror label -b load md1 da2 da3    # first mirror pair
gmirror label -b load md2 da4 da5    # second mirror pair
gmirror load                         # load geom_mirror into the kernel
gstripe label -s 65536 md0 /dev/mirror/md1 /dev/mirror/md2   # stripe the two mirrors
newfs /dev/stripe/md0
mkdir /bench
mount /dev/stripe/md0 /bench

To test read and write performance I used dd as follows:

dd if=/dev/zero of=/raid_or_single_drive/bench64 bs=64k count=32768

which created 2GB files.
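
For the read numbers, the file was read back with dd as well; the exact
path varied by target, but the invocation was along the lines of:

dd if=/bench/bench64 of=/dev/null bs=64k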

The summary of results (measured in bytes/sec) is as follows:

                  |       SINGLE DRIVE      |        RAID DRIVE       |
Connection Method |   Write    |    Read    |   Write    |    Read    |
------------------|------------|------------|------------|------------|
adaptec connected |  58808057  |  78188838  |  78625494  | 127331944  |
lsi singles       |  43944507  |  81238863  |  95104511  | 111626492  |
lsi write-through |  45716204  |  81748996  | *10299554* | 108620637  |
lsi write-back    |  31689131  |  37241934  |  50382152  |  56053085  |

With the drives connected to the Adaptec controller and using GEOM, I
get the expected increase in write and read performance when moving from
a single drive to a RAID10 system. Likewise, when using the LSI
controller to manage the drives as single units and using GEOM to create
the RAID, I get a marked increase in write performance (less of a read
increase).

However, when using the LSI to create the raid, I end up with a
*miserable* 10MB/sec write speed (while achieving acceptable read
speeds) in write-through mode, and mediocre write speeds in write-back
mode (which, without a battery-backed raid card, I would rather not
use), along with, for some reason, a marked decrease in read speeds
(compared to the write-through values).

So the question arises as to whether this is an issue with the way the
LSI card (320-1) handles "spans" (which I would call stripes, as opposed
to mirrors), or with the way the amr driver views such spans, or an
issue with the card not playing nicely with the Supermicro motherboard,
or perhaps even a defective card. Has anyone else had experience with
this card and motherboard combination?

As a side note, I also tried DragonFly BSD (1.4.0), which also uses the
amr driver, and experienced similar results; and Linux (Slackware 10.2
default install), which showed write speeds of 45MB/s or so and read
speeds of 140MB/s or so using the default LSI controller settings
(write-through, 64k stripe size, etc.).

Any help or ideas here would be really appreciated in an effort to get
anywhere near acceptable write speeds without relying on the unsafe
write-back method or excessively sacrificing read speeds.


dmesg highlights:
FreeBSD 6.0-RELEASE #0: Thu Nov  3 09:36:13 UTC 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2799.22-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9

Features=0xbfebfbff
  Features2=0x4400>
  Hyperthreading: 2 logical CPUs
real memory  = 1073217536 (1023 MB)
avail memory = 1041264640 (993 MB)

pcib5:  at device 29.0 on pci4
pci5:  on pcib5
amr0:  mem 0xfe20-0xfe20 irq 96 at
device 1.0 on pci5
amr0:  Firmware 1L37, BIOS G119, 64MB RAM
pci4:  at device 30.0 (no driver
attached)
pcib6:  at device 31.0 on pci4
pci6:  on pcib6
ahd0:  port
0x4400-0x44ff,0x4000-0x40ff mem 0xfc40-0xfc401fff irq 76 at device
2.0 on pci6
ahd0: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs
ahd1:  port
0x4c00-0x4cff,0x4800-0x48ff mem 0xfc402000-0xfc403fff irq 77 at device
2.1 on pci6
ahd1: [GIANT-LOCKED]
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 101-133Mhz, 512 SCBs


Thanks,

Sven


Re: LSI Megaraid (amr) performance woes

2006-03-01 Thread Sven Willenberger
On Thu, 2006-02-23 at 15:53 -0500, Kris Kennaway wrote: 
> On Thu, Feb 23, 2006 at 03:41:06PM -0500, Sven Willenberger wrote:
> > I am having some issues getting any (write) performance out of an LSi
> > Megaraid (320-1) SCSI raid card (using the amr driver). The system is an
> > i386 (p4 xeon) with on-board adaptec scsi controllers and a SUPER GEM318
> > Saf-te backplane with 6 ea 146GB U320 10k rpm Hitachi drives.
> 
> Try again with 6.1, performance should be much better.
> 
> Kris

I cvsupped a 6.1 prerelease and found no performance improvements. I did
some further tests, and the performance issues seem very specific to the
mirroring aspect of the raid:

The server has 6 drives, so I set up a single-drive raid0, a dual-drive
raid0, and a triple-drive raid0 using the LSI configuration tool. Since
these were simple stripes I would expect increasing performance with
each additional drive, and the results matched these expectations. Write
speeds were roughly 50MB/sec, 100MB/sec, and 150MB/sec for the single-,
dual-, and triple-drive stripes respectively, with read speeds on the
order of 110MB/s or so.

I then went back to the beginning and set up a simple 2-drive mirror and
a 4-drive raid10 (spanning/striping over 2 mirrors). After reinstalling
the OS I ended up with the following results: on the 2-drive mirror,
write speeds were an abysmal 7MB/sec, and on the 4-disk raid10, write
speeds were 8MB/sec. Looking at iostat during the write (using dd
if=/dev/zero of=filename bs=64k count=32768) I saw that the 2-drive
mirror seemed to jump between 35 tps and 65 tps with an average of 128kb
per transaction, while the 4-drive array maintained a more consistent 65
tps. Read speeds on the 4-drive array were around 110MB/sec.
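
For reference, the tps numbers above came from watching something like
the following in a second terminal while the dd ran (amrd0 standing in
for whatever the logical drive shows up as):

iostat amrd0 1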

I cannot rule out the possibility that the card itself is bad, but in an
effort to do so I tried these tests using an install of Slackware 10.2
(using the 2.6 kernel and megaraid.ko module) and the reiserfs
filesystem. On the 2-drive mirror I achieved write speeds of 22MB/sec,
and the 4-drive array saw write speeds of about 45MB/sec. Read speeds on
the 4-drive array were roughly 150MB/sec.

In summary, it would seem that the amr driver has issues whenever any
type of data-duplication (mirror) scheme is in place. I also tested on a
Tyan Transport system (using the same hard drives) under FreeBSD amd64
(it is an AMD system) and had the same results as on the i386 system. Is
there anything I can do to troubleshoot this at a lower level? Is it
possible that the card is actually just defective?

Sven



Re: LSI Megaraid (amr) performance woes

2006-03-01 Thread Sven Willenberger
On Wed, 2006-03-01 at 15:08 -0500, Mike Tancsa wrote:
> At 02:10 PM 01/03/2006, Sven Willenberger wrote:
> 
> >I cvsupped a 6.1 prerelease and found no performance improvements. I did
> >some further tests and the performance issues seem very specific to the
> >mirroring aspect of the raid:
> 
> 
> I am not familiar with the LSI cards, but with older 3ware and the 
> ARECA cards, the raid sets when in any sort of redundancy mode must 
> initialize in the background before normal use.  Until that is 
> complete, performance is seriously slow.  Is the LSI doing that, and 
> perhaps just not telling you ?
> 
>  ---Mike 
> 

I had thought of this too, so I disabled the rapid (background)
initialization option and let the raids build to completion the slow
way. So unless it is still building even after it claims to be done (or
is doing some other odd processor-intensive CRC checking or something),
I don't think this is the source of the problem.

Sven



vinum to gvinum help

2006-06-26 Thread Sven Willenberger
I have an i386 system currently running 5.2.1-RELEASE with a vinum
mirror array (2 drives comprising /usr). I want to upgrade this to
5.5-RELEASE which, if I understand correctly, no longer supports vinum
arrays. Would simply changing /boot/loader.conf to read gvinum_load
instead of vinum_load work, or would the geom layer prevent this from
working properly? If not, is there a recommended way of upgrading a
vinum array to a gvinum or gmirror array?
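
Concretely, the change would be along these lines in /boot/loader.conf
(untested; the gvinum module appears to be named geom_vinum.ko, so I am
assuming the knob is geom_vinum_load rather than a literal gvinum_load):

#vinum_load="YES"           # old vinum line, disabled
geom_vinum_load="YES"       # gvinum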

Thanks,

Sven



Re: vinum to gvinum help

2006-06-26 Thread Sven Willenberger
On Mon, 2006-06-26 at 19:15 +0200, Roland Smith wrote:
> On Mon, Jun 26, 2006 at 12:22:07PM -0400, Sven Willenberger wrote:
> > I have an i386 system currently running 5.2.1-RELEASE with a vinum
> > mirror array (2 drives comprising /usr ). I want to upgrade this to
> > 5.5-RELEASE which, if I understand correctly, no longer supports vinum
> > arrays. Would simply chaning /boot/loader.conf to read gvinum_load
> > instead of vinum_load work or would the geom layer prevent this from
> > working properly? If not, is there a recommended way of upgrading a
> > vinum array to a gvinum or gmirror array?
> 
> Lots of things have changed between 5.2.1 and 5.5. I think it would be
> best to make a backup and do a clean reinstall.
> 
> Roland

Sadly this may not be an option; this is a production server that can at
best stand an hour or so of downtime. Between all the custom symlinked
directories, applications, etc., plus the sheer volume of data that
would need to be backed up, an in-place upgrade would be infinitely more
desirable. If it comes to the point of having to back up and do a fresh
install, I suspect I would be using the 6.x series anyway. I was really
hoping that some way of upgrading in place was available for vinum.

Sven



Re: /var/spool/clientmque 185meg

2005-04-16 Thread Sven Willenberger

Mike Tancsa presumably uttered the following on 04/16/05 19:31:
> At 08:56 AM 16/04/2005, Warren wrote:
> > /var/spool/clientmqueue <-- 185meg
> > How do i get the messages from the above to the root mail folder of my
> > machine ?
> > im willing to provide any neccessary information to help.
>
> Take a look at /var/log/maillog to see why its not being processed.  If
> necessary, bump up the loglevel in sendmail
>
> In /etc/mail/sendmail.cf change
> O LogLevel=9
> to
> O LogLevel=14
> cd /etc/mail
> make stop
> make start
The general recommendation is to *never* edit sendmail.cf directly.
Rather, cd to /etc/mail and edit your freebsd.mc file, adding:

define(`confLOG_LEVEL',`14')dnl

(those are backtick, value, singlequote). Then:

rm `hostname`.??
make && make install-cf
make restart

If you want to revert, you can simply delete the line from your
freebsd.mc file and then remake and install the newly generated cf file.

Sven


Manipulating disk cache (buf) settings

2005-05-23 Thread Sven Willenberger
We are running a PostgreSQL server (8.0.3) on a dual-Opteron system with
8G of RAM. If I interpret top and vfs.hibufspace correctly, they show
values of 215MB and 225771520 (which equals 215MB) respectively. My
understanding from having searched the archives is that this is the
value used by the system/kernel in determining how much disk data to
cache.

If that is in fact the case, then my question would be how best to
increase the amount of memory the system can use for disk caching.
Ideally I would like to have upwards of 1G for this type of
caching/buffering. I suspect it would not be as easy as simply adjusting
vfs.hibufspace upwards, but would instead involve adding a loader.conf
or kernel option, some "master" setting that affects hibufspace,
bufspace, and related tunables. Or would this involve editing one of the
system files?

Sven



Re: Manipulating disk cache (buf) settings

2005-05-23 Thread Sven Willenberger
On Mon, 2005-05-23 at 10:44 -0700, John-Mark Gurney wrote:
> Sven Willenberger wrote this message on Mon, May 23, 2005 at 10:58 -0400:
> > We are running a PostgreSQL server (8.0.3) on a dual opteron system with
> > 8G of RAM. If I interpret top and vfs.hibufspace correctly (which show
> > values of 215MB and 225771520 (which equals 215MB) respectively. My
> > understanding from having searched the archives is that this is the
> > value that is used by the system/kernel in determining how much disk
> > data to cache. 
> 
> This is incorrect...  FreeBSD merged the vm and buf systems a while back,
> so all of memory is used as a disk cache..  The buf cache is still used
> for filesystem meta data (and for pending writes of files, but those buf's
> reference the original page, not local storage)...
> 
> Just as an experiment, on a quiet system do:
> dd if=/dev/zero of=somefile bs=1m count=2048
> and then read it back in:
> dd if=somefile of=/dev/null bs=1m
> and watch systat or iostat and see if any of the file is read...  You'll
> probably see that none of it is...
> 

Yes, confirmed as stated; this is great news then. In essence the
PostgreSQL planner can be told that the effective cache size is *much*
larger than the value calculated from vfs.hibufspace; this should result
in some [hopefully] marked performance boosts.
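
As a concrete illustration (numbers hypothetical; in 8.0 the setting is
expressed in 8kB pages), postgresql.conf on the 8G box could then
advertise roughly 6GB of OS cache to the planner:

effective_cache_size = 786432   # 786432 x 8kB pages, roughly 6GB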

btw:
> dd if=/dev/zero of=zerofile bs=1m count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 43.381462 secs (49502335 bytes/sec)
> dd if=zerofile of=/dev/null bs=1m
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 5.304807 secs (404818435 bytes/sec)

and that was on a 3GB RAM system so the caching scheme works great.

Sven




BKVASIZE for large block-size filesystems

2005-05-25 Thread Sven Willenberger
FreeBSD 5.4-STABLE amd64 on a dual-Opteron system with a 400G+ partition
on an LSI Megaraid. The filesystem was created with: newfs -b 65536 -f
8192 -e 15835 /dev/amrd2s1d

This is the data filesystem for a PostgreSQL database; as the default
page size (files) is 8k, the above newfs scheme has 8k fragments which
should fit nicely with the PostgreSQL page size. Now, by default param.h
defines BKVASIZE as 16384 (which has been pointed out in other posts as
being *not* twice the default blocksize of 16k, but equal to it). I have
modified it to be 32768 but still see a high and increasing value of
vfs.bufdefragcnt, which makes sense given the 64k blocksize of the major
filesystem in use.

My question is: are there any caveats about increasing BKVASIZE to
65536? The system has 8G of RAM and I understand that nbufs decreases
with increasing BKVASIZE; how can I either determine if the resulting
nbufs will be sufficient, or calculate what is needed based on RAM and
system usage?
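
The knobs I know to watch for this, before and after the change, are:

sysctl kern.nbuf
sysctl vfs.bufdefragcnt

but I do not know how to translate an nbuf count into "sufficient".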

Also, will increasing BKVASIZE require a complete make buildworld or, if
not, how can I remake the portions of the system affected by BKVASIZE?
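
My working assumption, which I would appreciate having confirmed, is
that BKVASIZE is consumed only by the kernel, so a kernel-only rebuild
along these lines would pick it up:

cd /usr/src
make buildkernel KERNCONF=GENERIC
make installkernel KERNCONF=GENERIC
shutdown -r now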

Sven




Re: stack backtrace

2005-05-31 Thread Sven Willenberger
On Sun, 2005-05-29 at 11:29 -0700, Derek Kuliński wrote:
> Hello,
> 
> Today I noticed following message in the log:
> 
> > KDB: stack backtrace:
> > kdb_backtrace(c07163b8,2,c661994c,0,22) at kdb_backtrace+0x2e
> > getdirtybuf(d109ebac,0,1,c661994c,1) at getdirtybuf+0x2b
> > flush_deplist(c282f4cc,1,d109ebd4,d109ebd8,0) at flush_deplist+0x57
> > flush_inodedep_deps(c15ba800,22a8a,c2760a7c,d109ec34,c04f7f87) at 
> > flush_inodedep_deps+0x9e
> > softdep_sync_metadata(d109eca4,c2760a50,50,c06ea8f0,0) at 
> > softdep_sync_metadata+0x9d
> > ffs_fsync(d109eca4,0,0,0,0) at ffs_fsync+0x4b2
> > fsync(c17ba000,d109ed14,4,d109ed3c,c0515916) at fsync+0x1a1
> > syscall(c069002f,2f,2f,81522b0,81522b0) at syscall+0x370
> > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > --- syscall (95, FreeBSD ELF32, fsync), eip = 0x28143dcf, esp = 0xbfbfd34c, 
> > ebp = 0xbfbfd358 ---
> > KDB: stack backtrace:
> > kdb_backtrace(c07163b8,2,c6683118,0,22) at kdb_backtrace+0x2e
> > getdirtybuf(d1098bac,0,1,c6683118,1) at getdirtybuf+0x2b
> > flush_deplist(c282facc,1,d1098bd4,d1098bd8,0) at flush_deplist+0x57
> > flush_inodedep_deps(c15ba800,1e9a4,c24cc974,d1098c34,c04f7f87) at 
> > flush_inodedep_deps+0x9e
> > softdep_sync_metadata(d1098ca4,c24cc948,50,c06ea8f0,0) at 
> > softdep_sync_metadata+0x9d
> > ffs_fsync(d1098ca4,0,0,0,0) at ffs_fsync+0x4b2
> > fsync(c17b9c00,d1098d14,4,c17b9c00,7) at fsync+0x1a1
> > syscall(2f,2f,bfbf002f,8111fe0,0) at syscall+0x370
> > Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > --- syscall (95, FreeBSD ELF32, fsync), eip = 0x282dfdcf, esp = 0xbfbf9a8c, 
> > ebp = 0xbfbfb468 ---
> 
> System didn't seem to crash, what does it mean?
> 
> The OS is FreeBSD 5.4-RELEASE, it was compiled using:
> CPUTYPE?=i686
> COPTFLAGS= -O -pipe
> 
Apparently this is still somewhat of a mystery, but you are not the
first person to witness this:

http://lists.freebsd.org/pipermail/freebsd-stable/2005-April/013679.html
http://lists.freebsd.org/pipermail/freebsd-current/2004-July/031576.html

I don't know if anyone is actually looking into this (behind the scenes,
maybe) or whether we just need to accumulate a critical mass of similar
reports to raise an eyebrow. If your system does not lock up as a result
(the way it used to in the earlier 5.x series) then perhaps it is
harmless ...

Sven



PostgreSQL's vacuumdb fails to allocate memory for non-root users

2005-06-29 Thread Sven Willenberger
FreeBSD 5.4-Release
PostgreSQL 8.0.3

I noticed that the nightly cron job consisting of a vacuumdb was failing
due to "unable to allocate memory". I do have maintenance_work_mem set
at 512MB, and the /boot/loader.conf file sets the max data size to 1GB
(verified by limit). The odd thing is that if I run the command (either
vacuumdb from the command line or vacuum verbose analyze from a psql
session) as the Unix user root (or any psql superuser), the vacuum runs
fine. It is when the Unix user is non-root (e.g. su -l pgsql -c
"vacuumdb -a -z") that this memory error occurs. All users use the
"default" class for login.conf purposes, which has not been modified
from its installed settings. Any ideas on how to a) troubleshoot this or
b) fix this (if it is something obvious that I just cannot see)?
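
For reference, the failing nightly job is equivalent to an /etc/crontab
entry along these lines (schedule illustrative):

0 3 * * * root su -l pgsql -c "vacuumdb -a -z"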

Thanks,

Sven



Re: [GENERAL] PostgreSQL's vacuumdb fails to allocate memory for non-root users

2005-06-29 Thread Sven Willenberger
On Wed, 2005-06-29 at 09:43 -0400, Douglas McNaught wrote:
> Sven Willenberger <[EMAIL PROTECTED]> writes:
> 
> > FreeBSD 5.4-Release
> > PostgreSQL 8.0.3
> >
> > I noticed that the nightly cron consisting of a vacuumdb was failing due
> > to "unable to allocate memory". I do have maintenance_mem set at 512MB,
> > and the /boot/loader.conf file sets the max datasize to 1GB (verified by
> > limit). The odd thing is that if I run the command (either vacuumdb from
> > the command line or vacuum verbose analyze from a psql session) as the
> > Unix user root (and any psql superuser) the vacuum runs fine. It is when
> > the unix user is non-root (e.g. su -l pgsql -c "vacuumdb -a -z") that
> > this memory error occurs. All users use the "default" class for
> > login.conf purposes which has not been modified from its installed
> > settings. Any ideas on how to a) troubleshoot this or b) fix this (if it
> > is something obvious that I just cannot see).
> 
> Is the out-of-memory condition occurring on the server or client side?
> Is there anything in the Postgres logs?

In this case they are one and the same machine; i.e., whether invoked
from the command line as vacuumdb or invoked from psql (connecting to
localhost) as "vacuum analyze;", the memory error occurs. The logfile
reveals:
ERROR:  out of memory
DETAIL:  Failed on request of size 536870910.


> You might put a 'ulimit -a' command in your cron script to make sure
> your memory limit settings are propagating correctly...

I created a cron job that consisted of just that command (ulimit -a,
entry sketched below) and the output revealed nothing abnormal (i.e. max
data seg still 1G, etc). This occurs outside of cron also (it was just
the failing cron job that brought it to my attention). Again, if I log
in as myself and try to run the command vacuumdb -a -z it fails; if I su
to root and repeat it, it works fine. I am trying to narrow this down to
a PostgreSQL issue vs a FreeBSD issue.
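
The test entry was simply along these lines (path illustrative),
capturing the output to a file:

* * * * * root ulimit -a > /tmp/cron-ulimit.out 2>&1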

Sven



Re: [GENERAL] PostgreSQL's vacuumdb fails to allocate memory for

2005-06-29 Thread Sven Willenberger
On Wed, 2005-06-29 at 11:21 -0400, Tom Lane wrote:
> Sven Willenberger <[EMAIL PROTECTED]> writes:
> > ERROR:  out of memory
> > DETAIL:  Failed on request of size 536870910.
> 
> That's a server-side failure ...
> 
> > Again, if I log in as myself and try to run
> > the command vaccumdb -a -z it fails; if I su to root and repeat it works
> > fine. I am trying to narrow this down to a PostgreSQL issue vs FreeBSD
> > issue.
> 
> That's fairly hard to believe, assuming that you are talking to the same
> server in both cases (I have seen trouble reports that turned out to be
> because the complainant was accidentally using two different servers...)
> The ulimit the backend is running under couldn't change depending on
> where the client is su'd to.
> 
> Is it possible that you've got per-user configuration settings that
> affect this, like a different maintenance_mem value for the root user?
> 
>   regards, tom lane
> 
I have done some more tests and tried to keep the results of vacuumdb
distinct from connecting to the backend (psql -U pgsql ...) and running
vacuum analyze. Apparently hopping back and forth between the two
methods interfered with my original interpretation of what appeared to
be happening. Anyway, here is what I see:
First, the psql connection version: psql followed by vacuum analyze
works fine whether the current Unix user is root or a plain user (I ran
this a couple of times via new psql connections to verify).
Then quit psql and move to the command line:
vacuumdb fails with the out of memory error whether run as su -l pgsql
-c "vacuumdb -a -z" (or specifying a dbname instead of all) or directly
as a user.
If I then connect via psql to the backend and try to run vacuum analyze,
I receive an out of memory error.

This last psql connection after a failed vacuumdb is what had been
confounding my earlier interpretation of the error being based on the
Unix user. top shows:

  PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME  WCPU   CPU COMMAND
 6754 pgsql      4    0   602M 88688K sbwait 0   0:03 0.00% 0.00% postgres

until I disconnect the psql session. I can then psql again and the same
error happens (out of memory), and top shows the same again. At this
point I am not sure if it is a memory issue of vacuumdb, of vacuum
itself, or of the FreeBSD memory management system. Again, if enough
time passes (or some other events occur) since I last tried vacuumdb,
then running vacuum [verbose][analyze] via a psql connection works fine.

Sven



Re: [GENERAL] PostgreSQL's vacuumdb fails to allocate memory for non-root users

2005-06-29 Thread Sven Willenberger
On Wed, 2005-06-29 at 14:59 -0400, Vivek Khera wrote:
> On Jun 29, 2005, at 9:01 AM, Sven Willenberger wrote:
> 
> > Unix user root (and any psql superuser) the vacuum runs fine. It is  
> > when
> > the unix user is non-root (e.g. su -l pgsql -c "vacuumdb -a -z") that
> > this memory error occurs. All users use the "default" class for
> > login.conf purposes which has not been modified from its installed
> > settings. Any ideas on how to a) troubleshoot this or b) fix this  
> > (if it
> > is something obvious that I just cannot see).
> 
> This doesn't make sense: the actual command is executed by the  
> backend postgres server, so the uid of the client program doens't  
> make a bit of difference.
> 
> You need to see exactly who is generating that error.  It certainly  
> is not the Pg backend.
> 
The issue being tied to a certain "login" user has been negated by
further testing (the illusion that it was user-based arose from the
order in which I ran tests while trying to find out what was going on);
it does seem tied to invoking vacuumdb at this point. As a point of
clarification: when maxdsiz and dfldsiz are set, those values are per
"process", not per "user", correct? Something I have noticed: when the
memory error occurs during the psql session (after a failed vacuumdb
attempt), the memory stays at 600+MB in top (under SIZE) until the psql
session is closed -- though that may just be the way top reports it.

Sven



Re: PostgreSQL's vacuumdb fails to allocate memory for non-root users

2005-06-29 Thread Sven Willenberger
On Wed, 2005-06-29 at 21:54 +0200, Juergen Dankoweit wrote:
> Hello,
> 
> Am Mittwoch, den 29.06.2005, 14:59 -0400 schrieb Vivek Khera:
> > On Jun 29, 2005, at 9:01 AM, Sven Willenberger wrote:
> > 
> > > Unix user root (and any psql superuser) the vacuum runs fine. It is  
> > > when
> > > the unix user is non-root (e.g. su -l pgsql -c "vacuumdb -a -z") that
> > > this memory error occurs. All users use the "default" class for
> > > login.conf purposes which has not been modified from its installed
> > > settings. Any ideas on how to a) troubleshoot this or b) fix this  
> > > (if it
> > > is something obvious that I just cannot see).
> > 
> > This doesn't make sense: the actual command is executed by the  
> > backend postgres server, so the uid of the client program doens't  
> > make a bit of difference.
> > 
> > You need to see exactly who is generating that error.  It certainly  
> > is not the Pg backend.
> 
> Sorry for the possibly stupid question. But why do you think that the
> PG backend does not generate the error?
> I have used PostgreSQL under FreeBSD for many years and this is the
> first time I have heard of such an error.

As the postgres logfiles have the out of memory error in them, it would
appear that it is the backend generating this error. Since, I am
assuming here, dfldsiz and maxdsiz (set in loader.conf at 850MB and 1G
respectively) are per process, and since I have maintenance_work_mem set
at 512M (this all on a 3G box), I am not sure how it fails to allocate
the memory; although top reports only 25-30MB free, there is some 2.5G
in Inactive, so there is plenty of memory available. I am currently
running memtest to see if I may have flaky RAM ...

Sven



Re: [GENERAL] PostgreSQL's vacuumdb fails to allocate memory for non-root users

2005-06-29 Thread Sven Willenberger
On Wed, 2005-06-29 at 16:40 -0400, Charles Swiger wrote:
> On Jun 29, 2005, at 4:12 PM, Sven Willenberger wrote:
> [ ... ]
> > Something I have noticed,
> > when the memory error occurs during the psql session (after a failed
> > vacuumdb attempt) the memory stays at 600+MB in top (under size) until
> > the psql session is closed -- that may just be the way top reports it
> > though.
> 
> Double-check your system limits via "ulimit -a" or "ulimit -aH".  By  
> default, FreeBSD will probably restrict the maximum data size of the  
> process to 512MB, which may be what you are running into.  You can  
> rebuild the kernel to permit a larger data size, or else tweak /boot/ 
> loader.conf:
> 
>  echo 'kern.maxdsiz="1024M"' >> /boot/loader.conf
> 

:>ulimit -a
cpu time   (seconds, -t)  unlimited
file size   (512-blocks, -f)  unlimited
data seg size   (kbytes, -d)  1048576
stack size  (kbytes, -s)  65536
core file size  (512-blocks, -c)  unlimited
max memory size (kbytes, -m)  unlimited
locked memory   (kbytes, -l)  unlimited
max user processes  (-u)  5547
open files  (-n)  11095
virtual mem size(kbytes, -v)  unlimited
sbsize   (bytes, -b)  unlimited
:> cat /boot/loader.conf
kern.maxdsiz="1073741824"
kern.dfldsiz="891289600"

and if I don't run vacuumdb at all, but rather connect to the backend
via psql and run vacuum, it works ok with full memory allocation. Still
testing RAM to see if the issue is physical.

Sven




Re: [GENERAL] PostgreSQL's vacuumdb fails to allocate memory for

2005-06-29 Thread Sven Willenberger
On Wed, 2005-06-29 at 16:58 -0400, Sven Willenberger wrote:
> On Wed, 2005-06-29 at 16:40 -0400, Charles Swiger wrote:
> > On Jun 29, 2005, at 4:12 PM, Sven Willenberger wrote:
> > [ ... ]
> > > Something I have noticed,
> > > when the memory error occurs during the psql session (after a failed
> > > vacuumdb attempt) the memory stays at 600+MB in top (under size) until
> > > the psql session is closed -- that may just be the way top reports it
> > > though.
> > 
> > Double-check your system limits via "ulimit -a" or "ulimit -aH".  By  
> > default, FreeBSD will probably restrict the maximum data size of the  
> > process to 512MB, which may be what you are running into.  You can  
> > rebuild the kernel to permit a larger data size, or else tweak /boot/ 
> > loader.conf:
> > 
> >  echo 'kern.maxdsiz="1024M"' >> /boot/loader.conf
> > 
> 
> :>ulimit -a
> cpu time   (seconds, -t)  unlimited
> file size   (512-blocks, -f)  unlimited
> data seg size   (kbytes, -d)  1048576
> stack size  (kbytes, -s)  65536
> core file size  (512-blocks, -c)  unlimited
> max memory size (kbytes, -m)  unlimited
> locked memory   (kbytes, -l)  unlimited
> max user processes  (-u)  5547
> open files  (-n)  11095
> virtual mem size(kbytes, -v)  unlimited
> sbsize   (bytes, -b)  unlimited
> :> cat /boot/loader.conf
> kern.maxdsiz="1073741824"
> kern.dfldsiz="891289600"
> 
> and if I don't run vacuumdb at all, but rather connect to the backend
> via psql and run vacuum, it works ok with full memory allocation. Still
> testing RAM to see if the issue is physical.
> 
> Sven
> 
> 
I have found the answer/problem. On a hunch I increased maxdsiz to 1.5G
in the loader.conf file and rebooted. I ran vacuumdb and watched top as
the process proceeded. What I saw was SIZE sitting at 603MB (which was
512MB plus another 91MB, which corresponded nicely to the value of RES
for the process). A bit into the process I saw SIZE jump to 1115 -- i.e.
another 512MB of RAM was requested and this time allocated. At one point
SIZE dropped back to 603 and then back up to 1115. I suspect the same
type of issue was occurring in a regular vacuum from the psql client
connecting to the backend, for some reason not as frequently. I am
gathering that maintenance work mem is either not being recognized as
having already been allocated, and another malloc is made, or the
process thinks the memory was released and tries to grab a chunk of
memory again. This would correspond to the situation where I saw SIZE
stuck at 603MB after a failed memory allocation (when maxdsiz was only
1G). Now I am not sure if I will run into the situation where yet
another 512MB request is made (when 1115 already appears in SIZE) but if
so, then the same problem will arise. I will keep an eye on it ...
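
For reference, in the byte syntax shown earlier that works out to the
following loader.conf line (1.5G = 1610612736 bytes):

kern.maxdsiz="1610612736"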

Sven



Re: [GENERAL] PostgreSQL's vacuumdb fails to allocate memory for

2005-06-29 Thread Sven Willenberger



Tom Lane presumably uttered the following on 06/29/05 19:12:
> Sven Willenberger <[EMAIL PROTECTED]> writes:
> > I have found the answer/problem. On a hunch I increased maxdsiz to
> > 1.5G in the loader.conf file and rebooted. I ran vacuumdb and watched
> > top as the process proceeded. What I saw was SIZE sitting at 603MB
> > (which was 512MB plus another 91MB, which corresponded nicely to the
> > value of RES for the process). A bit into the process I saw SIZE jump
> > to 1115 -- i.e. another 512MB of RAM was requested and this time
> > allocated. At one point SIZE dropped back to 603 and then back up to
> > 1115. I suspect the same type of issue was occurring in a regular
> > vacuum from the psql client connecting to the backend, for some
> > reason not as frequently. I am gathering that maintenance work mem is
> > either not being recognized as having already been allocated, and
> > another malloc is made, or the process thinks the memory was released
> > and tries to grab a chunk of memory again.
>
> Hmm.  It's probably a fragmentation issue.  VACUUM will allocate a
> maintenance work mem-sized chunk during command startup, but that's
> likely not all that gets allocated, and if any stuff allocated after
> it is not freed at the same time, the process size won't go back down.
> Which wouldn't be a killer in itself, but unless the next iteration
> is able to fit that array in the same space, you'd see the above
> behavior.

So maintenance work mem is not a measure of the max that can be
allocated by a maintenance procedure, but rather an increment of memory
that is requested by a maintenance process (which currently would be
vacuum and index creation, no?), if my reading of the above is correct.

> BTW, do you have any evidence that it's actually useful to set
> maintenance work mem that high for VACUUM?  A quick and dirty solution
> would be to bound the dead-tuples array size at something more sane...

I was under the assumption that on systems with RAM to spare, it was
beneficial to set maintenance work mem high to make those processes more
efficient. Again, my thinking was that the value set for that variable
determined a *max* allocation by any given maintenance process, not a
memory-allocation request size. If, as my tests would indicate, the
process can request and receive more memory than specified in
maintenance work mem, then to play it safe I imagine I could drop that
value to 256MB or so.


Sven


Re: SCSI troubles

2005-07-06 Thread Sven Willenberger
On Wed, 2005-07-06 at 00:29 -0700, Ade Lovett wrote:
> Niki Denev wrote:
> > From what i understand this is not exactly a FreeBSD problem, but rather
> > a consequence of U320 being really hard on the hardware with pushing it
> > to the limits.
> 
> Incorrect.  The relevant parts of the output you pasted are:
> 
>   ahd
>   Seagate drives
> 
> Attaching more than one Seagate drive to a single Adaptec chain will
> result in various weird and wonderful behavior as you've described.
> 
> This is above and beyond well known (and documented) issues with data
> loss and corruption with certain firmware revisions on Seagate drives.
> 
> You have essentially two options:
> 
> (1) disable the (on-board) adaptec controller, and use something else
> (LSI cards work pretty good)
> 
> (2) chunk the Seagate drives, and replace them with some other vendor
> (Hitachi, for example, in our high-stress environments, show equivalent
> MTBFs)
> 
We went with option 2 about a year or so ago (Hitachi drives in our
case) as dealing with Seagate on this issue turned into an exercise in
frustration (they suggested things like turning off SMP or using a PCI
network card instead of the onboard (em) network). As is pointed out,
the issue really crops up with more than one Seagate drive on the
Adaptec (ahd) controller, even with the drives updated to their latest
firmware. Switching to a different hard drive manufacturer solved our
woes.

Sven



Re: ahd0: Invalid Sequencer interrupt occurred.

2005-11-15 Thread Sven Willenberger
On Fri, 2005-11-11 at 22:57 -0800, Ade Lovett wrote:
> On Nov 11, 2005, at 12:51 , Amit Rao wrote:
> > 0) Upgrade to Seagate 10K.7 drive firmware level 0008. That seems  
> > to help. One "ahd sequencer error" message still appears at boot,  
> > but after that it seems to work (with your fingers crossed).
> 
> Of course, you then spend far too much time ensuring that any  
> replacement drives are flashed appropriately (which, afaict,  
> *requires* Windows to do), and also running the gauntlet of further  
> problems down the road when you throw the drives into a new machine  
> with a subtly different HBA bios.
> 
> No thanks, I'll stick with option (2).  A few more months, and  
> Seagate drives will be a nice distant memory that I can look back on  
> in a few years, and laugh nervously about.
> 
> -aDe

There was a [hand-rolled?] flash utility that was able to run on
FreeBSD, and I did successfully flash some Seagate drives' firmware --
it didn't help any as far as the error [messages] went, so we dropped
Seagate drives altogether a little over a year ago. Since then we have
been using the IBM/Hitachi drives with no issues (much easier to change
drive manufacturers than to respec the servers we were using or do some
of the borderline-absurd workarounds that Seagate suggested).

Sven



Creating a system RAID-10 device

2006-01-17 Thread Sven Willenberger
I hope this is the appropriate mailing list for this question, if not
please redirect me as needed.

My goal is to create a filesystem consisting of a striped raid array
over 3 mirror raid arrays that would encompass the entire filesystem
including the /boot and root partitions.

Generally I would create a RAID10 from 6 disks as follows:

gmirror label -v -b round-robin md1 da0 da1
gmirror label -v -b round-robin md2 da2 da3
gmirror label -v -b round-robin md3 da4 da5

gstripe label -v -s 131072 md0 /dev/mirror/md1 /dev/mirror/md2 /dev/mirror/md3

newfs /dev/stripe/md0

Naturally, the problem here is that this cannot be done on a system that
booted from da0. I have seen the example of setting up a mirrored system
drive
(http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/geom-mirror.html),
which won't quite work for my case either. Using that method I could
probably get the one mirror (md1) to work, but I know of no way of then
adding the other 2 mirror sets and redoing the system to stripe across
all 3 mirrored sets.

The only thing I could think of was to boot from the livecd, create the
6-disk array, and then try to install FreeBSD onto this filesystem. In
order to do this the installer would have to recognize /dev/stripe/md0
as a valid "drive" -- is there any way to make this happen?
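
For what it's worth, the part I can see how to do by hand from the
livecd/fixit shell is extracting the base distribution onto the array,
roughly along these lines (a sketch only, assuming the dists live under
/dist as on the sysinstall media; whether the result would actually boot
off the stripe is exactly the open question):

newfs /dev/stripe/md0
mount /dev/stripe/md0 /mnt
cd /dist/base && DESTDIR=/mnt ./install.sh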

Sven



Re: serious vinum bug in 4-10 RELEASE?-solved?

2004-07-13 Thread Sven Willenberger

Steve Shorter wrote:
> On Mon, Jul 12, 2004 at 06:40:01AM +0930, Greg 'groggy' Lehey wrote:
> > I see no drives.
> >
> > Ideas?
>
> I have concluded that this is the result of some kind
> of vinum/hardware incompatibility. The problem in question
> occurred during the upgrade to faster disks, specifically,
> Seagate Cheetah ST318453LC on a DELL 2450. If I swap back
> the old Quantum Atlas 36G disk, the problem entirely
> disappears. The new disks function ok with UFS partitions
> but not vinum. It is 100% repeatable.
> Don't know why.
We have had issues with Cheetah U320 hard drives (at least the 10K
80-pin varieties on our Supermicro boxes), with a high percentage of
drive failures (> 10%) and communications errors across the SCSI bus.
These errors disappeared when we reverted back to IBM/Hitachi drives.
The Seagate issues occur with both the FreeBSD 4.x and 5.x series, so
there is something in the Seagate firmware (I believe) that is
interacting poorly with FreeBSD. We have experienced these issues with
vinum setups and other configurations where there are either multiple
Seagate drives or multiple drives where one of them is a Seagate.
Firmware updates did not help. I have not had this problem where there
is only one drive and it occupies da0.

Sven


Re: Why the mode of /dev/null is changed after system reboot?

2004-11-01 Thread Sven Willenberger
On Mon, 2004-11-01 at 09:30 -0800, Kris Kennaway wrote:
> On Mon, Nov 01, 2004 at 09:19:12AM +0800, Iva Hesy wrote:
> > I have a gateway running FreeBSD 4.10-p3. Normally, the mode of
> > /dev/null should be 666, but recently, I find that its mode is changed
> > to 600 automatically after reboot, I have checked all /etc/rc* and
> > /usr/local/etc/rc.d/*, but I can't get anything that led to it...:-(
> 
> Probably a local error.  Try changing scripts to #!/bin/sh -x and
> carefully watch (or log) the boot process.  Start with any local
> scripts you have since it's most likely to be a problem there.
> 
> Kris

Actually I have found this happening on my 4.10 boxen as well. I thought
it was some one-time glitch and just chmodded the thing back. Didn't
even think about it until I saw this post. I will try to see if I can
catch the circumstances surrounding it if it happens again.
