Re: IDE DMA Timeouts

2006-02-20 Thread hank
I'm experiencing the same issue with a brand new install, though
different hardware ... I've an Adaptec 1205 SATA controller and a pair
of Samsung 160 gig drives.

The error only occurs under load. The first time it happened, the
machine locked up, spontaneously rebooted, and required a manual fsck
of the mirror, which completed successfully; the mirror then rebuilt
very slowly ...

The second time, under similar stress (copying a 5 gig drive image to
it), the machine held up but got many DMA WRITE errors on ad6 ... below
is the smartctl output for that drive. I don't see anything other than
the DMA WRITE errors, though I confess to not being really up on what
smartctl outputs ...

lighty# ./smartctl -a /dev/ad6
smartctl version 5.33 [i386-portbld-freebsd6.0] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG HD160JJ
Serial Number:    S08HJ1MYC31118
Firmware Version: ZM100-33
User Capacity:    160,041,885,696 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 4a
Local Time is:    Mon Feb 20 08:36:15 2006 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

The errors don't cut and paste nicely ... but there are 35 instances of
a failed DMA WRITE, followed by 4 failed DMA READs ...

Thoughts, comments and suggestions welcome.

Hank
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Which PCI SATA controller?

2006-02-20 Thread hank
Sorry ... just to clarify, this would be running 6.0 and it needs to be
a 32-bit card ... so as awesome as 3ware cards might be, they do me
little good ...

On Mon, Feb 20, 2006 at 12:05:12PM -0600, Hank Marquardt wrote:
> So I'm having similar issues to others with gmirror and SATA throwing
> DMA WRITE failures under load ... In my searching I've found a few
> references to the Sil3112 chipset being an issue and it's what's in my
> cheapy Adaptec card ...
> 
> So if I want to proceed, what is a good PCI SATA controller?

-- 
Hank Marquardt <[EMAIL PROTECTED]>
http://web.yerpso.net
GPG Id: 2BB5E60C
Fingerprint: D807 61BC FD18 370A AC1D  3EDF 2BF9 8A2D 2BB5 E60C


5.5-stable network interface rl0 stops working

2006-07-05 Thread Hank Hampel
Hello everybody,

I have a very disturbing problem with one of our FreeBSD 5.5-stable
machines. It is a box on which ~10 jail systems run, each with
small to moderate network traffic.

Now from time to time - sometimes after a few days, sometimes after a
couple of weeks - the network interface rl0 (which is the main
interface on the machine; rl1 is for backups/internal use only) stops
working.

Each jailed system has its own firewall ruleset, permitting only
traffic for the services in that specific jail. The packet filter used
is ipfw. Some of the rules are stateful (keep-state).

When rl0 stops working, ipfw logs lots of denied packets, so it seems
that the dynamic (keep-state) rules no longer work. We checked and
increased the buffers for the dynamic rules, to no avail - I doubt they
are part of the problem. I'm not even sure ipfw is part of the problem.

Once the interface has stopped, the only way to get it up and running
again is to reboot the whole machine. Restarting /etc/rc.d/netif, the
jails or ipfw doesn't help.

The bad thing is that I haven't found any way to trigger this problem,
so I can only check and change things and then wait to see whether the
situation improves. For example, I've already set debug.mpsafenet="0",
but this doesn't help; on the contrary, it seems to make the problem a
little worse.

Find attached the dmesg output of the machine. If any other
information is needed to hunt down the cause of this problem please
let me know. I checked various list archives but haven't found a clue
yet.

-[ dmesg ]-
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.5-STABLE #5: Tue May 30 13:51:55 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/SHAWSHANK
WARNING: MPSAFE network stack disabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2411.60-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf34  Stepping = 4
  Features=0xbfebfbff
real memory  = 2147418112 (2047 MB)
avail memory = 2096037888 (1998 MB)
ACPI APIC Table: 
ioapic0  irqs 0-23 on motherboard
npx0:  on motherboard
npx0: INT 16 interface
acpi0:  on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0:  on acpi0
acpi_button0:  on acpi0
pcib0:  port 0x1000-0x10bf,0xcf8-0xcff on acpi0
pci0:  on pcib0
agp0:  mem 0xe800-0xefff at device 0.0 on pci0
pcib1:  at device 1.0 on pci0
pci1:  on pcib1
pcib2:  at device 30.0 on pci0
pci2:  on pcib2
pci2:  at device 0.0 (no driver attached)
rl0:  port 0x9000-0x90ff mem 0xf500-0xf5ff irq 21 at device 1.0 on pci2
miibus0:  on rl0
rlphy0:  on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:02:2a:d5:39:74
rl1:  port 0x9400-0x94ff mem 0xf5001000-0xf50010ff irq 22 at device 2.0 on pci2
miibus1:  on rl1
rlphy1:  on miibus1
rlphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl1: Ethernet address: 00:02:2a:d5:39:53
isab0:  at device 31.0 on pci0
isa0:  on isab0
atapci0:  port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0:  at device 31.3 (no driver attached)
acpi_tz0:  on acpi0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
pmtimer0 on isa0
orm0:  at iomem 0xc-0xc7fff on isa0
sc0:  at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
vga0:  at port 0x3c0-0x3df iomem 0xa-0xb on isa0
atkbdc0:  at port 0x64,0x60 on isa0
atkbd0:  irq 1 on atkbdc0
kbd0 at atkbd0
ppc0: parallel port not found.
Timecounter "TSC" frequency 2411601876 Hz quality 800
Timecounters tick every 10.000 msec
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled
ad0: 114497MB  [232629/16/63] at ata0-master UDMA100
acd0: DVDROM  at ata1-master PIO4
Mounting root from ufs:/dev/ad0s1a
-[ dmesg ]-


Best regards, Hank




Re: 5.5-stable network interface rl0 stops working

2006-07-06 Thread Hank Hampel
Hi Roland,

On (060705), Roland Smith wrote:
> > couple of weeks - the network interface rl0 (which is the main
> > interface on the machine, rl1 is for backups/internal use only) stops
> Are they physically on the motherboard? Or on PCI cards? In the latter
> case try reseating the card in the slot.

Fortunately they are PCI cards, so I'll check the seating.

> Try switching rl0 and rl1, and see if the problem persists. Also,
> swapping out the ethernet cable is worth trying.

Switching the cards is an option we haven't tried yet, although it had
occurred to me earlier. The strangest problems often turn out to be
hardware related, so I'll give this a try and report back.

Swapping out the ethernet cable was one of the first things I tried,
but to no avail. I'm not really sure that the switch isn't part of the
problem (although all other ports function correctly), so I'll change
the switch port too.

> Another thing to check is if rl0 is sharing an interrupt with another
> device. That can cause problems.

No, there is no interrupt sharing for this device, but thanks for the
hint; I hadn't checked it before.
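
A minimal sketch of one way to run that kind of check against boot
messages, not necessarily how it was done here: the Python below groups
devices by the IRQ they report in dmesg-style text and lists any IRQ
claimed by more than one device. The function names are illustrative
assumptions. The sample lines are taken from the dmesg output earlier in
this thread, with one entry wrapped onto a continuation line to show
that wrapped lines are tolerated. Boot messages only reflect the
assignment at attach time, so treat this as a rough first look; on a
live FreeBSD system, vmstat -i shows the current per-device interrupt
counters.

```python
import re
from collections import defaultdict

# A device name at the start of a line, e.g. "rl0:" or "acpi_timer0:".
DEVICE = re.compile(r"^([a-z_]+\d+):")
# An "irq N" clause; "irqs 0-23" (ioapic range) is deliberately not matched.
IRQ = re.compile(r"\birq (\d+)\b")


def devices_by_irq(dmesg_text):
    """Map each IRQ number to the set of devices that report it."""
    by_irq = defaultdict(set)
    current = None  # device named at the start of the most recent line
    for line in dmesg_text.splitlines():
        match = DEVICE.match(line)
        if match:
            current = match.group(1)
        # A line without a device prefix is treated as a continuation
        # of the previous device's line.
        for irq in IRQ.findall(line):
            if current is not None:
                by_irq[int(irq)].add(current)
    return dict(by_irq)


def shared_irqs(dmesg_text):
    """Return only the IRQs reported by more than one device."""
    return {irq: devs
            for irq, devs in devices_by_irq(dmesg_text).items()
            if len(devs) > 1}


# Excerpt from the dmesg output posted earlier in the thread.
SAMPLE = """\
rl0:  port 0x9000-0x90ff mem 0xf500-0xf5ff
irq 21 at device 1.0 on pci2
rl0: Ethernet address: 00:02:2a:d5:39:74
rl1:  port 0x9400-0x94ff mem 0xf5001000-0xf50010ff irq 22 at device 2.0 on pci2
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
atkbd0:  irq 1 on atkbdc0
"""

if __name__ == "__main__":
    print(shared_irqs(SAMPLE))  # prints {} (no IRQ is shared in this excerpt)
```

An empty result for this excerpt agrees with the statement above that
rl0 (irq 21) does not share its interrupt with another device.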

> > When rl0 stops working ipfw logs lots of denied packets so that it
> > seems that the dynamic (keep-state) rules don't work any longer. We
> Does the problem persist without ipfw? I've got an rl0 card on my
> workstation (6.1-STABLE, amd64, using PF without problems)

Unfortunately I can't check this because we use ipfw to generate
traffic statistics for the jails. But once the interface has stopped
working, disabling the firewall has no effect, except that no log
messages are generated any longer.

> > After the stop on the interface occurs there is no other way to get
> > the interface up and running again than rebooting the whole machine.
> > Restarting /etc/rc.d/netif, the jails or ipfw doesn't help anything.
> What does ifconfig say after the interface stops working?

When the interface stops working, ifconfig seems "to think" everything
is still ok. There is no hint in the output of ifconfig that the
interface is not working, and ifconfig down/up doesn't help either.

> Anything in the logs, except the denied packets?

No, strangely enough there is no other hint in the logs that the system
is not working. At first I thought it was some kind of ipfw problem,
because packets seem to arrive on the host but the responses get
blocked by ipfw. The next time it happens, I'll check with tcpdump
whether packets really do still arrive on the system.

On the other hand, if ipfw is part of the problem (especially the
dynamic rules), then flushing ipfw should help, I think - but it
doesn't. So maybe it's a hardware issue; I'll definitely check this
and report back. Thanks for the hints and tips!


Best regards, Hank




Which PCI SATA controller?

2006-02-20 Thread Hank Marquardt
So I'm having similar issues to others with gmirror and SATA throwing
DMA WRITE failures under load ... In my searching I've found a few
references to the Sil3112 chipset being an issue and it's what's in my
cheapy Adaptec card ...

So if I want to proceed, what is a good PCI SATA controller?

-- 
Hank Marquardt <[EMAIL PROTECTED]>
GPG Id: 2BB5E60C
Fingerprint: D807 61BC FD18 370A AC1D  3EDF 2BF9 8A2D 2BB5 E60C


installworld fails gnu/usr.bin/groff/font

2001-09-25 Thread Hank Marquardt

Installworld is failing, not finding the 'R' tag in
/gnu/usr.bin/groff/font ... as I had nothing in those directories
anyway, I just removed the font directory from the makefile in
.groff and the install went fine from there.  Originally
this happened with a cvsup from 9/22 -- I figured maybe there
was an update, so I did it again on 9/24 and got the same error --
I use cvsup11 as my server.

I've never needed to do minor surgery on makefiles to get
the install to work so I figured I'd report it.

Hank

To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message