On Feb 10, 2006, at 1:28 PM, Michael Reifenberger wrote:

On Fri, 10 Feb 2006, Markus Trippelsdorf wrote:
...
...
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=58914495
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=123039679
ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=54591167

It looks like bad cabling to me. Try new cables and also run
smartctl -a /dev/ad0 (and ad1) to check if the hardware is OK.

smartctl doesn't report any errors, and accessing only one disk at a time
doesn't give errors either. So probably cabling isn't the issue here.
More likely a timing/locking interaction between gmirror/ata...

Bye/2
---
Michael Reifenberger, Business Development Manager SAP-Basis, Plaut Consulting
Comp: [EMAIL PROTECTED] | Priv: [EMAIL PROTECTED]
http://www.plaut.de | http://www.Reifenberger.com

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable- [EMAIL PROTECTED]"



I have the same problem with 6.0 and gmirror. It's not a cabling or hardware problem, as the same machine works fine with FreeBSD 5.4 or earlier, OpenBSD 3.8 and Linux 2.6.x, with and without mirroring.

I have a 2xPIII 700 MHz SMP system with a Maxtor PCI SATA controller and two 250 GB Maxtor disks, plus SCSI disks for the OS.

atapci0: <Promise PDC20375 SATA150 controller> port 0x3080-0x30bf, 0x30c0-0x30cf,0x3000-0x307f mem 0xf4220000-0xf4220fff, 0xf4200000-0xf421ffff irq 20 at device 4.0 on pci2
ad4: 239372MB <Maxtor 7L250S0 BANC1E00> at ata2-master SATA150
ad6: 239372MB <Maxtor 7L250S0 BANC1E00> at ata2-master SATA150

The disks are gmirror'ed and never complete a synchronization. At around 40%-45% of the gmirror resync, the system crashes. No log is written and the screen shows some garbage. The system also crashes occasionally under heavy load, after logging some TIMEOUT - READ_DMA errors.
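To check whether such errors cluster on one disk or one command, the console messages can be tallied per device. The following is only a rough sketch: the regular expression is inferred from the three lines quoted earlier in this thread, the function name `tally` is made up for illustration, and other ata messages may use a layout that this pattern does not match.

```python
import re
from collections import Counter

# Pattern inferred from the three sample lines quoted in this thread
# (an assumption, not an official description of the ata message format).
LINE_RE = re.compile(
    r"^(?P<dev>ad\d+): (?P<level>TIMEOUT|WARNING) - (?P<cmd>\w+) .*LBA=(?P<lba>\d+)"
)


def tally(lines):
    """Count matching messages per (device, level, command)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("dev"), m.group("level"), m.group("cmd"))] += 1
    return counts


# The lines quoted at the top of the thread.
SAMPLE = [
    "ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=58914495",
    "ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=123039679",
    "ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=54591167",
]

if __name__ == "__main__":
    for (dev, level, cmd), n in sorted(tally(SAMPLE).items()):
        print(dev, level, cmd, n)
```

Feeding it the saved console log instead of `SAMPLE` would show at a glance whether both mirror members report errors or only one of them.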

The system is then unable to boot again. It crashes during the boot sequence, when the mirror is reestablished and the resync starts again. It also crashes if the resync is prevented (through NOAUTOSYNC). I need to boot the installation CD and clear the gmirror metadata on one disk to be able to boot.

I am running 6.0-RELEASE-p4 (but it happens on any 6.0), and this is my kernel configuration:

# include the standard distribution's SMP kernel build file, which in turn includes the generic kernel build file (named GENERIC)
include         SMP

# set custom kernel ident name
ident           ZOE_020

# additional/overridden settings start here
nooptions PREEMPTION # Disable kernel thread preemption

# standard system settings
maxusers        64                      # 64 users is a lot, but we should have plenty of memory!
options         INCLUDE_CONFIG_FILE     # Include this file in kernel for reference

# memory settings - 2 GBytes for data or stack maximum size, 1 GByte as default initial size
options         MAXDSIZ=(2048UL*1024*1024)
options         MAXSSIZ=(2048UL*1024*1024)
options         DFLDSIZ=(1024UL*1024*1024)

# SYSV options (shared memory, semaphores, message queues)
options         SEMMAP=63               # Maximum number of entries in a semaphore map.
options         SEMMNI=512              # Maximum number of System V semaphores that can be used on the system at one time.
options         SEMMNS=512              # Total number of semaphores system wide
options         SEMMNU=512              # Total number of undo structures in system
options         SEMMSL=64               # Maximum number of System V semaphores that can be used by a single process at one time.
options         SEMOPM=128              # Maximum number of operations that can be outstanding on a single System V semaphore at one time.
options         SEMUME=48               # Maximum number of undo operations that can be outstanding on a single System V semaphore at one time.
options         SHMALL=262144           # Maximum number of shared memory pages system wide.
options         SHMMAX=(SHMMAXPGS*PAGE_SIZE+1) # Maximum size, in bytes, of a single System V shared memory region.
options         SHMMAXPGS=262144        # Maximum size, in pages, of a single System V shared memory region.
options         SHMMIN=2                # Minimum size, in bytes, of a single System V shared memory region.
options         SHMMNI=128              # Maximum number of shared memory regions that can be used on the system at one time.
options         SHMSEG=32               # Maximum number of System V shared memory regions that can be attached to a single process at one time.
options         MSGMNB=2049             # Max number of chars in queue
options         MSGMNI=41               # Max number of message queue identifiers
options         MSGSEG=2049             # Max number of message segments
options         MSGSSZ=16               # Size of a message segment (must be a power of 2 between 8 and 1024)
options         MSGTQL=41               # Max number of messages in system
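As a quick sanity check of the size expressions in the configuration above, the arithmetic can be worked through as follows. This is only a sketch: it assumes a 4096-byte PAGE_SIZE (the usual value on i386, not stated in this message), and the helper names are made up for illustration.

```python
# Assumption: 4096-byte pages, as on i386; not stated in the original message.
PAGE_SIZE = 4096

INT32_MAX = 2**31 - 1   # largest value of a signed 32-bit int
UINT32_MAX = 2**32 - 1  # largest value of an unsigned 32-bit long


def fits_int32(value):
    """True if value is representable as a signed 32-bit integer."""
    return 0 <= value <= INT32_MAX


def fits_uint32(value):
    """True if value is representable as an unsigned 32-bit integer."""
    return 0 <= value <= UINT32_MAX


# Values taken from the options lines above.
MAXDSIZ = 2048 * 1024 * 1024        # 2 GBytes
MAXSSIZ = 2048 * 1024 * 1024        # 2 GBytes
DFLDSIZ = 1024 * 1024 * 1024        # 1 GByte
SHMMAXPGS = 262144
SHMMAX = SHMMAXPGS * PAGE_SIZE + 1  # 1 GByte plus one byte

if __name__ == "__main__":
    for name, value in [("MAXDSIZ", MAXDSIZ), ("MAXSSIZ", MAXSSIZ),
                        ("DFLDSIZ", DFLDSIZ), ("SHMMAX", SHMMAX)]:
        print(name, value, "int32:", fits_int32(value),
              "uint32:", fits_uint32(value))
```

The 2 GByte values are one past the signed 32-bit maximum, which is why the expressions carry the UL suffix: with plain int operands the multiplication would overflow on a 32-bit platform.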

I need to go to production soon and I want FreeBSD 6.0, as Linux and the other BSDs don't fit my requirements (e.g. jail, GEOM...), so I changed the controller to a Promise TX2300. It still has problems with 6.0: again, at 40%-45% of the mirror rebuild, I get this error:

ad4: req=0xc1c43d48 SETFEATURES SET TRANSFER MODE semaphore timeout !! DANGER Will Robinson !!

... I am quite worried, as I need to trust storage! Any idea what the problem is?

If needed, I can provide more data or run tests. I have a photo of the screen showing the panic during the boot sequence (1 MB).

Thanks, and excuse me for the bad English!

Paolo
