Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux
On 2005-03-09T18:36:37, Alex Aizman <[EMAIL PROTECTED]> wrote:

> Heartbeat is good for reliability, etc. WRT "getting paged-out" -
> non-deterministic (things depend on time), right?

Right, if we didn't get scheduled often enough for us to send our heartbeat messages to the other peers, they'll evict us from the cluster and fence us, causing a service disruption.

With all these protections in place though, we can run at roughly 50ms heartbeat intervals from user-space, reliably, which allows us a node dead timer of ~200ms. I think that's pretty damn good.

(Of course, realistically, even for subsecond fail-over, 200ms keep alives are sufficient, and 50ms would be quite extreme. But, it works.)

> >That works well in our current development series, and if you want to
> >share code, you can either rip it off (Open Source, we love ya ;) or we
> >can spin off these parts into a sub-package for you to depend on...
> If it's not a big deal :-) let's do the "sub-package" option.

I've brought this up on the linux-ha-dev list. When do you need this?

Sincerely,
    Lars Marowsky-Brée <[EMAIL PROTECTED]>

--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
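The timing in the message above (50ms heartbeats, a node dead timer of about 200ms) can be illustrated with a small C sketch. This is not Heartbeat's code: the helper names and the millisecond clock are assumptions made for the example, and only the two constants come from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the message: 50 ms heartbeats, ~200 ms dead timer. */
#define HEARTBEAT_INTERVAL_MS 50u
#define DEAD_TIMER_MS         200u

/* Hypothetical helper: how many whole heartbeat intervals fit in the
 * dead timer, i.e. how many beats a peer can miss before eviction. */
static inline uint32_t beats_per_dead_timer(uint32_t interval_ms,
                                            uint32_t dead_ms)
{
    return interval_ms ? dead_ms / interval_ms : 0;
}

/* Hypothetical helper: a peer counts as dead once it has been silent
 * for at least dead_ms. Timestamps are assumed to come from a
 * monotonic millisecond clock. */
static inline bool peer_is_dead(uint64_t last_seen_ms, uint64_t now_ms,
                                uint32_t dead_ms)
{
    return now_ms >= last_seen_ms && (now_ms - last_seen_ms) >= dead_ms;
}
```

With these numbers a node that is not scheduled for four consecutive intervals gets evicted, which is why being paged out or starved of CPU matters so much for a user-space heartbeat.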
aacraid died on kernel 2.4.27
I've been trying, without success, to get the aacraid driver to work reliably on a Dell 2650 Poweredge.

I'm using Debian so I tried kernel 2.6.8 first (Debian has it packaged nicely but from a SCSI driver point of view it's just a standard kernel.org source).

2.6.8 worked... but died as soon as we put the server under load. 2.6.11 also died under load.

Yesterday I put 2.4.27 on the box because I had understood that the aacraid is stable in that release of the kernel. But a few hours ago the box died in exactly the same way (it was under quite heavy load).

Unfortunately, I can't give you error messages because I don't have any log from the failures. It comes out on the console and I don't have it saved anywhere. But it definitely is the raid controller.

Maybe someone can answer the following for me:

- is the driver understood to be stable in 2.4.27?
- is there another driver I could try (would the pre-Cox one work?)
- is there anything I can do to alleviate the problem?

Nic Ferrier
http://www.tapsellferrier.co.uk
ADAPTEC Ultra320 hotplugging with 2.6.x
Hi all,

we have some problems replacing a SCSI disk at runtime. The problems started with kernel 2.6.x; with 2.4.x kernels we never saw any problems. We tried all kernels from 2.6.8 to 2.6.11-rc3-bk3-20050206171922-bigsmp, the last one we found for SuSE 9.2. All kernels showed this problem.

Our boxes have 2 controllers; here is the shortened info from boot.msg (for one controller only, the other is similar):

<6>scsi1: Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<4>
<4> aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz,512 SCBs
<4>
<4>(scsi1:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
<5> Vendor: MAXTOR Model: ATLAS10K5_147SCA Rev: JNZ3
<5> Type: Direct-Access ANSI SCSI revision: 03
<4>scsi1:A:0:0: Tagged Queuing enabled. Depth 32
<5>SCSI device sda: 287332384 512-byte hdwr sectors (147114 MB)
<5>SCSI device sda: drive cache: write back
<5>SCSI device sda: 287332384 512-byte hdwr sectors (147114 MB)
<5>SCSI device sda: drive cache: write back
<6> sda: sda1 sda2
<5>Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
<5> Vendor: ESG-SHV Model: SCA HSBP M15 Rev: 0.11
<5> Type: Processor ANSI SCSI revision: 02

Each controller is responsible for 5 SCA disks. The disks are mirrored in a software RAID1 (mdadm) from one controller to the other.

When a disk fails we have to hot replace it without downtime. So we pull it out, we do an "echo remove-single-scsi-disk ...", then we plug in the new disk and do an 'echo add-...'. The new disk spins up as expected, but after some time _all_ disks on that controller stop working (this results in all RAIDs going into degraded mode).

To simplify matters and reduce log output, I reproduced this behavior with two disks on either controller. I replaced ( /proc/scsi/scsi is given the following takes place.
The lines from 'Dump Card State Begins' to 'Ends' are repeated 4 times:

scsi1: ILLEGAL_PHASE 0x80
(scsi1:A:0:0): Abort Message Sent
scsi1:0:0:0: Attempting to abort cmd f6c07080: 0x12 0x0 0x0 0x0 0x24 0x0
scsi1: At time of recovery, card was not paused
>> Dump Card State Begins <
scsi1: Dumping Card State at program address 0x1ae Mode 0x11
Card was paused
SWTMINTMASK) SEQINTSTAT[0x0] SAVED_MODE[0x11]
DFFSTAT[0x11]:(CURRFIFO_1|FIFO0FREE) SCSISIGI[0x0]:(P_DATAOUT)
SCSIPHASE[0x0] SCSIBUS[0x0] LASTPHASE[0xa0]:(P_MESGOUT)
SCSISEQ0[0x0] SCSISEQ1[0x12]:(ENAUTOATNP|ENRSELI)
SEQCTL0[0x10]:(FASTMODE) SEQINTCTL[0x0] SEQ_FLAGS[0x0]
SEQ_FLAGS2[0x0] SSTAT0[0x0] SSTAT1[0x8]:(BUSFREE)
SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0]
SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO)
LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0]
LQOSTAT0[0x0] LQOSTAT1[0x0] LQOSTAT2[0x0]
SCB Count = 32 CMDS_PENDING = 2 LASTSCB 0x11 CURRSCB 0x11 NEXTSCB 0xff02
qinstart = 52611 qinfifonext = 52612
QINFIFO: 0x1b
WAITING_TID_QUEUES:
Pending list:
 27 FIFO_USE[0x0] SCB_CONTROL[0x68]:(STATUS_RCVD|TAG_ENB|DISCENB) SCB_SCSIID[0x47]
 17 FIFO_USE[0x0] SCB_CONTROL[0x40]:(DISCENB) SCB_SCSIID[0x7]
Total 2
Kernel Free SCB list: 10 11 6 25 31 18 13 28 22 20 4 8 21 2 26 30 12 23 14 9 24 3 16 5 0 1 7 15 29 19
Sequencer Complete DMA-inprog list:
Sequencer Complete list:
Sequencer DMA-Up and Complete list:
scsi1: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x11
SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT|ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS)
SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL)
SG_CACHE_SHADOW[0x2]:(LAST_SEG) SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x5]:(FIFOFREE|DLZERO)
SHADDR = 0x00, SHCNT = 0x0 HADDR = 0x00, HCNT = 0x0
CCSGCTL[0x0]
scsi1: FIFO1 Active, LONGJMP == 0x8278, SCB 0x11
SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT|ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS)
SEQINTSRC[0x0] DFCNTRL[0x4]:(DIRECTION) DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL)
SG_CACHE_SHADOW[0x3]:(LAST_SEG_DONE|LAST_SEG) SG_STATE[0x0] DFFSXFRCTL[0x0]
SOFFCNT[0x0] MDFFSTAT[0x14]:(DLZERO|LASTSDONE)
SHADDR = 0x06, SHCNT = 0x0 HADDR = 0x00, HCNT = 0x0
CCSGCTL[0x10]:(SG_CACHE_AVAIL)
LQIN: 0x55 0x3c 0x0 0x11 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
SIMODE0[0xc]:(ENOVERRUN|ENIOERR)
CCSCBCTL[0x4]:(CCSCBDIR)
scsi1: REG0 == 0x60, SINDEX = 0x1ff, DINDEX = 0x102
scsi1: SCBPTR == 0x11, SCB_NEXT == 0xff40, SCB_NEXT2 == 0xfff9
CDB 0 0 0 0 0 0
STACK: 0x125 0x125 0x125 0x125 0x0 0x25f 0x241 0xa7
< Dump Card State Ends >>
DevQ(0:0:0): 0 waiting
DevQ(0:4:0): 0 waiting
DevQ(0:6:0): 0 waiting
scsi1:0:4:0: Cmd aborted from QINFIFO
Recovery code sleeping
Recovery code awake
Timer Expired
scsi1: Device reset returning 0x2003
Recovery code sleeping
Recovery code awake
Timer Expired
scsi1: Device reset returning 0x2003
Recovery SCB completes
last messsage repeated 2 times
scsi: Device offlined - not ready a
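For readers unfamiliar with the remove/add step described in the message above, here is a minimal C sketch of it. The poster abbreviates the commands; the sketch assumes the "scsi remove-single-device" / "scsi add-single-device" strings that 2.4 and 2.6 kernels accept on /proc/scsi/scsi, and the helper names are hypothetical. The host/channel/id/lun values 1 0 0 0 in the usage note match the "scsi1, channel 0, id 0, lun 0" line in the boot log.

```c
#include <stdio.h>

/* procfs control file for the SCSI mid-layer on 2.4/2.6 kernels. */
#define PROC_SCSI_PATH "/proc/scsi/scsi"

/*
 * Hypothetical helper: format "scsi <op>-single-device H C I L\n" into
 * buf, where op is "add" or "remove". The trailing newline mimics what
 * echo would write. Returns 0 on success, -1 if buf is too small.
 */
static inline int format_scsi_cmd(char *buf, size_t len, const char *op,
                                  int host, int channel, int id, int lun)
{
    int n = snprintf(buf, len, "scsi %s-single-device %d %d %d %d\n",
                     op, host, channel, id, lun);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write one command to the control file at path.
 * Writing to the real PROC_SCSI_PATH requires root.
 */
static inline int send_scsi_cmd(const char *path, const char *cmd)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fputs(cmd, f) >= 0);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A hot swap of the first disk on scsi1 would then be format_scsi_cmd(buf, sizeof buf, "remove", 1, 0, 0, 0) followed by send_scsi_cmd(PROC_SCSI_PATH, buf), the physical swap, and the same pair of calls with "add". The sketch only issues the commands; it does nothing about the controller hang reported here.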
[PATCH] 2.6 aacraid: adapter naming fix
From Mark Salyzyn at Adaptec.

This fixes the way the aac device's id is calculated.

Applies to scsi-misc-2.6 tree.

Signed-off-by: Mark Haverkamp <[EMAIL PROTECTED]>

= drivers/scsi/aacraid/linit.c 1.46 vs edited =
--- 1.46/drivers/scsi/aacraid/linit.c	2005-02-03 18:48:49 -08:00
+++ edited/drivers/scsi/aacraid/linit.c	2005-03-10 10:36:35 -08:00
@@ -573,10 +573,9 @@
 	int unique_id = 0;

 	list_for_each_entry(aac, &aac_devices, entry) {
-		if (aac->id > unique_id) {
-			insert = &aac->entry;
+		if (aac->id > unique_id)
 			break;
-		}
+		insert = &aac->entry;
 		unique_id++;
 	}

--
Mark Haverkamp <[EMAIL PROTECTED]>
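The effect of the patched loop can be sketched outside the kernel with a plain sorted array standing in for the adapter list. This is an illustration only: the function name and the array representation are hypothetical, and it assumes, as the patch seems to, that existing adapters are kept in ascending id order.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the patched list walk. ids[] holds the ids
 * of existing adapters in ascending order. Returns the lowest unused id
 * and stores in *insert_after the index of the entry the new adapter
 * should follow (-1 means "insert at the head of the list").
 */
static inline int lowest_free_id(const int *ids, size_t n,
                                 long *insert_after)
{
    int unique_id = 0;

    *insert_after = -1;
    for (size_t i = 0; i < n; i++) {
        if (ids[i] > unique_id)
            break;                 /* gap found in front of entry i */
        *insert_after = (long)i;   /* entry i stays ahead of the new one */
        unique_id++;
    }
    return unique_id;
}
```

As I read the diff, the old code only updated the insertion point at the moment it found a gap, and pointed it at the entry with the larger id, so a new adapter could be queued behind that entry or at the head of the list. The patched loop advances the insertion point on every entry it passes, which keeps the list sorted and the next free id correct.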
Re: aacraid died on kernel 2.4.27
On Thu, Mar 10, 2005 at 09:01:50PM +, Nic Ferrier wrote:
> I've been trying, without success, to get the aacraid driver to work
> reliably on a Dell 2650 Poweredge.

I've had problems there, too, with kernel 2.6.8. I just had another failure, and I got some better logs.

Can you install the "afacli" utility (Dell-branded version of aacli), and run

    diag show history /old

from afacli?

If you can't seem to get the OLD history to show (compare

    diag show history /current

with

    diag show history /old

), try

    diag dump text

- you may find some useful messages there.

The other, standard, suggestion is to go through and open a ticket with Dell, and do all the hardware diagnostics to try and see if the error can be traced to hardware problems or not.

Good luck, I hope you find a fix faster than I have (still trying, unfortunately)

--
Ryan Anderson
AutoWeb Communications, Inc.
email: [EMAIL PROTECTED]
Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux
On Thu, 2005-03-10 at 11:27 +0100, Lars Marowsky-Bree wrote:
> On 2005-03-09T18:36:37, Alex Aizman <[EMAIL PROTECTED]> wrote:
> > >That works well in our current development series, and if you want to
> > >share code, you can either rip it off (Open Source, we love ya ;) or we
> > >can spin off these parts into a sub-package for you to depend on...
> > If it's not a big deal :-) let's do the "sub-package" option.
>
> I've brought this up on the linux-ha-dev list. When do you need this?

For open-iscsi, I think it would make sense to link the open-iscsi daemon code against klibc, the same way dm-multipath does. This will allow us to build iSCSI remote boot using early user-space.

I'm not sure it will be possible to use your package without modifications. Let me know.

Dmitry