Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

2005-03-10 Thread Lars Marowsky-Bree
On 2005-03-09T18:36:37, Alex Aizman <[EMAIL PROTECTED]> wrote:

> Heartbeat is good for reliability, etc. WRT "getting paged-out" - 
> non-deterministic (things depend on time), right?

Right, if we didn't get scheduled often enough for us to send our
heartbeat messages to the other peers, they'll evict us from the cluster
and fence us, causing a service disruption.

With all these protections in place though, we can run at roughly 50ms
heartbeat intervals from user-space, reliably, which allows us a node
dead timer of ~200ms. I think that's pretty damn good.

(Of course, realistically, even for subsecond fail-over, 200ms
keepalives are sufficient, and 50ms would be quite extreme. But it works.)
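
To make the timing concrete, here is a minimal sketch of how a 50ms heartbeat interval relates to a ~200ms node dead timer. It is not the Linux-HA implementation; the class and constant names are hypothetical and chosen only for illustration, and time is passed in explicitly so the behaviour is deterministic.

```python
class PeerMonitor:
    """Tracks the last heartbeat seen from one peer (hypothetical helper)."""

    def __init__(self, dead_timer_ms: int, now_ms: int = 0) -> None:
        if dead_timer_ms <= 0:
            raise ValueError("dead_timer_ms must be positive")
        self.dead_timer_ms = dead_timer_ms
        self.last_seen_ms = now_ms

    def heartbeat(self, now_ms: int) -> None:
        """Record that a heartbeat from the peer arrived at now_ms."""
        self.last_seen_ms = now_ms

    def is_dead(self, now_ms: int) -> bool:
        """The peer is declared dead once the dead timer has fully elapsed."""
        return now_ms - self.last_seen_ms >= self.dead_timer_ms


# Figures taken from the mail: 50ms interval, ~200ms dead timer.
HEARTBEAT_INTERVAL_MS = 50
DEAD_TIMER_MS = 200

monitor = PeerMonitor(DEAD_TIMER_MS)
for t in range(HEARTBEAT_INTERVAL_MS, 501, HEARTBEAT_INTERVAL_MS):
    monitor.heartbeat(t)  # regular heartbeats, the last one at t = 500ms

# A sender that is not scheduled for 150ms is still considered alive ...
stalled_150 = monitor.is_dead(500 + 150)
# ... but one that stalls for the full 200ms is evicted (and fenced).
stalled_200 = monitor.is_dead(500 + 200)
```

With these numbers, four heartbeat intervals fit into one dead timer, so a user-space sender that is paged out or not scheduled for less than 200ms survives, while a longer stall leads to eviction.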

> >That works well in our current development series, and if you want to
> >share code, you can either rip it off (Open Source, we love ya ;) or we
> >can spin off these parts into a sub-package for you to depend on...
> If it's not a big deal :-) let's do the "sub-package" option.

I've brought this up on the linux-ha-dev list. When do you need this?


Sincerely,
Lars Marowsky-Brée <[EMAIL PROTECTED]>

-- 
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


aacraid died on kernel 2.4.27

2005-03-10 Thread Nic Ferrier
I've been trying, without success, to get the aacraid driver to work
reliably on a Dell 2650 Poweredge.

I'm using Debian so I tried kernel 2.6.8 first (Debian has it packaged
nicely but from a SCSI driver point of view it's just a standard
kernel.org source).

2.6.8 worked... but died as soon as we put the server under load.

2.6.11 also died under load.

Yesterday I put 2.4.27 on the box because I had understood that the
aacraid is stable in that release of the kernel. But a few hours ago
the box died in exactly the same way (it was under quite heavy load).

Unfortunately, I can't give you error messages because I don't have
any log from the failures. It comes out on the console and I don't
have it saved anywhere. But it definitely is the raid controller.


Maybe someone can answer the following for me:

- is the driver understood to be stable in 2.4.27?

- is there another driver I could try (would the pre-Cox one work?)

- is there anything I can do to alleviate the problem?




Nic Ferrier
http://www.tapsellferrier.co.uk


ADAPTEC Ultra320 hotplugging with 2.6.x

2005-03-10 Thread bernd
Hi all,

we have some problems replacing a SCSI disk at runtime. The problems
started with kernel 2.6.x; with 2.4.x kernels we never saw any problems.
We tried all kernels from 2.6.8 to 2.6.11-rc3-bk3-20050206171922-bigsmp,
the last one we found for SuSE 9.2. All kernels showed this problem.

Our boxes have 2 controllers; here is the shortened info from boot.msg
(for one controller only, the other is similar):

<6>scsi1: Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 1.3.11
<4>   
<4>   aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz,512 SCBs
<4>
<4>(scsi1:A:0): 320.000MB/s transfers (160.000MHz DT|IU|QAS, 16bit)
<5>  Vendor: MAXTOR    Model: ATLAS10K5_147SCA  Rev: JNZ3
<5>  Type:   Direct-Access  ANSI SCSI revision: 03
<4>scsi1:A:0:0: Tagged Queuing enabled.  Depth 32
<5>SCSI device sda: 287332384 512-byte hdwr sectors (147114 MB)
<5>SCSI device sda: drive cache: write back
<5>SCSI device sda: 287332384 512-byte hdwr sectors (147114 MB)
<5>SCSI device sda: drive cache: write back
<6> sda: sda1 sda2
<5>Attached scsi disk sda at scsi1, channel 0, id 0, lun 0
<5>  Vendor: ESG-SHV   Model: SCA HSBP M15  Rev: 0.11
<5>  Type:   Processor  ANSI SCSI revision: 02

Each controller is responsible for 5 SCA disks. The disks are mirrored in
a software RAID1 (mdadm) from one controller to the other. When a disk 
fails we have to hot replace it without downtime. So we pull it out, we
do an "echo remove-single-scsi-disk ...", then we plug in the new disk and
do an 'echo add-...'. The new disk spins up as expected but after some
time _all_ disks on that controller stop working (this results
in all RAIDs going into degraded mode).
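
For readers following along, the remove/add sequence can be sketched as below. This assumes the usual 2.4/2.6 `/proc/scsi/scsi` command strings, `scsi remove-single-device H C I L` and `scsi add-single-device H C I L`; the mail abbreviates the exact commands, so the strings here may differ from what was actually typed. The helper names are hypothetical, and the address (host 1, channel 0, id 0, lun 0) is taken from the "Attached scsi disk sda at scsi1, channel 0, id 0, lun 0" line in the boot log. The sketch only builds the shell lines; it does not write to `/proc`.

```python
PROC_SCSI = "/proc/scsi/scsi"


def scsi_proc_command(action: str, host: int, channel: int,
                      target: int, lun: int) -> str:
    """Build the command string for /proc/scsi/scsi (assumed 2.4/2.6 syntax)."""
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    return f"scsi {action}-single-device {host} {channel} {target} {lun}"


def hot_replace_steps(host: int, channel: int, target: int, lun: int) -> list:
    """Return the shell lines for a hot replace, in the order used in the mail."""
    remove = scsi_proc_command("remove", host, channel, target, lun)
    add = scsi_proc_command("add", host, channel, target, lun)
    return [
        f'echo "{remove}" > {PROC_SCSI}',  # after pulling the failed disk
        f'echo "{add}" > {PROC_SCSI}',     # after plugging in the new disk
    ]


# Dry run for sda on scsi1, channel 0, id 0, lun 0.
for line in hot_replace_steps(1, 0, 0, 0):
    print(line)
```

Printing the lines rather than executing them keeps the sketch safe to run on any machine; on a real system they would be run as root, with the md array handled separately via mdadm.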

To simplify matters and reduce log output, I reproduced this behavior
with two disks on either controller. I replaced ( /proc/scsi/scsi 
is given the following takes place. The lines from 'Dump Card State Begins'
to 'Ends' are repeated 4 times:

scsi1: ILLEGAL_PHASE 0x80
(scsi1:A:0:0): Abort Message Sent
scsi1:0:0:0: Attempting to abort cmd f6c07080: 0x12 0x0 0x0 0x0 0x24 0x0
scsi1: At time of recovery, card was not paused
>> Dump Card State Begins <
scsi1: Dumping Card State at program address 0x1ae Mode 0x11
Card was paused
SWTMINTMASK) SEQINTSTAT[0x0] 
SAVED_MODE[0x11] DFFSTAT[0x11]:(CURRFIFO_1|FIFO0FREE) 
SCSISIGI[0x0]:(P_DATAOUT) SCSIPHASE[0x0] SCSIBUS[0x0] 
LASTPHASE[0xa0]:(P_MESGOUT) SCSISEQ0[0x0] SCSISEQ1[0x12]:(ENAUTOATNP|ENRSELI) 
SEQCTL0[0x10]:(FASTMODE) SEQINTCTL[0x0] SEQ_FLAGS[0x0] 
SEQ_FLAGS2[0x0] SSTAT0[0x0] SSTAT1[0x8]:(BUSFREE) 
SSTAT2[0x0] SSTAT3[0x0] PERRDIAG[0x0] 
SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO) 
LQISTAT0[0x0] LQISTAT1[0x0] LQISTAT2[0x0] LQOSTAT0[0x0] 
LQOSTAT1[0x0] LQOSTAT2[0x0] 

SCB Count = 32 CMDS_PENDING = 2 LASTSCB 0x11 CURRSCB 0x11 NEXTSCB 0xff02
qinstart = 52611 qinfifonext = 52612
QINFIFO: 0x1b
WAITING_TID_QUEUES:
Pending list:
 27 FIFO_USE[0x0] SCB_CONTROL[0x68]:(STATUS_RCVD|TAG_ENB|DISCENB) 
SCB_SCSIID[0x47] 
 17 FIFO_USE[0x0] SCB_CONTROL[0x40]:(DISCENB) SCB_SCSIID[0x7] 
Total 2
Kernel Free SCB list: 10 11 6 25 31 18 13 28 22 20 4 8 21 2 26 30 12 23 14 9 24 
3 16 5 0 1 7 15 29 19 
Sequencer Complete DMA-inprog list: 
Sequencer Complete list: 
Sequencer DMA-Up and Complete list: 

scsi1: FIFO0 Free, LONGJMP == 0x80ff, SCB 0x11
SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT|ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS)
 
SEQINTSRC[0x0] DFCNTRL[0x0] DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL) 
SG_CACHE_SHADOW[0x2]:(LAST_SEG) SG_STATE[0x0] DFFSXFRCTL[0x0] 
SOFFCNT[0x0] MDFFSTAT[0x5]:(FIFOFREE|DLZERO) SHADDR = 0x00, SHCNT = 0x0 
HADDR = 0x00, HCNT = 0x0 CCSGCTL[0x0] 
scsi1: FIFO1 Active, LONGJMP == 0x8278, SCB 0x11
SEQIMODE[0x3f]:(ENCFG4TCMD|ENCFG4ICMD|ENCFG4TSTAT|ENCFG4ISTAT|ENCFG4DATA|ENSAVEPTRS)
 
SEQINTSRC[0x0] DFCNTRL[0x4]:(DIRECTION) 
DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL) 
SG_CACHE_SHADOW[0x3]:(LAST_SEG_DONE|LAST_SEG) SG_STATE[0x0] 
DFFSXFRCTL[0x0] SOFFCNT[0x0] MDFFSTAT[0x14]:(DLZERO|LASTSDONE) 
SHADDR = 0x06, SHCNT = 0x0 HADDR = 0x00, HCNT = 0x0 
CCSGCTL[0x10]:(SG_CACHE_AVAIL) 
LQIN: 0x55 0x3c 0x0 0x11 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 
0x0 0x0 0x0 
scsi1: LQISTATE = 0x0, LQOSTATE = 0x0, OPTIONMODE = 0x42
scsi1: OS_SPACE_CNT = 0x20 MAXCMDCNT = 0x0
SIMODE0[0xc]:(ENOVERRUN|ENIOERR) 
CCSCBCTL[0x4]:(CCSCBDIR) 
scsi1: REG0 == 0x60, SINDEX = 0x1ff, DINDEX = 0x102
scsi1: SCBPTR == 0x11, SCB_NEXT == 0xff40, SCB_NEXT2 == 0xfff9
CDB 0 0 0 0 0 0
STACK: 0x125 0x125 0x125 0x125 0x0 0x25f 0x241 0xa7
< Dump Card State Ends >>
DevQ(0:0:0): 0 waiting
DevQ(0:4:0): 0 waiting
DevQ(0:6:0): 0 waiting
scsi1:0:4:0: Cmd aborted from QINFIFO
Recovery code sleeping
Recovery code awake
Timer Expired
scsi1: Device reset returning 0x2003
Recovery code sleeping
Recovery code awake
Timer Expired
scsi1: Device reset returning 0x2003
Recovery SCB completes
last messsage repeated 2 times
scsi: Device offlined - not ready a

[PATCH] 2.6 aacraid: adapter naming fix

2005-03-10 Thread Mark Haverkamp
From Mark Salyzyn at Adaptec.

This fixes the way the aac device's id is calculated.
Applies to scsi-misc-2.6 tree.

Signed-off-by: Mark Haverkamp <[EMAIL PROTECTED]>

= drivers/scsi/aacraid/linit.c 1.46 vs edited =
--- 1.46/drivers/scsi/aacraid/linit.c   2005-02-03 18:48:49 -08:00
+++ edited/drivers/scsi/aacraid/linit.c 2005-03-10 10:36:35 -08:00
@@ -573,10 +573,9 @@
 	int unique_id = 0;
 
 	list_for_each_entry(aac, &aac_devices, entry) {
-		if (aac->id > unique_id) {
-			insert = &aac->entry;
+		if (aac->id > unique_id)
 			break;
-		}
+		insert = &aac->entry;
 		unique_id++;
 	}
 

-- 
Mark Haverkamp <[EMAIL PROTECTED]>
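
The effect of the patch above can be illustrated with a small Python model of the loop. Two things are assumptions, because they lie outside the hunk: that `insert` starts out pointing at the list head, and that the new adapter is then linked in directly after `insert`. The list of adapters is modelled as a plain list of ids, and `insert` as the index of the entry to insert after (-1 for the list head). The function names are hypothetical.

```python
def probe_old(ids: list) -> int:
    """Model of the loop before the patch; returns the id that gets assigned."""
    insert = -1       # assumed: insert initially points at the list head
    unique_id = 0
    for pos, dev_id in enumerate(ids):
        if dev_id > unique_id:
            insert = pos  # old code: only updated when breaking out
            break
        unique_id += 1
    ids.insert(insert + 1, unique_id)  # assumed: new entry goes after insert
    return unique_id


def probe_new(ids: list) -> int:
    """Model of the loop after the patch; returns the id that gets assigned."""
    insert = -1
    unique_id = 0
    for pos, dev_id in enumerate(ids):
        if dev_id > unique_id:
            break
        insert = pos  # new code: track the last entry that was passed
        unique_id += 1
    ids.insert(insert + 1, unique_id)
    return unique_id


old_list, new_list = [], []
old_ids = [probe_old(old_list) for _ in range(3)]
new_ids = [probe_new(new_list) for _ in range(3)]
print("old:", old_ids, old_list)
print("new:", new_ids, new_list)
```

Under these assumptions the old loop leaves the list unsorted after the second adapter, so the third adapter is handed id 0 again, while the patched loop keeps the list sorted and assigns 0, 1, 2. The patched loop also reuses the lowest free id when there is a gap.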



Re: aacraid died on kernel 2.4.27

2005-03-10 Thread Ryan Anderson
On Thu, Mar 10, 2005 at 09:01:50PM +, Nic Ferrier wrote:
> I've been trying, without success, to get the aacraid driver to work
> reliably on a Dell 2650 Poweredge.

I've had problems there, too.  With kernel 2.6.8.

I just had another failure, and I got some better logs.

Can you install the "afacli" utility (Dell-branded version of aaccli),
and run
diag show history /old
from afacli?

If you can't seem to get the OLD history to show (compare diag show
history /current with diag show history /old), try diag dump text - you
may find some useful messages there.

The other, standard, suggestion is to go through and open a ticket with
Dell, and do all the hardware diagnostics to try and see if the error
can be traced to hardware problems or not.

Good luck, I hope you find a fix faster than I have (still trying,
unfortunately)

-- 

Ryan Anderson
AutoWeb Communications, Inc. 
email: [EMAIL PROTECTED] 





Re: [ANNOUNCE 0/6] Open-iSCSI High-Performance Initiator for Linux

2005-03-10 Thread Dmitry Yusupov
On Thu, 2005-03-10 at 11:27 +0100, Lars Marowsky-Bree wrote:
> On 2005-03-09T18:36:37, Alex Aizman <[EMAIL PROTECTED]> wrote:
> > >That works well in our current development series, and if you want to
> > >share code, you can either rip it off (Open Source, we love ya ;) or we
> > >can spin off these parts into a sub-package for you to depend on...
> > If it's not a big deal :-) let's do the "sub-package" option.
> 
> I've brought this up on the linux-ha-dev list. When do you need this?

For open-iscsi, I think it would make sense to link the open-iscsi daemon
code against klibc, the same way dm-multipath does. This will allow us to
build iSCSI remote boot using early user-space. I'm not sure it will be
possible to use your package without modifications. Let me know.

Dmitry
