Hi all,
we have some problems replacing a SCSI disk at runtime. The problems
started with kernel 2.6.x; with 2.4.x kernels we never saw any problems.
We tried all kernels from 2.6.8 to 2.6.11-rc3-bk3-20050206171922-bigsmp,
the last one we found for SuSE 9.2. All kernels showed this problem.
Our b
the controller where
the disk is replaced are declared to be not ready during the spinup of
the replaced drive. Strange! This means we don't have the possibility to
hot-replace a drive on Linux, which we have had for 20 years on HPUX.
As we detected in many, many tries we made the problem see
On 06/05/2013 12:02 PM, Bernd Schubert wrote:
On 06/04/2013 05:39 PM, Joe Lawrence wrote:
Just curious, what type drives were in your RAID and what does
/sys/class/scsi_disk/*/max_write_same_blocks report? If you have a spare
drive to test, maybe you could try a quick sg_write_same command to
' (ctx:WxW)
> #48: FILE: drivers/scsi/sd.h:87:
> + unsigned ws10 : 1;
> ^
If someone wants me to, I can send another patch to fix the other
lines first.
scsi: Check if the device support WRITE_SAME_10
From: Bernd Schubert
The md layer curre
On 06/05/2013 09:14 PM, Martin K. Petersen wrote:
>>>>>> "Bernd" == Bernd Schubert writes:
>
> Bernd> The md layer currently cannot handle failed WRITE_SAME commands
> Bernd> and the initial easiest fix is to check if the device supports
> B
On 06/07/2013 04:15 AM, Martin K. Petersen wrote:
"Bernd" == Bernd Schubert writes:
max_t(unsigned long, max, SD_MAX_WS10_BLOCKS);
Bernd> Max? Not min_t()?
Brain fart. Updated patch with a few other adjustments.
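The max_t()/min_t() exchange above comes down to the direction in which a limit is applied. A minimal sketch in plain user-space C follows. The cap value and all helper names are assumptions made for illustration; this thread does not state the actual value of SD_MAX_WS10_BLOCKS.

```c
#include <assert.h>

/* Hypothetical cap for illustration only; the thread does not give the
 * real value of SD_MAX_WS10_BLOCKS. */
#define EXAMPLE_MAX_WS10_BLOCKS 0xffffUL

/* Wrong direction: max() can never return less than the cap, so a
 * device reporting a small limit ends up with the cap instead. */
static unsigned long limit_with_max(unsigned long reported)
{
    return reported > EXAMPLE_MAX_WS10_BLOCKS ? reported
                                              : EXAMPLE_MAX_WS10_BLOCKS;
}

/* Intended direction: min() keeps the reported value unless it
 * exceeds the cap. */
static unsigned long limit_with_min(unsigned long reported)
{
    return reported < EXAMPLE_MAX_WS10_BLOCKS ? reported
                                              : EXAMPLE_MAX_WS10_BLOCKS;
}

/* Small demonstration of both helpers. */
void clamp_demo(void)
{
    assert(limit_with_min(8UL) == 8UL);              /* small limit kept */
    assert(limit_with_min(0x100000UL) == 0xffffUL);  /* large limit capped */
    assert(limit_with_max(8UL) == 0xffffUL);         /* small limit inflated */
}
```

With max(), the result is always at least the cap, which is the opposite of what a limit check is meant to do.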
I have tested this on a couple of JBODs with a mishmash of
uld enable
WRITE SAME and would cause issues with linux-md, but nothing should
happen directly in the SCSI layer.
Which was your last working kernel version?
Thanks,
Bernd
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to m
On 07/29/2013 03:05 PM, Nix wrote:
On 29 Jul 2013, Bernd Schubert said:
Hi Nick,
On 07/29/2013 12:10 PM, Nick Alcock wrote:
arcmsr0: abort device command of scsi id = 0 lun = 1
arcmsr0: abort device command of scsi id = 0 lun = 0
arcmsr: executing bus reset eh.num_resets=0, num_
On 07/30/2013 01:34 AM, Martin K. Petersen wrote:
"Nix" == Nix writes:
Bernd,
Nix> I can now confirm that reverting this commit causes this problem to
Nix> go away, and my machine boots fine again.
Can you please send me the output of sg_inq with your 1.49 firmware?
I m
ented to allow
Bernd, with a different Areca controller, to boot... obviously, in that
situation, reversion is wrong, since that would just replace one won't-
boot situation with another.
Unless there is a very simple fix, the commit should be reverted, imho. It
would be better then to remove write-sa
On 07/31/2013 05:15 AM, Martin K. Petersen wrote:
>>>>>> "Bernd" == Bernd Schubert writes:
>
> Bernd,
>
>>> Product revision level: R001
>
> It's clearly not verbatim passthrough...
>
> Bernd> Besides the firmware, the differ
Once I noticed that scsi_get_vpd_page() works fine from other function
calls and that it is not 0x89, but already 0x0 that fails, fixing it became
easy.
Nix, any chance you could verify it also works for you?
From: Bernd Schubert
Somehow older areca firmware versions have issues with
Whoops, the title is wrong, it should have been:
[PATCH] scsi disk: Limit get_vpd_page buf size
On 08/01/2013 04:34 PM, Bernd Schubert wrote:
Once I noticed that scsi_get_vpd_page() works fine from other function
calls and that it is not 0x89, but already 0x0 that fails fixing it became
easy
On 07/30/2013 11:20 PM, Nix wrote:
On 30 Jul 2013, Bernd Schubert told this:
On 07/30/2013 02:56 AM, Nix wrote:
On 30 Jul 2013, Douglas Gilbert outgrape:
Please supply the information that Martin Petersen asked
for.
Did it in private IRC (the advantage of working for the same division of
On 08/01/2013 06:04 PM, Nix wrote:
On 1 Aug 2013, Bernd Schubert verbalised:
On 07/30/2013 11:20 PM, Nix wrote:
On 30 Jul 2013, Bernd Schubert told this:
On 07/30/2013 02:56 AM, Nix wrote:
On 30 Jul 2013, Douglas Gilbert outgrape:
Please supply the information that Martin Petersen asked
Martin,
sorry for my late reply, I entirely lost track of this (customer issues,
vacation, lots of main work, ...).
On 08/02/2013 05:00 AM, Martin K. Petersen wrote:
>>>>>> "Bernd" == Bernd Schubert writes:
>
> Bernd,
>
> Bernd> Once I noticed tha
On 08/31/2013 09:48 PM, Nix wrote:
> On 31 Aug 2013, Greg KH said:
>> On Fri, Aug 30, 2013 at 11:01:56AM +0100, Nix wrote:
>>> On 1 Aug 2013, Bernd Schubert said:
>>>
>>>> Once I noticed that scsi_get_vpd_page() works fine from other function
>>>
ready on
>3.10.7 ... not sure what the rest has because I only run testing.
>
>James
I'm still on my way back and checking commits with my mobile is a bit
difficult. I guess Christan suffers from that heuristics commit and needs the
other patch Martin acked on Friday. I'm going
From: Bernd Schubert
Somehow older areca firmware versions have issues with
scsi_get_vpd_page() and a large buffer, the firmware
seems to crash and the scsi error-handler will start endless
recovery retries.
Limiting the buf-size to 64-bytes fixes this issue with older
firmware versions (<1
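The idea in this patch description, capping the buffer length that accompanies a VPD request, can be sketched in user-space C. This is not the kernel patch itself: `build_vpd_inquiry_cdb()` and `VPD_BUF_LIMIT` are hypothetical names, and only the 64-byte figure comes from the text above. The CDB layout is that of the standard 6-byte SCSI INQUIRY command with the EVPD bit set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INQUIRY_OPCODE  0x12  /* SCSI INQUIRY */
#define INQUIRY_CDB_LEN 6
#define VPD_BUF_LIMIT   64    /* hypothetical name; 64 bytes as in the text */

/* Fill cdb with an INQUIRY for VPD page `page`, never asking the device
 * for more than VPD_BUF_LIMIT bytes. Returns the allocation length used. */
size_t build_vpd_inquiry_cdb(uint8_t cdb[INQUIRY_CDB_LEN], uint8_t page,
                             size_t buf_len)
{
    size_t alloc_len = buf_len < VPD_BUF_LIMIT ? buf_len : VPD_BUF_LIMIT;

    assert(cdb != NULL);
    memset(cdb, 0, INQUIRY_CDB_LEN);
    cdb[0] = INQUIRY_OPCODE;
    cdb[1] = 0x01;                         /* EVPD: request a VPD page */
    cdb[2] = page;                         /* page code, e.g. 0x00 or 0x89 */
    cdb[3] = (uint8_t)(alloc_len >> 8);    /* allocation length, big-endian */
    cdb[4] = (uint8_t)(alloc_len & 0xff);
    /* cdb[5] is the CONTROL byte and stays 0 */
    return alloc_len;
}
```

A caller holding a 512-byte buffer would still only request 64 bytes, while a caller with a 32-byte buffer would request 32.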
es have been added by commit
c213e1407be6b04b144794399a91472e0ef92aec
Cheers,
Bernd
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
rently working on to get
discard working for our LSI2008 HBAs with attached SATA SSDs, and the
heuristics in sd_read_write_same based on VPD page 0x89 are not
correct for this HBA - its SATL supports write-same (although it does
"Logical block address out of range" at the end of the de
On 09/26/2013 07:39 AM, Douglas Gilbert wrote:
On 13-09-25 08:44 PM, Martin K. Petersen wrote:
"Bernd" == Bernd Schubert writes:
Hey Bernd,
Bernd> I'm afraid we have another problem. I'm currently working on to
Bernd> get discard working for our LSI2008 HBAs wit
On 09/26/2013 04:42 PM, Martin K. Petersen wrote:
"Bernd" == Bernd Schubert writes:
Bernd,
Bernd> Both types of systems we have in-house neither block limits vpd
Bernd> nor READ_CAP16 return anything that would indicate discard is
Bernd> supported. But UNMAP and WRITE SAME
ke ocfs2. Ocfs2 is quite easy
>> to set up.
>
> Thank you for the suggestion.
> Unfortunately, due to specific requirements I must use EXT4.
ext4 has MMP support (multiple mount protection), you can enable it with
tune2fs.
Cheers,
Bernd
. If there are some tests/reboots/whatever
I could do, it would be best to do it shortly after the scheduled reboot.
Actually I now would have attempted to port your mod15 patch
(http://home-tj.org/wiki/index.php/Sil_m15w#Patches) to 2.6.23, hoping it
would solve Soeren's problem and ours as we
problems we have with these
Infortrend boxes, 99.999% of the time the boxes work fine, only
sometimes they suffer from hiccups, which usually resolve themselves after a few
seconds. If we could just suspend i/o for that time, 99% of our problems
would be solved.
Cheers,
Bernd
some
time.
Any hints and suggestions are highly appreciated.
Thanks in advance,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
Hello Andrew,
thanks for your help!
On Friday 07 December 2007 02:09:11 Andrew Morton wrote:
> On Wed, 5 Dec 2007 21:44:54 +0100
>
> Bernd Schubert <[EMAIL PROTECTED]> wrote:
> > after scsi-recovery a system here went into some kind lock-up, everything
> > seems t
.
scsi_dispatch_cmd() doesn't seem to be sufficient.
I would be grateful for any hints.
Signed-off-by: Bernd Schubert <[EMAIL PROTECTED]>
Index: linux-2.6.22/drivers/scsi/scsi_error.c
===
--- linux-2.6.22.orig/drivers/scs
5:0:2: Ending Domain Validation
Hmm, somehow related to sdev->inquiry_len, but isn't it the task of
spi_schedule_dv_device() and subfunctions to do that properly?
Any comments, hints and help are appreciated.
Signed-off-by: Bernd Schubert <[EMAIL PROTECTED]>
Index: linux-2.6.2
On Wednesday 12 December 2007 14:39:27 Matthew Wilcox wrote:
> On Wed, Dec 12, 2007 at 01:54:14PM +0100, Bernd Schubert wrote:
> > below is a patch introducing device recovery, trying to prevent i/o
> > errors when a DID_NO_CONNECT or SOFT_ERROR does happen.
>
> Why doesn
[Hmm, resending since mail after more than 30min still not on the ML, maybe
the attachment was too large? I have uploaded the log to
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/scsi/kern.log.1]
On Wednesday 12 December 2007 16:59:36 James Bottomley wrote:
> On Wed, 2007-12-12 at
On Thursday 13 December 2007 15:18:33 James Bottomley wrote:
> On Wed, 2007-12-12 at 18:54 +0100, Bernd Schubert wrote:
> > [Hmm, resending since mail after more than 30min still not on the ML,
> > maybe the attachment was too large? I have uploaded the log to
> > http://www.
Hello James,
On Thursday 13 December 2007 15:18:33 James Bottomley wrote:
> On Wed, 2007-12-12 at 18:54 +0100, Bernd Schubert wrote:
> > [Hmm, resending since mail after more than 30min still not on the ML,
> > maybe the attachment was too large? I have uploaded the log to
> >
On Friday 14 December 2007 13:22:55 Matthew Wilcox wrote:
> On Fri, Dec 14, 2007 at 01:04:12PM +0100, Bernd Schubert wrote:
> > PS: Do you have some links to scsi and SPI specs?
>
> The final versions are available for a fee from ANSI. However,
> you can download the final draft
in domain validation because of
> > > a resource starvation issue, but I know of none where everything hangs
> > > just after error recovery completes.
> >
> > Since still not much happend to solve this bug, shall I create a bugzilla
> > entry?
>
> Sure .
2349.908019] [] system_call+0x7e/0x83
[ 2349.913326] [<2b351a9c92aa>]
[ 2349.916713]
Thanks,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
e_t last_recovery; /* last time eh completed */
+ int n_errors; /* number failures within time limit */
wait_queue_head_t host_wait;
struct scsi_host_template *hostt;
struct
s is really troublesome in
production. Some hints for further debugging should be sufficient
for now.
Thanks,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
logging_level", but this doesn't
reveal anything.
Please, we really need to fix this, as this is really troublesome in
production. Some hints for further debugging should be sufficient
for now.
Thanks,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
On Wednesday 02 January 2008 20:07:51 Moore, Eric wrote:
> On Wednesday, January 02, 2008 11:54 AM, Bernd Schubert wrote:
> > I complained about this before, but always got ignored.
> > Please not this time.
>
> Sorry, I didn't see your email before today.
>
> >
error handling patches in queue for 2.6.22, I would like to
know if I would have caught this error, but 0x0007 is pretty meaningless
to me :(
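Assuming the number printed after "return code =" is the packed 32-bit SCSI result word used by Linux kernels of that era, it can be split into four bytes: status, message, host and driver. The sketch below is user-space C with hypothetical helper names; the DID_* names follow include/scsi/scsi.h as far as known, and the sample values in the demo are made up for illustration rather than taken from the logs in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte layout of the packed result word (assumed, see lead-in). */
static uint8_t status_byte_of(uint32_t r) { return (uint8_t)(r & 0xff); }
static uint8_t msg_byte_of(uint32_t r)    { return (uint8_t)((r >> 8) & 0xff); }
static uint8_t host_byte_of(uint32_t r)   { return (uint8_t)((r >> 16) & 0xff); }
static uint8_t driver_byte_of(uint32_t r) { return (uint8_t)((r >> 24) & 0xff); }

/* Map a host byte to its DID_* name; values above 0x0b are not covered. */
static const char *host_byte_name(uint8_t host)
{
    static const char *const names[] = {
        "DID_OK", "DID_NO_CONNECT", "DID_BUS_BUSY", "DID_TIME_OUT",
        "DID_BAD_TARGET", "DID_ABORT", "DID_PARITY", "DID_ERROR",
        "DID_RESET", "DID_BAD_INTR", "DID_PASSTHROUGH", "DID_SOFT_ERROR",
    };

    if (host < sizeof(names) / sizeof(names[0]))
        return names[host];
    return "unknown";
}

/* Decode two made-up sample values. */
void decode_demo(void)
{
    uint32_t sample = 0x00070002u; /* illustrative value only */

    assert(status_byte_of(sample) == 0x02); /* 0x02 is CHECK CONDITION */
    assert(msg_byte_of(sample) == 0x00);
    assert(host_byte_of(sample) == 0x07);
    assert(driver_byte_of(sample) == 0x00);
    assert(strcmp(host_byte_name(host_byte_of(sample)), "DID_ERROR") == 0);
    assert(strcmp(host_byte_name(host_byte_of(0x000b0000u)),
                  "DID_SOFT_ERROR") == 0);
}
```

Splitting the word this way at least shows which layer reported the problem: a non-zero host byte points at the low-level driver or transport rather than at the device's own status.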
Thanks,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
On Wednesday 16 January 2008 19:27:43 James Bottomley wrote:
> On Wed, 2008-01-16 at 19:13 +0100, Bernd Schubert wrote:
> > Hi,
> >
> > I already grepped, but I don't find the definition of
> >
> > return code = 0x0007
> >
> >
> > Just g
t for v3.12, but I wanted to
get your feedback first.
James queued this up for 3.13
http://git.kernel.org/cgit/linux/kernel/git/jejb/scsi.git/commit/?id=735e39e680256a13e7be3492acfb4d9721287a42
Maybe we should try to convince James to take it into 3.12?
Cheers,
Bernd
disk will suffer from data corruption,
but the data on the older ST3200822AS will *not*.
kernel versions tested: 2.6.15-2.6.20
Any ideas how to proceed?
Thanks,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
: p1 p2 p3
[ 345.187133] sd 1:0:0:0: Attached scsi disk sdb
The data on the ST3250820AS disk will suffer from data corruption,
but the data on the older ST3200822AS will *not*.
kernel versions tested: 2.6.15-2.6.20
Any ideas how to proceed?
Thanks,
Bernd
--
Bernd Schubert
Q-Leap
On Monday 08 October 2007 17:09:17 Bernd Schubert wrote:
> [sorry for sending twice, but after I read the sil sources, I see the mail
> address had been wrong]
>
> Hi,
>
> somehow the sil3114 causes data corruption with some (newer?) disks. Simply
> filling the filesystem w
On Wednesday 10 October 2007 11:12:20 Bernd Schubert wrote:
> On Monday 08 October 2007 17:09:17 Bernd Schubert wrote:
> > [sorry for sending twice, but after I read the sil sources, I see the
> > mail address had been wrong]
> >
> > Hi,
> >
> > somehow the s
This will add the Seagate ST3250820AS to the mod15 blacklist.
I think this is rather trivial and should go into any release as soon as
possible, since there will be data corruption without it for this disk.
Signed-off-by: Bernd Schubert <[EMAIL PROTECTED]>
Index: linux-2.6.23-rc9/d
/s to 20-25MB/s. But better safe than
lost data or damaged filesystem.
Signed-off-by: Bernd Schubert <[EMAIL PROTECTED]>
Index: linux-2.6.23-rc9/drivers/ata/sata_sil.c
===
--- linux-2.6.23-rc9.orig/drivers/ata/sata_sil.c
| 58 ++--
 include/linux/libata.h | 6 +++
3 files changed, 62 insertions(+), 11 deletions(-)
Signed-off-by: Bernd Schubert <[EMAIL PROTECTED]>
Index: linux-2.6.23-rc9/drivers/ata/libata-
a
corruption as without the patch. I know this is only an observation and no
definite proof...
Also, this is with 3114; maybe this chip behaves a bit differently than the 3112?
Thanks,
Bernd
--
Bernd Schubert
Q-Leap Networks GmbH
On Thursday 11 October 2007 17:04:45 Jeff Garzik wrote:
> Bernd Schubert wrote:
> > On Thursday 11 October 2007 16:19:37 Jeff Garzik wrote:
> >> 1) Just about the only valid optimization is to ensure that only the
> >> write path must be limited to small chunks, not both
On Friday 12 October 2007 23:08:21 Jeff Garzik wrote:
> Bernd Schubert wrote:
> > a) 2.6.23 + sil-patch I posted, this is on a customer system (though my
> > former group), I wouldn't like to use -mm there.
> >
> > b) .config is attached
> >
> > c) attac
u8 scsi_io_cb_idx;
diff --git a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
index 7f0af4f..d502728 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_scsih.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_scsih.c
@@ -127,6 +127,11 @@ static int disable_discovery =
-sas in general? The LSI support link sent by Kurt [1] is
also not perfect with respect to trim, as it lists Intel510, although we
definitely know that Intel510s are disabled by default and forcing
write-same trim causes data corruption (unmap works, though).
Thanks in advance,
Bernd
[1]
htt
.
I already wanted to report the same issue, as a flaky cable caused
libata error handling on one of my systems at home. ATA EH succeeded for
several weeks until several file systems on that system reported
corruption (btrfs and ext4). Failed commands I can see from syslog are
"READ FPDMA QUEUED" and "FLUSH CACHE EXT", but I'm not sure if it is
complete, as the log file is on btrfs and it reports checksum mismatch
for that file. Kernel version is 4.4.0-81-ubuntu, I have not checked yet
if they applied any libata patches.
Thanks,
Bernd
Hello Tejun,
On 08/17/2017 02:48 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Aug 17, 2017 at 11:24:22AM +0200, Bernd Schubert wrote:
>>> More concerning is the fact that these undetected errors can make their
>>> way even when the higher application consistently calls
On 08/17/2017 03:25 PM, Tejun Heo wrote:
> Hello,
>
> On Thu, Aug 17, 2017 at 03:18:06PM +0200, Bernd Schubert wrote:
>> So for Gionatan the root cause was an instable power supply, but in my
>> case there wasn't any power loss, there were just failed sata commands.
>
Even Nagios complains
> about the machine being down while rsync is running.
do you have the write-back cache of the controller enabled for your disks?
When you disable this cache, the controller will also disable the disks' caches,
causing a write performance of between 3 and 8 MB/s per disk.
Cheer
r sending
that command. Unmap worked fine, though. So possibly another
blacklist entry is required.
Cheers,
Bernd
going to compile
> > when scsi_device_online is already implemented in the kernel tree.
> > The routine scsi_device_online is a function, not a define. For a define
> > this would work.
>
> Sure it does, function names are defined symbols.
Defined for the preprocessor or t
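The disagreement quoted above is about whether `#ifdef` can detect a function. A minimal sketch in plain C, using hypothetical names rather than the real scsi_device_online(), shows that the preprocessor only knows about macros:

```c
#include <assert.h>

/* A plain function: the compiler and linker know this name,
 * the preprocessor does not. */
static int example_device_online(void)
{
    return 1;
}

/* A macro: this name is visible to #ifdef. */
#define example_macro_online() 1

#ifdef example_device_online
#define PREPROCESSOR_SEES_FUNCTION 1
#else
#define PREPROCESSOR_SEES_FUNCTION 0
#endif

#ifdef example_macro_online
#define PREPROCESSOR_SEES_MACRO 1
#else
#define PREPROCESSOR_SEES_MACRO 0
#endif

/* Demonstrates which of the two names #ifdef was able to see. */
void ifdef_demo(void)
{
    assert(example_device_online() == 1);
    assert(PREPROCESSOR_SEES_FUNCTION == 0); /* function is invisible */
    assert(PREPROCESSOR_SEES_MACRO == 1);    /* macro is visible */
}
```

A compatibility header that wraps its fallback definition in `#ifndef some_function` will therefore always take the fallback branch and clash with a real function of the same name. One common idiom to make a function detectable is to add `#define some_function some_function` next to its declaration.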
e fix.
Furthermore, the drbd module is loaded. You may find a dmesg, lsmod and lspci
information and the kernel config here:
http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/aic79xx-oops/
Ooops:
(none) login: ACPI: PCI interrupt :02:06.0[A] -> GSI 24 (level, low) ->
IRQ 24
ACPI: PCI int
ee a final solution...
We also want to buy the very same raid device and also connect it to an
already existing aic79xx controller.
Thanks in advance,
Bernd
r
> responding again.
For a non-scsi protocol expert those scsi card dumps really don't say
anything :(
Thanks again,
Bernd
--
Bernd Schubert
Physikalisch Chemisches Institut / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
SI error: return code = 0xb
[17179939.464000] end_request: I/O error, dev sda, sector 12691101688
[17180107.112000] sd 0:0:1:0: SCSI error: return code = 0xb
[17180107.116000] end_request: I/O error, dev sda, sector 12691101688
Any help is appreciated.
Thanks in advance,
Bernd
f firmware related conditions or
> transfer underruns.
>
> I've cc'd the fusion people to see if they can help you diagnose it
> further.
>
> James
James, many thanks for your help. I will try to do a firmware update on
Monday, maybe that helps. Of course, we also apprecia
13:
../linux_compat.h:9:30: error: scsi/scsi_device.h: No such file or directory
../linux_compat.h:10:28: error: scsi/scsi_cmnd.h: No such file or directory
The kernel_headers in Debian don't have these files and including the header
files directly from the kernel tree doesn't work.
Do y