Thanks to everyone who made suggestions! This machine has run
memtest for a week and VTS for several days with no errors. It
does seem that the problem is probably in the CPU cache.
On 03/24/10 10:07 AM, Damon Atkins wrote:
You could try copying the file to /tmp (i.e. swap/RAM) and do a
continuous
On Mar 23, 2010, at 11:21 PM, Daniel Carosone wrote:
> On Tue, Mar 23, 2010 at 07:22:59PM -0400, Frank Middleton wrote:
>> On 03/22/10 11:50 PM, Richard Elling wrote:
>>
>>> Look again, the checksums are different.
>>
>> Whoops, you are correct, as usual. Just 6 bits out of 256 different...
>>
You could also use psradm to take a CPU off-line.
At boot I would ??assume?? the system boots the same way every time unless
something changes, so you could be hitting the same CPU core every time or the
same bit of RAM until booted fully.
Or even run SunVTS "Validation Test Suite" which I beliv
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
How about running memtest86+ (http://www.memtest.org/) on the machine
for a while? It doesn't exercise CPU arithmetic very much, but
it stresses data paths quite a lot. Just a quick suggestion...
- --
Saso
Damon Atkins wrote:
> You could try
You could try copying the file to /tmp (i.e. swap/RAM) and do a continuous loop of
checksums, e.g.
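# Loop until a copy fails its checksum; the bad copy is left behind as
# libdlpi.so.1.x. Replace the quoted string with the known-good output line.
while [ ! -f libdlpi.so.1.x ] ; do
  sleep 1
  cp libdlpi.so.1 libdlpi.so.1.x
  A="`sha512sum -b libdlpi.so.1.x`"
  [ "$A" = "what it should be libdlpi.so.1.x" ] && rm libdlpi.so.1.x
done ; date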
Ass
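A bounded variant of the loop above might look like the sketch below. It is only an illustration, not what anyone in this thread actually ran: checksum_loop, hash_of, and the HASH_CMD override are made-up names, and it assumes sha512sum (or whatever HASH_CMD is set to) is on the PATH. The reference hash has to come from a copy you trust, since a hash taken on the suspect machine could itself be computed from corrupted data.

```shell
#!/bin/sh
# Repeatedly copy a file and compare the copy's checksum with a reference.
# Usage: checksum_loop FILE EXPECTED_HASH [MAX_ITERATIONS] [SLEEP_SECONDS]
# HASH_CMD is a hypothetical override for systems without sha512sum.
HASH_CMD=${HASH_CMD:-sha512sum}

hash_of() {
    # Hash via stdin so the output does not depend on the file name.
    $HASH_CMD < "$1" | awk '{print $1}'
}

checksum_loop() (
    src=$1 expected=$2 max=${3:-100} pause=${4:-1} n=0
    copy="$src.x"
    while [ "$n" -lt "$max" ]; do
        n=$(( n + 1 ))
        sleep "$pause"
        cp "$src" "$copy" || exit 2
        actual=$(hash_of "$copy")
        if [ "$actual" != "$expected" ]; then
            echo "mismatch on iteration $n at $(date): $actual"
            exit 1      # leave the bad copy in place for inspection
        fi
        rm -f "$copy"
    done
    echo "no mismatch in $n iterations"
)

# Example (run from /tmp after copying the library there):
#   checksum_loop libdlpi.so.1 "<known-good hash>" 1000
```

Unlike the one-liner, this stops after a fixed number of passes and reports which pass failed, which may help correlate a failure with other logs.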
On Tue, Mar 23, 2010 at 07:22:59PM -0400, Frank Middleton wrote:
> On 03/22/10 11:50 PM, Richard Elling wrote:
>
>> Look again, the checksums are different.
>
> Whoops, you are correct, as usual. Just 6 bits out of 256 different...
>
> Look which bits are different - digits 24, 53-56 in both cas
On 03/22/10 11:50 PM, Richard Elling wrote:
Look again, the checksums are different.
Whoops, you are correct, as usual. Just 6 bits out of 256 different...
Last year
expected 4a027c11b3ba4cec bf274565d5615b7b 3ef5fe61b2ed672e ec8692f7fd33094a
actual 4a027c11b3ba4cec bf274567d5615b7b 3ef5
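To see which digits differ without eyeballing the hex, something like the following sketch could be used. It is only an illustration: hex_bit_diff is a made-up helper, not part of any ZFS tooling, it assumes both inputs are plain hex strings of equal length, and it is fed only the first two complete words of the checksums quoted above, since the rest of the "actual" value is cut off here.

```shell
#!/bin/sh
# Compare two equal-length hex strings digit by digit; print the 1-based
# positions of the differing digits and the total number of flipped bits.
hex_bit_diff() (
    a=$1 b=$2 pos=0 bits=0 digits=""
    if [ "${#a}" -ne "${#b}" ]; then
        echo "length mismatch" >&2
        exit 1
    fi
    while [ -n "$a" ]; do
        pos=$(( pos + 1 ))
        da=${a%"${a#?}"}    # first hex digit of a
        db=${b%"${b#?}"}    # first hex digit of b
        a=${a#?}            # drop that digit from both strings
        b=${b#?}
        x=$(( 0x$da ^ 0x$db ))
        if [ "$x" -ne 0 ]; then
            digits="$digits $pos"
        fi
        while [ "$x" -ne 0 ]; do    # count the set bits of the XOR
            bits=$(( bits + (x & 1) ))
            x=$(( x >> 1 ))
        done
    done
    echo "differing digits:${digits:- none}; bits flipped: $bits"
)

# First two complete words of the expected and actual checksums.
hex_bit_diff 4a027c11b3ba4cecbf274565d5615b7b 4a027c11b3ba4cecbf274567d5615b7b
```

On those two words it reports digit 24 as the only difference, a single flipped bit (5 versus 7).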
On Mar 22, 2010, at 4:21 PM, Frank Middleton wrote:
> On 03/21/10 03:24 PM, Richard Elling wrote:
>
>> I feel confident we are not seeing a b0rken drive here. But something is
>> clearly amiss and we cannot rule out the processor, memory, or controller.
>
> Absolutely no question of that, other
On 03/21/10 03:24 PM, Richard Elling wrote:
I feel confident we are not seeing a b0rken drive here. But something is
clearly amiss and we cannot rule out the processor, memory, or controller.
Absolutely no question of that, otherwise this list would be flooded :-).
However, the purpose of th
On Mar 21, 2010, at 11:03 AM, Frank Middleton wrote:
> On 03/15/10 01:01 PM, David Dyer-Bennet wrote:
>
>> This sounds really bizarre.
>
> Yes, it is. But CR 6880994 is bizarre too.
Rolling back to a conversation with Frank last fall, here is the output
of fmdump which shows the single bit flip.
On 03/15/10 01:01 PM, David Dyer-Bennet wrote:
This sounds really bizarre.
Yes, it is. But CR 6880994 is bizarre too.
One detail suggestion on checking what's going on (since I don't have a
clue towards a real root-cause determination): Get an md5sum on a clean
copy of the file, say from a n
On Sun, March 14, 2010 13:54, Frank Middleton wrote:
>
> How can it even be remotely possible to get a checksum failure on mirrored
> drives with copies=2? That means all four copies were corrupted? Admittedly
> this is on a grotty PC with no ECC and flaky bus parity, but how come the same
>
Can anyone say what the status of CR 6880994 (kernel/zfs Checksum failures on
mirrored drives) might be?
Setting copies=2 has mitigated the problem, which manifests itself consistently
at boot by flagging libdlpi.so.1, but two recent power cycles in a row with no
normal shutdown have resulted in