Victor, thanks for posting that. It really is interesting to see exactly what
happened, and to read about how zfs pools can be recovered.
Your work on these forums has done much to reassure me that ZFS is stable
enough for us to use on a live server, and I look forward to seeing
automate
Borys Saulyak wrote:
>> As a follow up to the whole story, with the fantastic help of
>> Victor, the failed pool is now imported and functional thanks to
>> the redundancy in the metadata.
> It would be really useful if you could publish the steps to recover
> the pools.
Here it is:
Executive s
> As a follow up to the whole story, with the fantastic
> help of Victor,
> the failed pool is now imported and functional thanks
> to the redundancy
> in the metadata.
It would be really useful if you could publish the steps to recover the pools.
Victor Latushkin wrote:
> Hi Tom and all,
>> [EMAIL PROTECTED]:~# uname -a
>> SunOS cs3.kw 5.10 Generic_127127-11 sun4v sparc SUNW,Sun-Fire-T200
>
> Btw, have you considered opening support call for this issue?
As a follow up to the whole story, with the fantastic help of Victor,
the failed pool
Hi Robert, et.al.,
I have blogged about a method I used to recover a removed file from a
zfs file system
at http://mbruning.blogspot.com.
Be forewarned, it is very long...
All comments are welcome.
max
Robert Milkowski wrote:
> Hello max,
>
> Sunday, August 17, 2008, 1:02:05 PM, you wrote:
>
> m
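For anyone who wants a feel for the kind of inspection involved before reading the whole post, here is a rough sketch. The pool/dataset name, object number, and vdev:offset:size below are placeholders, the useful verbosity levels vary between builds, and zdb is an undocumented, unstable interface, so treat this as an illustration of the approach rather than the exact recipe from the blog:

  # Dump per-object details (dnodes and block pointers) for a dataset;
  # each extra -d increases verbosity.
  zdb -dddd tank/myfs

  # Once an interesting object number is known, dump just that object.
  zdb -dddd tank/myfs 12345

  # Read a block straight off a vdev, using the vdev:offset:size taken
  # from a block pointer printed above (values are hexadecimal).
  zdb -R tank 0:400000:20000

From there it is mostly a matter of decoding the on-disk structures by hand, which is what the blog post walks through.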
Hello max,
Sunday, August 17, 2008, 1:02:05 PM, you wrote:
mbc> A Darren Dunham wrote:
>>
>> If the most recent uberblock appears valid, but doesn't have useful
>> data, I don't think there's any way currently to see what the tree of an
>> older uberblock looks like. It would be nice to see if t
A Darren Dunham wrote:
>
> If the most recent uberblock appears valid, but doesn't have useful
> data, I don't think there's any way currently to see what the tree of an
> older uberblock looks like. It would be nice to see if that data
> appears valid and try to create a view that would be
> read
Miles Nordin <[EMAIL PROTECTED]>
> "cs" == Cromar Scott <[EMAIL PROTECTED]> writes:
cs> We opened a call with Sun support. We were told that the
cs> corruption issue was due to a race condition within ZFS. We
cs> were also told that the issue was known and was scheduled for
c
> "cs" == Cromar Scott <[EMAIL PROTECTED]> writes:
cs> We opened a call with Sun support. We were told that the
cs> corruption issue was due to a race condition within ZFS. We
cs> were also told that the issue was known and was scheduled for
cs> a fix in S10U6.
nice. Is the
Miles Nordin <[EMAIL PROTECTED]>
> "cs" == Cromar Scott <[EMAIL PROTECTED]> writes:
cs> It appears that the metadata on that pool became corrupted
cs> when the processor failed. The exact mechanism is a bit of a
cs> mystery,
[...]
cs> We were told that the probability of me
> "cs" == Cromar Scott <[EMAIL PROTECTED]> writes:
cs> It appears that the metadata on that pool became corrupted
cs> when the processor failed. The exact mechanism is a bit of a
cs> mystery,
[...]
cs> We were told that the probability of metadata corruption would
cs> ha
Richard Elling <[EMAIL PROTECTED]>
Cromar Scott wrote:
> Chris Siebenmann <[EMAIL PROTECTED]>
>
> I'm not Anton Rang, but:
> | How would you describe the difference between the data recovery
> | utility and ZFS's normal data recovery process?
>
> cks> The data recovery utility should not panic
>
Cromar Scott wrote:
> Chris Siebenmann <[EMAIL PROTECTED]>
>
> I'm not Anton Rang, but:
> | How would you describe the difference between the data recovery
> | utility and ZFS's normal data recovery process?
>
> cks> The data recovery utility should not panic
> cks> my entire system if it runs in
On Aug 7, 2008, at 10:25 PM, Anton B. Rang wrote:
>> How would you describe the difference between the file system
>> checking utility and zpool scrub? Is zpool scrub lacking in its
>> verification of the data?
>
> To answer the second question first, yes, zpool scrub is lacking, at
> least to
Chris Siebenmann <[EMAIL PROTECTED]>
I'm not Anton Rang, but:
| How would you describe the difference between the data recovery
| utility and ZFS's normal data recovery process?
cks> The data recovery utility should not panic
cks> my entire system if it runs into some situation
cks> that it ut
From: Richard Elling <[EMAIL PROTECTED]>
Miles Nordin wrote:
>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>> "tb" == Tom Bird <[EMAIL PROTECTED]> writes:
>>
>
...
>
> re> In general, ZFS can only repair conditions for which it owns
> re> data redundancy.
tb
Claus Guttesen wrote:
>> | How would you describe the difference between the data recovery
>> | utility and ZFS's normal data recovery process?
>>
>> The data recovery utility should not panic my entire system if it runs
>> into some situation that it utterly cannot handle. Solaris 10 U5 kernel
>>
> | How would you describe the difference between the data recovery
> | utility and ZFS's normal data recovery process?
>
> The data recovery utility should not panic my entire system if it runs
> into some situation that it utterly cannot handle. Solaris 10 U5 kernel
> ZFS code does not have this
I'm not Anton Rang, but:
| How would you describe the difference between the data recovery
| utility and ZFS's normal data recovery process?
The data recovery utility should not panic my entire system if it runs
into some situation that it utterly cannot handle. Solaris 10 U5 kernel
ZFS code doe
Victor Latushkin wrote:
> Hi Tom and all,
>
> Tom Bird wrote:
>> Hi,
>>
>> Have a problem with a ZFS on a single device, this device is 48 1T SATA
>> drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
>> a ZFS on it as a single device.
>>
>> There was a problem with the SAS b
> How would you describe the difference between the file system
> checking utility and zpool scrub? Is zpool scrub lacking in its
> verification of the data?
To answer the second question first, yes, zpool scrub is lacking, at least to
the best of my knowledge (I haven't looked at the ZFS source
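For reference, the scrub half of that comparison is just the following (the pool name is a placeholder); it verifies the checksums of every block it can reach in an imported pool, but it cannot be run against a pool that refuses to import:

  # Start a scrub, then watch its progress and any errors it reports.
  zpool scrub tank
  zpool status -v tank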
Miles Nordin writes:
>> "r" == Ross <[EMAIL PROTECTED]> writes:
>
> r> Tom wrote "There was a problem with the SAS bus which caused
> r> various errors including the inevitable kernel panic". It's
> r> the various errors part that catches my eye,
>
> yeah, possibly, but there
On Thu, Aug 07, 2008 at 11:34:12AM -0700, Richard Elling wrote:
> Anton B. Rang wrote:
> > First, there are two types of utilities which might be useful in the
> > situation where a ZFS pool has become corrupted. The first is a file system
> > checking utility (call it zfsck); the second is a dat
[I think Miles and I are talking about two different topics]
Miles Nordin wrote:
>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>>
>
> re> If your pool is not redundant, the chance that data
> re> corruption can render some or all of your data inacces
On Thu, 2008-08-07 at 11:34 -0700, Richard Elling wrote:
> How would you describe the difference between the data recovery
> utility and ZFS's normal data recovery process?
I'm not Anton but I think I see what he's getting at.
Assume you have disks which once contained a pool but all of the
uberb
On Thu, 7 Aug 2008, Miles Nordin wrote:
I must apologize that I was not able to read your complete email due
to local buffer overflow ...
> someone who knows ZFS well like Pavel. Also, there is enough concern
> for people designing paranoid systems to approach them with the view,
> ``ZFS is not
Anton B. Rang wrote:
>> From the ZFS Administration Guide, Chapter 11, Data Repair section:
>> Given that the fsck utility is designed to repair known pathologies
>> specific to individual file systems, writing such a utility for a file
>> system with no known pathologies is impossible.
>>
>
>
> "r" == Ross <[EMAIL PROTECTED]> writes:
r> Tom wrote "There was a problem with the SAS bus which caused
r> various errors including the inevitable kernel panic". It's
r> the various errors part that catches my eye,
yeah, possibly, but there are checksums on the SAS bus, and
Hi Richard,
Yes, sure. We can add that scenario.
What's been on my todo list is a ZFS troubleshooting wiki.
I've been collecting issues. Let's talk soon.
Cindy
Richard Elling wrote:
> Tom Bird wrote:
>
>> Richard Elling wrote:
>>
>>
>>
>>> I see no evidence that the data is or is not correct
Anton B. Rang writes:
> dumping out the raw data structures and looking at
> them by hand is the only way to determine what
> ZFS doesn't like and deduce what went wrong (and
> how to fix it).
http://www.osdevcon.org/2008/files/osdevcon2008-max.pdf
:-)
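As a starting point for that kind of by-hand examination, the commands below dump the structures the paper works from. Device and pool names are placeholders, and these are only the commonly used options, not a complete list:

  # Dump a vdev label (pool config and GUIDs) straight from a raw device.
  zdb -l /dev/rdsk/c0t0d0s0

  # Show the active uberblock of an imported pool: txg, timestamp, and the
  # root block pointer that everything else hangs off.
  zdb -uuu tank

  # Walk the pool's datasets and their objects.
  zdb -dd tank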
--
Miles Nordin wrote:
>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>> "tb" == Tom Bird <[EMAIL PROTECTED]> writes:
>
> tb> There was a problem with the SAS bus which caused various
> tb> errors including the inevitable kernel panic, the thing came
> tb> back up with 3 ou
Hi folks,
Miles, I don't know if you have more information about this problem than I'm
seeing, but from what Tom wrote I don't see how you can assume this is
something as simple as an unclean shutdown.
Tom wrote "There was a problem with the SAS bus which caused various errors
including the i
> Would be grateful for any ideas, relevant output here:
>
> [EMAIL PROTECTED]:~# zpool import
> pool: content
> id: 14205780542041739352
> state: FAULTED
> status: The pool metadata is corrupted.
> action: The pool cannot be imported due to damaged devices or data.
> The pool may b
Hi Tom and all,
Tom Bird wrote:
Hi,
Have a problem with a ZFS on a single device, this device is 48 1T SATA
drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
a ZFS on it as a single device.
There was a problem with the SAS bus which caused various errors
including the in
> "nw" == Nicolas Williams <[EMAIL PROTECTED]> writes:
nw> Without ZFS the OP would have had silent, undetected (by the
nw> OS that is) data corruption.
It sounds to me more like the system would have panicked as soon as he
pulled the cord, and when it rebooted, it would have rolled t
> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
re> If your pool is not redundant, the chance that data
re> corruption can render some or all of your data inaccessible is
re> always present.
1. data corruption != unclean shutdown
2. other filesystems do not need a mirror
On Wed, Aug 06, 2008 at 03:44:08PM -0400, Miles Nordin wrote:
> > "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>
> c> If that's really the excuse for this situation, then ZFS is
> c> not ``always consistent on the disk'' for single-VDEV pools.
>
> re> I disagree with your
On Wed, Aug 06, 2008 at 02:23:44PM -0400, Will Murnane wrote:
> On Wed, Aug 6, 2008 at 13:57, Miles Nordin <[EMAIL PROTECTED]> wrote:
> > If that's really the excuse for this situation, then ZFS is not
> > ``always consistent on the disk'' for single-VDEV pools.
> Well, yes. If data is sent, but c
> As others have explained, if ZFS does not have a
> config with data redundancy - there is not much that
> can be learned - except that it "just broke".
Plenty can be learned by just looking at the pool.
Unfortunately ZFS currently doesn't have tools which
make that easy; as I understand it, zdb
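That said, at least in recent builds zdb can be pointed at a pool that is not (or cannot be) imported. A hedged sketch, using the pool name from this thread and a placeholder device:

  # Examine a pool without importing it.
  zdb -e content

  # Dump the labels (pool config) from one of the underlying devices.
  zdb -l /dev/rdsk/c2t0d0s0

  # Traverse the pool from the active uberblock and verify what it can reach.
  zdb -e -bcsv content

None of this modifies the pool, so it is safe to poke around this way before attempting anything more invasive.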
> From the ZFS Administration Guide, Chapter 11, Data Repair section:
> Given that the fsck utility is designed to repair known pathologies
> specific to individual file systems, writing such a utility for a file
> system with no known pathologies is impossible.
That's a fallacy (and is incorrect
Tom Bird wrote:
> Richard Elling wrote:
>
>
>> I see no evidence that the data is or is not correct. What we know is that
>> ZFS is attempting to read something and the device driver is returning EIO.
>> Unfortunately, EIO is a catch-all error code, so more digging to find the
>> root cause is
On Wed, Aug 6, 2008 at 8:20 AM, Tom Bird <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Have a problem with a ZFS on a single device, this device is 48 1T SATA
> drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
> a ZFS on it as a single device.
>
> There was a problem with the SAS bus
> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
c> If that's really the excuse for this situation, then ZFS is
c> not ``always consistent on the disk'' for single-VDEV pools.
re> I disagree with your assessment. The on-disk format (any
re> on-disk format) necessarily
Richard Elling wrote:
> I see no evidence that the data is or is not correct. What we know is that
> ZFS is attempting to read something and the device driver is returning EIO.
> Unfortunately, EIO is a catch-all error code, so more digging to find the
> root cause is needed.
I'm currently check
Miles Nordin wrote:
>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>> "tb" == Tom Bird <[EMAIL PROTECTED]> writes:
>>
>
> tb> There was a problem with the SAS bus which caused various
> tb> errors including the inevitable kernel panic, the thing came
> tb
On Wed, Aug 6, 2008 at 13:57, Miles Nordin <[EMAIL PROTECTED]> wrote:
>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>> "tb" == Tom Bird <[EMAIL PROTECTED]> writes:
>
>tb> There was a problem with the SAS bus which caused various
>tb> errors including the inevitable kernel pa
> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
> "tb" == Tom Bird <[EMAIL PROTECTED]> writes:
tb> There was a problem with the SAS bus which caused various
tb> errors including the inevitable kernel panic, the thing came
tb> back up with 3 out of 4 zfs mounted.
re> I
Tom Bird wrote:
> Hi,
>
> Have a problem with a ZFS on a single device, this device is 48 1T SATA
> drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
> a ZFS on it as a single device.
>
> There was a problem with the SAS bus which caused various errors
> including the inevita
Hi,
Have a problem with a ZFS on a single device, this device is 48 1T SATA
drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had
a ZFS on it as a single device.
There was a problem with the SAS bus which caused various errors
including the inevitable kernel panic, the thing ca