Let me apologize in advance for inter-mixing comments.


On 10/27/15 7:44 PM, Rainer Heilke wrote:

I am not trying to be a dick (it happens naturally), but if you can't
afford to back up terabytes of data, then you can't afford to have
terabytes of data.

That is a meaningless statement that reflects nothing in real-world terms.
The true cost of a byte of data that you care about is the money you pay for the initial storage, plus the money you pay to back it up. For work, my front-line databases have 64TB of mirrored net storage. The costs don't stop there. There is another 200TB of net storage dedicated to holding enough log data to rebuild the last 18 months from scratch. I also have two sets of slaves that snapshot themselves frequently. One set is a single disk, the other is raidz. These are not just backups. One set runs batch jobs, one runs the front-end portal, and the masters are in charge of data ingestion.

The slaves are useful backups for zpool corruption on the front end, but not necessarily for human error. For human error, say where someone destroys a table, the change replicates across all the slaves, and somehow it isn't noticed until all the snapshots are deleted, we have the logs. I have different kinds of backups taken at different intervals to handle different kinds of failures. Some are live, some are snapshots, and some are source data. You need to determine your level of risk tolerance. That might mean using zfs send/recv to two different zpools with the same or different protection levels.
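As a rough sketch of that (the pool and dataset names here are made up for illustration), the basic send/recv cycle to a second pool is just:

# snapshot the data you care about, then replicate it into the backup pool
zfs snapshot -r tank/db@20151027
zfs send -R tank/db@20151027 | zfs recv -Fdu backup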

If you don't back up, you set yourself up for unrecoverable problems. In four years of running high-transaction, high-throughput databases on ZFS I have had to rebuild pools from time to time for different reasons, but never for corruption. I have had other problems, like unbalanced write load across vdevs and metaslab fragmentation. My point is, don't underestimate the cost of maintaining a byte of data. You might need the backup one day, even with the protections that ZFS provides.

That said, instead of running mirrors, run loose disks and back up to the second pool at a frequency you are comfortable with. You need to prioritize your resources against your risk tolerance. It is tempting to do mirrors because it is sexy, but that might not be the best strategy.
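A sketch of that kind of periodic backup, again with made-up names: after a first full send, keep the second pool current with incrementals at whatever interval suits you.

# take a new snapshot and send only what changed since the last one
zfs snapshot tank/data@2015-10-28
zfs send -i tank/data@2015-10-27 tank/data@2015-10-28 | zfs recv -Fu backup/data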

This is just good stewardship of data you want to keep.

That's an arrogant statement, presuming that if a person doesn't have gobs of money, they shouldn't bother with computers at all.
I didn't write anything like that. What I am saying is that you need to get more creative about how to protect your data. Yes, money makes it easier, but you have options.

People who buy giant ass disks and then complain about how long it takes
to resilver a giant ass disk are out of their minds.

I am not complaining about the time it takes; I know full well how long it can take. I am complaining that the "resilvering" stops dead. (More on this below.)

This is trickier. I don't recall you saying it stops dead. I thought it was just "slow."

When the scrub is stopped dead, what does "iostat -nMxC 1" look like? Are there drives indicating 100% busy, or high wait or asvc_t times?

Do you have any controller errors? Does iostat -En report any errors?

Have you tried mounting the pool ro, stopping the scrub, and then copying data off?
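If not, the sequence I have in mind is roughly this (assuming the pool is called tank; adjust names to your setup):

zpool scrub -s tank                  # cancel the in-progress scrub
zpool export tank
zpool import -o readonly=on tank     # re-import the pool read-only
# then copy the data off to another disk or pool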

Here are some hail-mary settings that probably won't help. I offer them (in no particular order) to try to improve scrub performance, minimize the number of enqueued I/Os in case that is exacerbating the problem somehow, and limit the amount of time spent on a failing I/O. Your scrubs may be stopping because you have a disk that is exhibiting a poor failure mode, namely some sort of internal error where it just keeps retrying, which wedges the pool. WD is not the brand I go to for enterprise failure modes.

* don't spend more than 8 seconds on any single I/O
set sd:sd_io_time=8
* spend at least 5 seconds per txg on resilver I/O, with no delay between resilver I/Os
set zfs:zfs_resilver_min_time_ms = 5000
set zfs:zfs_resilver_delay = 0
* allow only 2 in-flight scrub/resilver I/Os per top-level vdev
set zfs:zfs_top_maxinflight = 2
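Those lines go in /etc/system, followed by a reboot. If you want to poke the ZFS ones into the running kernel first, mdb -kw can do it; treat the following as a sketch of the usual incantation rather than something I have verified on your build:

echo "zfs_top_maxinflight/W 0t2" | mdb -kw
echo "zfs_resilver_delay/W 0t0" | mdb -kw
echo "zfs_resilver_min_time_ms/W 0t5000" | mdb -kw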


Apply these settings and try to resilver again. If this doesn't work, dd the drives to new ones. Using dd will likely identify which drive is wedging ZFS, as it will either not complete or it will error out.
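Something along these lines (the device names are placeholders; point dd at the raw whole-disk nodes for your actual drives):

# copy the suspect disk onto a fresh one, skipping and padding unreadable blocks
dd if=/dev/rdsk/c2t0d0p0 of=/dev/rdsk/c2t4d0p0 bs=1024k conv=noerror,sync

conv=noerror,sync keeps dd moving past read errors, which is what you want when the goal is just to get a mostly intact copy onto a healthy drive.
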
I have no idea what happened to your system for you to lose three disks
simultaneously.

This was covered in a thread ages ago; the tech took days to find the problem, which was a CMOS battery that was on Death's door.

I am not sure who the tech is, but at least two people on this list told you to check the CMOS battery. I think Bob and I both recommended changing the battery. Others might have as well.

I just don't see you recovering from this scenario where you have
two bad drives trying to resilver from each other.

They aren't trying to resilver from each other. The dead disk is gone. The good disk is trying to resilver from the ether. Or some such. (Itself?) I added a third drive to the mirror in a vain attempt to get past the error saying there weren't enough remaining mirrors when I tried to zpool detach the now non-existent drive. Again, what is IT trying to resilver from? The same Twilight Zone the first disk is trying to resilver from?

I reviewed your output again. You have two disks in a mirror. Each disk is resilvering. This means the two disks are resilvering from each other. There are no other options as there are no other vdevs in the pool.

It seems to think that the one disk is fine, but the data isn't. ZFS is then locking the pool's I/O, not letting me clear up the damaged files (nor the pool). It's like there's a trapped loop between two parts of the ZFS code, but I refuse to believe Cantrill (and the many programmers since) didn't see this kind of problem.

ZFS suspects both disks could be dirty; that's why it is resilvering both drives. This puts the drives under heavy load. That load is probably surfacing an internal error on one or more drives, but because WD has crappy failure modes the drive is not sending the error to the OS. Internally, the drive keeps retrying with errors and the OS keeps waiting for the drive to return from the write flush. This is likely what is wedging the pool. The problem is likely on the drive -- but I can't say that with certainty. Certainty is a big word.

There is another option, which has the potential to make your data diverge across the two disks if you don't mount them read-only.

Basically, reboot the system with just one of the vdevs installed and mounted read-only. There will be nothing for the system to resilver. You should be able to copy your data off to another disk (or delete the corrupted files to restore send/recv functionality), if the disk you choose at random is working properly. If it is not working properly, wash-rinse-repeat with the other disk. Hopefully that one will work. If neither works, try changing cables or controllers, although there is a chance that both WD drives have failed. I have had bad luck with WD.
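A sketch of that, assuming the pool is called tank: shut down, pull one of the two mirror disks, boot, and then:

zpool import -o readonly=on tank     # may need -f if it complains the pool was not exported
zpool status tank                    # confirm it imported degraded and is not resilvering
# copy the data off, e.g. with rsync or cp, or zfs send to another pool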

If you are in the SF Bay Area and want to bring it by my office, I am happy to take a stab at it, after you back up your original disks (for liability reasons). I can provide a clean, working "textbook" system if needed; bring your system and drives and we can likely salvage it one way or another.

j.







