Re: [OpenIndiana-discuss] Recommendations for fast storage

Edward Ned Harvey (openindiana) Thu, 18 Apr 2013 05:18:34 -0700

> From: Timothy Coalson [mailto:tsc...@mst.edu]
> 
> Did you also compare the probability of bit errors causing data loss
> without a complete pool failure?  2-way mirrors, when one device completely
> dies, have no redundancy on that data, and the copy that remains must be
> perfect or some data will be lost.


I had to think about this comment for a little while to understand what you 
were saying, but I think I got it.  I'm going to rephrase your question:

If one device in a 2-way mirror becomes unavailable, then the remaining device 
has no redundancy.  So if a bit error is encountered on the (now non-redundant) 
device, then it's an uncorrectable error.  Question is, did I calculate that 
probability?

Answer is, I think so.  Modelling the probability of drive failure (either 
complete failure or data loss) is very complex and non-linear.  It also depends 
on the specific model of drive in question, and the graphs are typically not 
available.  So what I did was to start with some MTBDL graphs that I assumed to 
be typical, and then assume every data-loss event meant complete drive failure.  
Already I'm simplifying the model beyond reality, but the simplification 
focuses on the worst case, and treats every bit error as complete drive 
failure.  This is why I say "I think so" to answer your question.

Then, I didn't want to embark on a mathematician's journey of derivatives and 
integrals over some non-linear failure rate graphs, so I linearized...  I 
forget now (it was like 4-6 years ago) but I would have likely seen that drives 
were unlikely to fail in the first 2 years, and about 50% likely to fail after 
3 years, and nearly certain to fail after 5 years, so I would have likely 
modeled that as a probability of failure that increases linearly up to 4 
years, where failure is assumed to be 100% certain.
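
As a rough illustration only (this is not the original calculation, which is
not reproduced in this message), the linearization could be sketched as a
straight-line ramp in the cumulative failure probability.  The function name,
the 4-year default, and the reading of "probability" as cumulative are all
assumptions made for the sketch.

```python
def linear_failure_probability(age_years, certain_failure_age=4.0):
    """Assumed model: cumulative probability that a drive has failed by
    age_years, rising in a straight line from 0 at age 0 to 1 at
    certain_failure_age, and staying at 1 afterwards."""
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    return min(age_years / certain_failure_age, 1.0)


if __name__ == "__main__":
    # Print the assumed failure probability at a few drive ages.
    for age in (0, 1, 2, 3, 4, 5):
        print(age, linear_failure_probability(age))
```

A straight line like this overstates failures for young drives and
understates them for old ones, compared with the curve described above.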

Yes, this modeling introduces inaccuracy, but that inaccuracy is in the noise.  
Maybe in the first 2 years, I'm 25% off in my estimates to the positive, and 
after 4 years I'm 25% off in the negative, or something like that.  But when 
the results show 10^-17 probability for one configuration and 10^-19 
probability for a different configuration, then the 25% error is irrelevant.  
It's easy to see which configuration is more likely to fail, and it's also 
easy to see they're both well within acceptable limits for most purposes 
(especially if you have good backups).
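
To make the "in the noise" argument concrete, here is a small sketch.  It
assumes, as a simplification, that the 25% error applies directly to the two
final figures quoted above; the variable names are made up for the example.

```python
import math

p_config_a = 1e-17   # quoted probability for one configuration
p_config_b = 1e-19   # quoted probability for the other configuration
rel_error = 0.25     # assumed relative error on each figure

# Worst case for the comparison: A overestimated by 25%, B underestimated
# by 25%, which pushes the two true values as close together as possible.
closest_ratio = (p_config_a * (1 - rel_error)) / (p_config_b * (1 + rel_error))

# Opposite case: the two true values are as far apart as possible.
widest_ratio = (p_config_a * (1 + rel_error)) / (p_config_b * (1 - rel_error))

if __name__ == "__main__":
    print("A is at least %.0f times more likely to fail than B" % closest_ratio)
    print("A is at most %.0f times more likely to fail than B" % widest_ratio)
    print("orders of magnitude apart, worst case: %.2f" % math.log10(closest_ratio))
```

Even in the closest case the two configurations stay more than an order of
magnitude apart, so the ranking does not change.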


> Also, as for time to resilver, I'm guessing that depends largely on where
> bottlenecks are (it has to read effectively all of the remaining disks in
> the vdev either way, but can do so in parallel, so ideally it could be the
> same speed), 

No.  The big factors for resilver time are (a) the number of operations that 
need to be performed, and (b) the number of operations per second.

If you have one big vdev making up a pool, then the number of operations to be 
performed is equal to the number of objects in the pool.  The number of 
operations per second is limited by the worst case random seek time for any 
device in the pool.  If you have an all-SSD pool, then it's equal to the 
performance of a single disk.  If you have an all-HDD pool, then with an 
increasing number of devices in your vdev, you approach 50% of the IOPS of a 
single device.

If your pool is broken down into a bunch of smaller vdevs, let's say N mirrors 
that are all 2-way, then the number of operations to resilver the degraded 
mirror is 1/N of the total objects in the pool.  And the number of operations 
per second is equal to the performance of a single disk.  So the resilver time 
in the big vdev raidz (N times the operations, at about half the IOPS in the 
all-HDD case) is 2N times longer than the resilver time for the mirror.
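
The arithmetic behind that comparison can be sketched as follows.  This is a
back-of-the-envelope model only: the function names and the example figures
(10 million objects, 5 mirrors, 100 IOPS per disk) are invented for
illustration, and the 0.5 factor is the all-HDD limit described above.

```python
def resilver_seconds(operations, iops):
    """Resilver time = operations to perform / operations per second."""
    return operations / iops


def compare_resilver(total_objects, n_mirrors, single_disk_iops,
                     wide_vdev_iops_fraction=0.5):
    """Return (big_vdev_seconds, mirror_seconds, ratio).

    Assumptions: one big raidz vdev must touch every object in the pool at
    roughly half of single-disk IOPS (all-HDD case), while one mirror out of
    n_mirrors touches 1/n_mirrors of the objects at full single-disk IOPS.
    """
    big = resilver_seconds(total_objects,
                           single_disk_iops * wide_vdev_iops_fraction)
    mirror = resilver_seconds(total_objects / n_mirrors, single_disk_iops)
    return big, mirror, big / mirror


if __name__ == "__main__":
    big, mirror, ratio = compare_resilver(10_000_000, 5, 100)
    print("big raidz vdev: %.0f s" % big)
    print("one 2-way mirror: %.0f s" % mirror)
    print("ratio: %.0f (2N with N = 5)" % ratio)
```

Setting wide_vdev_iops_fraction to 1.0 models the all-SSD case, where the
ratio drops to N.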

As you mentioned, other activity in the pool can further reduce the number of 
operations per second.  If you have N mirrors, then the probability of the 
other activity affecting the degraded mirror is 1/N.  Whereas, with a single 
big vdev, you guessed it, all other activity is guaranteed to affect the 
resilvering vdev.
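
The 1/N figure can be checked with a quick simulation.  It assumes that each
unrelated I/O request lands on one mirror chosen uniformly at random, which is
a simplification; the function name and parameters are made up for the sketch.

```python
import random


def fraction_hitting_degraded(n_mirrors, n_requests=100_000, seed=0):
    """Simulate n_requests unrelated I/O requests spread uniformly at random
    over n_mirrors mirrors, and return the fraction that land on the one
    degraded (resilvering) mirror, taken here to be mirror 0."""
    rng = random.Random(seed)
    degraded = 0
    hits = sum(1 for _ in range(n_requests)
               if rng.randrange(n_mirrors) == degraded)
    return hits / n_requests


if __name__ == "__main__":
    for n in (1, 2, 8):
        print(n, fraction_hitting_degraded(n), "expected about", 1 / n)
```

With n_mirrors = 1 (the single big vdev case) every request hits the
resilvering vdev.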


_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
