Anton B. Rang wrote:
Richard wrote:
Any system which provides a single view of data (e.g. a persistent storage
device) must have at least one single point of failure.

Why?

Consider this simple case: A two-drive mirrored array.

Use two dual-ported drives, two controllers, two power supplies,
arranged roughly as follows:

  -- controller-A <=> disk A <=> controller-B --
                  \            /
                   \          /
                    \ disk B /

Remind us where the single point of failure is in this arrangement?

The software which provides the single view of the data.

Seriously, I think it's pretty clear that high-end storage hardware is built to
eliminate single points of failure.  I don't think that NetApp, LSI Logic, IBM,
etc. would agree with your contention.  But maybe I'm missing something; is
there some more fundamental issue?  Do you mean that the entire system is a
single point of failure, if it's the only copy of said data?  That would be a
tautology....

Does anyone believe that the software or firmware in these systems
is infallible?  For all possible failure modes in the system?

I had written:

Mid-range and high-end servers, though, are starved of I/O bandwidth
relative to their CPU & memory. This is particularly true for Sun's hardware.

and Richard had asked (rhetorically?)

Please tell us how many storage arrays are required to meet a
theoretical I/O bandwidth of 244 GBytes/s?

My point is simply that, on most non-file-server hardware, the I/O bandwidth
available is not sufficient to keep all CPUs busy.  Host-based RAID can make
things worse since it takes away from the bandwidth available for user jobs.
Consider a Sun Fire 25K; the theoretical I/O bandwidth is 35 GB/sec (IIRC
that's the full-duplex number) while its 144 processors could do upwards of
259 GFlops.  That's 0.14 bytes/flop.

Consider something more current.  The M9000 has 244 GBytes/s of theoretical
I/O bandwidth.  It's been measured at 1.228 TFlops (peak).  So we see a ratio
of 0.19 bytes/flop.  But this ratio doesn't mean much, since there doesn't
seem to be a storage system that big connected to a single OS instance -- yet 
:-)
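
If anyone wants to check the arithmetic, here it is as a small Python
sketch.  The inputs are just the theoretical/peak figures quoted above,
so the outputs are back-of-the-envelope numbers, nothing more:

  # Bytes-per-flop arithmetic using the figures quoted in this thread
  # (theoretical/peak numbers, not measurements).
  systems = {
      # name: (theoretical I/O bandwidth in GB/s, peak compute in GFlops)
      "Sun Fire 25K": (35.0, 259.0),
      "M9000": (244.0, 1228.0),   # 1.228 TFlops expressed as GFlops
  }

  for name, (gbytes_per_s, gflops) in systems.items():
      # GB/s divided by GFlops gives bytes per flop directly.
      print(f"{name}: {gbytes_per_s / gflops:.3f} bytes/flop")

which prints 0.135 and 0.199 bytes/flop, in line with the ratios above.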

When people make this claim of bandwidth limitation, we often find that the
inherent latency limitation is more problematic.  For example, we can get
good memory bandwidth from DDR2 DIMMs, which we collect into 8-wide banks.
But we can't get past the latency of DRAM access.  Similarly, we can get upwards
of 100 MBytes/s media bandwidth from a fast, large disk, but can't get past
the 4.5 ms seek or 4.1 ms rotational delay time.  It is this latency issue
which effectively killed software RAID-5 (read-modify-write).  Fortunately,
ZFS's raidz is designed to avoid the need to do a read-modify-write.
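
To put rough numbers on the latency point, here is a quick
back-of-the-envelope sketch.  It uses only the seek, rotational, and
media-bandwidth figures above as assumptions; the 4 KB I/O size is an
arbitrary example, and the RAID-5 accounting is the classic 4-I/O
small-write penalty:

  # Latency arithmetic for one fast, large disk, using the figures above
  # as assumptions (not measurements).
  seek_ms = 4.5        # average seek time
  rotation_ms = 4.1    # average rotational delay
  media_mb_s = 100.0   # sequential media bandwidth

  service_ms = seek_ms + rotation_ms          # per random I/O, ignoring transfer time
  iops = 1000.0 / service_ms                  # random I/Os per second
  print(f"~{iops:.0f} random IOPS per disk")  # ~116

  # At 4 KB per random I/O, achieved bandwidth is a tiny fraction of the media rate.
  io_kb = 4.0
  print(f"~{iops * io_kb / 1024.0:.1f} MB/s random vs ~{media_mb_s:.0f} MB/s sequential")

  # A traditional RAID-5 small write is a read-modify-write: read old data,
  # read old parity, write new data, write new parity -- four disk I/Os where
  # a plain write would be one.  raidz always writes full (dynamic-width)
  # stripes, so parity is computed from data already in memory and the
  # read-modify-write never happens.
  print("RAID-5 small write: 4 disk I/Os instead of 1")

So a disk that can stream 100 MBytes/s delivers well under 1 MByte/s of
small random I/O, and the RAID-5 read-modify-write multiplies the I/O
count for small writes by four on top of that.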

To answer your rhetorical question, the DSC9550 does 3 GB/second for reads
and writes (doing RAID 6 and with hardware parity checks on reads -- nice!),
so you'd need 82 arrays.  In real life (an actual file system), ASC Purple with
GPFS got 102 GB/sec using 416 arrays.
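
For the record, the 82 is just ceiling division, and the ASC Purple
figures also yield a rough per-array rate.  A quick sketch using only
the numbers quoted above:

  import math

  # How many DSC9550-class arrays to match the M9000's theoretical I/O bandwidth?
  m9000_gb_s = 244.0   # theoretical I/O bandwidth quoted above
  array_gb_s = 3.0     # DSC9550 read/write rate quoted above
  print(math.ceil(m9000_gb_s / array_gb_s), "arrays")   # 82

  # ASC Purple data point, expressed per array:
  purple_gb_s = 102.0
  purple_arrays = 416
  print(f"~{purple_gb_s / purple_arrays:.2f} GB/s per array achieved")   # ~0.25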

Yeah, this is impressive, but it is parallel (multiple systems, multiple
storage arrays), so it is really apples and oranges.
 -- richard