Bill Moore <Bill.Moore <at> sun.com> writes:
> 
> Moving on, modern high-capacity SATA drives are in the 100-120MB/s
> range.  Let's call it 125MB/s for easier math.  A 5-port port multiplier
> (PM) has 5 links to the drives, and 1 uplink.  SATA-II speed is 3Gb/s,
> which after all the framing overhead, can get you 300MB/s on a good day.
> So 3 drives can more than saturate a PM.  45 disks (9 backplanes at 5
> disks + PM each) in the box won't get you more than about 21 drives
> worth of performance, tops.  So you leave at least half the available
> drive bandwidth on the table, in the best of circumstances.  That also
> assumes that the SiI controllers can push 100% of the bandwidth coming
> into them, which would be 300MB/s * 2 ports = 600MB/s, which is getting
> close to a 4x PCIe-gen2 slot.

Wrong. The theoretical bandwidth of an x4 PCI-E v2.0 slot is 2GB/s per
direction (5Gbit/s raw per lane, times 0.8 for the 8b/10b encoding
overhead, times 4 lanes), amply sufficient to handle 600MB/s.
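
To spell out that arithmetic, here is a back-of-the-envelope sketch in
Python (it counts only the 8b/10b line-coding overhead, not packet or
protocol overhead; the helper name is just for illustration):

# Usable PCI-E bandwidth per direction, counting only 8b/10b line coding.
def pcie_mb_per_s(gbit_per_lane, lanes, coding_efficiency=0.8):
    return gbit_per_lane * coding_efficiency * lanes * 1000 / 8  # Gbit/s -> MB/s

print(pcie_mb_per_s(5.0, 4))  # x4 PCI-E v2.0: 2000.0 MB/s per direction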

However, they don't have this kind of slot; they have x2 PCI-E v1.0
slots (500MB/s per direction). Moreover, the SiI3132 defaults to a
MAX_PAYLOAD_SIZE of 128 bytes, so my guess is that each 2-port SATA
card can only deliver about 60% of the theoretical throughput[1],
or about 300MB/s.
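
Same kind of sketch for the slots they actually use; the 0.6
payload-efficiency factor is my guess based on [1], not a measured
number:

link_mb_s = 2.5 * 0.8 * 2 * 1000 / 8   # x2 PCI-E v1.0: 500.0 MB/s per direction
per_card_mb_s = link_mb_s * 0.6        # ~300 MB/s per 2-port card with 128-byte payloads (guess)
print(link_mb_s, per_card_mb_s)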

Then they have 3 such cards, for a total throughput of 900MB/s.

Finally, the 4th SATA card (the 4-port one) sits in a 32-bit 33MHz PCI
slot (not PCI-E). In practice such a bus can only provide a usable
throughput of about 100MB/s (out of 133MB/s theoretical).
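
Adding up the rough per-card estimates above (nothing measured here,
just the figures from this thread):

pcie_cards_mb_s = 3 * 300   # three 2-port cards in x2 PCI-E v1.0 slots
pci_card_mb_s = 1 * 100     # the 4-port card on the 32-bit/33MHz PCI bus
print(pcie_cards_mb_s + pci_card_mb_s)   # ~1000 MB/s for the whole pod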

The bottlenecks are clearly the PCI-E links and the PCI bus.
So in conclusion, my SBNSWAG (scientific but not so wild-ass guess)
is that the max I/O throughput when reading from all the disks in
one of their storage pods is about 1000MB/s. This is poor compared
to a Thumper, for example, but the most important factor for them
was GB/$, not GB/sec. And they did a terrific job at that!

> And I'd re-iterate what myself and others have observed about SiI and
> silent data corruption over the years.

Irrelevant, because it seems they have built fault-tolerance higher in
the stack, à la Google. Commodity hardware + reliable software = great
combo.

[1] 
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

-mrb
