Bill Moore <Bill.Moore <at> sun.com> writes:

> Moving on, modern high-capacity SATA drives are in the 100-120MB/s
> range. Let's call it 125MB/s for easier math. A 5-port port multiplier
> (PM) has 5 links to the drives, and 1 uplink. SATA-II speed is 3Gb/s,
> which after all the framing overhead, can get you 300MB/s on a good day.
> So 3 drives can more than saturate a PM. 45 disks (9 backplanes at 5
> disks + PM each) in the box won't get you more than about 21 drives
> worth of performance, tops. So you leave at least half the available
> drive bandwidth on the table, in the best of circumstances. That also
> assumes that the SiI controllers can push 100% of the bandwidth coming
> into them, which would be 300MB/s * 2 ports = 600MB/s, which is getting
> close to a 4x PCIe-gen2 slot.
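(Aside: a quick back-of-envelope check of the port-multiplier arithmetic
quoted above, as a small Python sketch. The 125MB/s per-drive and 300MB/s
per-uplink figures are the ones Bill assumes, so this just restates his
math rather than adding any new measurements.)

  # Back-of-envelope check of the port-multiplier (PM) math quoted above.
  # All figures are assumed numbers from the quote, not measurements.
  drive_mb_s     = 125                          # assumed per-drive throughput
  pm_uplink_mb_s = 300                          # usable SATA-II bandwidth per PM uplink
  backplanes     = 9                            # 9 backplanes of 5 disks + 1 PM each
  disks          = backplanes * 5               # 45 disks in the box
  uplink_total   = backplanes * pm_uplink_mb_s  # 2700 MB/s through all 9 uplinks
  drives_worth   = uplink_total / drive_mb_s    # ~21.6 "drives worth" of bandwidth
  print(disks, uplink_total, round(drives_worth, 1))   # -> 45 2700 21.6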
Wrong on that last point: the theoretical bandwidth of an x4 PCI-E v2.0
slot is 2GB/s per direction (5Gbit/s per lane before 8b/10b encoding,
times 0.8 for the encoding overhead, times 4 lanes), amply sufficient to
absorb 600MB/s. However, they don't have that kind of slot; they have x2
PCI-E v1.0 slots (500MB/s per direction). Moreover, the SiI3132 defaults
to a MAX_PAYLOAD_SIZE of 128 bytes, so my guess is that each 2-port SATA
card can only deliver about 60% of the theoretical throughput [1], or
roughly 300MB/s. They have 3 such cards, for a total of 900MB/s. Finally,
the 4th SATA card (with 4 ports) sits in a 32-bit 33MHz PCI slot (not
PCI-E); in practice such a bus can only provide a usable throughput of
about 100MB/s (out of a theoretical 133MB/s).

The bottlenecks are clearly the PCI-E links and the PCI bus. So in
conclusion, my SBNSWAG (scientific but not so wild-ass guess) is that the
max I/O throughput when reading from all the disks in one of their
storage pods is about 1000MB/s (900MB/s through the PCI-E cards plus
100MB/s through the PCI card). This is poor compared to a Thumper, for
example, but the most important factor for them was GB/$, not GB/sec. And
they did a terrific job at that!

> And I'd re-iterate what myself and others have observed about SiI and
> silent data corruption over the years.

Irrelevant, because it seems they have built fault tolerance higher in
the stack, à la Google. Commodity hardware + reliable software = great
combo.

[1] http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

-mrb
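P.S. For the curious, here is a rough Python sketch of the whole-pod
estimate above. The 60% PCI-E payload efficiency and the ~100MB/s of
usable PCI bandwidth are my guesses from the numbers discussed here, not
benchmarks, so treat the result as a ballpark only.

  # Ballpark model of max read throughput for one storage pod (all guesses).
  pcie_x2_v1_mb_s = 500                  # x2 PCI-E v1.0 slot, theoretical, per direction
  pcie_efficiency = 0.60                 # guess: loss from 128-byte MAX_PAYLOAD_SIZE
  per_pcie_card   = pcie_x2_v1_mb_s * pcie_efficiency  # ~300 MB/s per 2-port SiI3132 card
  pcie_cards      = 3                    # three 2-port cards on PCI-E
  pci_card_mb_s   = 100                  # usable 32-bit/33MHz PCI (133MB/s theoretical)
  total_mb_s      = pcie_cards * per_pcie_card + pci_card_mb_s
  print(round(total_mb_s))               # -> 1000 MB/s, give or take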