Bob Friesenhahn wrote:
> On Sat, 18 Apr 2009, Eric D. Mudama wrote:
>> What is tall about the SATA stack? There's not THAT much overhead in
>> SATA, and there's no reason you would need to support any legacy
>> transfer modes or commands you weren't interested in.
>
> If SATA is much more than a memcpy() then it is excessive overhead for
> a memory-oriented device. In fact, since the "device" is actually
> comprised of quite a few independent memory modules, it should be
> possible to schedule I/O for each independent memory module in
> parallel. A large storage system will be comprised of tens, hundreds
> or even thousands of independent memory modules so it does not make
> sense to serialize access via legacy protocols. The larger the
> storage device, the more it suffers from a serial protocol.
It's a mistake to think that flash looks similar to RAM. It doesn't, in
lots of ways -- actually it looks more like a hard disk in many
respects ;-)
It's true that you will find lots of flash memory modules on an SSD.
This is because individual flash chips are slow, so many of them are run
in parallel, with data striped across them, in order to make good use of
the available SATA bandwidth (think of it like a mini RAID0 array). In
the case of the SATA SSDs we sell for X and T series systems, there are
10 parallel flash channels in each one, which enables the device to
achieve about 85% of the theoretical SATA bandwidth (way higher than any
single hard drive can do, except to its cache).
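
To make the parallelism concrete, here is a toy sketch in C of how a
controller might stripe one large host transfer page-by-page across its
channels so that all of them are kept busy at once. The channel count,
page size and function names are purely illustrative assumptions, not
taken from any real firmware.

/* Toy sketch: striping one logical write across parallel flash
 * channels. All sizes and names are made up for illustration. */
#include <stdio.h>

#define CHANNELS   10       /* parallel flash channels, as above */
#define PAGE_BYTES 4096     /* hypothetical flash page size */

static void flash_program(int channel, unsigned long page,
                          const unsigned char *buf)
{
    /* stand-in for issuing a program operation on one channel */
    printf("channel %d: program page %lu\n", channel, page);
    (void)buf;
}

int main(void)
{
    static unsigned char data[CHANNELS * PAGE_BYTES];

    /* One large host write is split page-by-page across all the
     * channels, so the flash programs can proceed in parallel
     * rather than one after another. */
    for (int i = 0; i < CHANNELS; i++)
        flash_program(i, 0, data + (size_t)i * PAGE_BYTES);

    return 0;
}
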
Also, like a hard disk, flash blocks go bad, and again like a disk, the
SSD has spare blocks to use as replacements, and includes bad block
handling logic in its controller to map these in when required. Over the
life of an enterprise-class SSD, the controller actually expects many
more flash block failures than you would ever see on a working hard
disk, and there is consequently a much larger proportion of spare flash
memory included than a hard drive will normally have, in order to
achieve the same life. (Unlike a hard disk, blocks tend to die
gradually, so the flash controller can normally detect them getting weak
and map to replacement blocks long before any user data is lost.)
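
As a rough sketch of how that remapping can work, here is a small C
illustration assuming a hypothetical logical-to-physical block map and a
fixed pool of spares; the names and sizes are invented, not taken from
any particular SSD's firmware.

/* Toy sketch of bad-block remapping with a spare pool. */
#include <stdio.h>

#define USER_BLOCKS  1024   /* blocks visible to the host */
#define SPARE_BLOCKS 128    /* extra physical blocks held in reserve */

static int map[USER_BLOCKS];         /* logical -> physical block */
static int next_spare = USER_BLOCKS; /* first unused spare */

static void init_map(void)
{
    for (int i = 0; i < USER_BLOCKS; i++)
        map[i] = i;                  /* identity mapping to start */
}

/* Called when the controller notices a physical block going weak:
 * copy its contents to a spare and repoint the logical block. */
static int retire_block(int logical)
{
    if (next_spare >= USER_BLOCKS + SPARE_BLOCKS)
        return -1;                   /* out of spares: end of life */
    /* (the data copy to the spare block would happen here) */
    map[logical] = next_spare++;
    return 0;
}

int main(void)
{
    init_map();
    retire_block(42);
    printf("logical 42 now lives in physical block %d\n", map[42]);
    return 0;
}

The important point is that the host keeps addressing "logical block 42"
over SATA and never sees the substitution happen underneath it.
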
One departure from a hard disk is that flash blocks wear out according
to how much they're used. Most filesystems have blocks in some positions
which are used much more than others (e.g. superblocks, uberblocks,
etc), and these are normally really critical to the filesystem.
Designers of SSDs know that it would be completely unacceptable for such
critical blocks to fail quickly -- that would in effect mean the SSD had
a very short life, even though most of its flash would still be fine
when the device became useless. To counteract this, the on-board SSD
controller implements a feature called wear leveling. What this does is
remap the logical block numbers across the physical flash blocks, so
that all the blocks wear at the same rate. You can sit there continually
rewriting block 0 and you won't wear out the first flash block, because
the controller keeps moving where it stores block 0 in flash; the wear
is spread evenly across the flash memory, and you get the longest
possible life from the SSD.
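
As a very rough illustration of the idea, here is a C sketch assuming a
hypothetical controller that simply directs each rewrite of logical
block 0 to whichever physical block currently has the lowest erase
count; real wear-leveling firmware is considerably smarter, but the
effect is the same.

/* Toy sketch of wear leveling: repeated writes to logical block 0
 * land on the least-erased physical block, spreading the wear. */
#include <stdio.h>

#define BLOCKS 8                /* tiny illustrative device */

static int map0;                /* physical home of logical block 0 */
static int erase_count[BLOCKS];

static int least_worn(void)
{
    int best = 0;
    for (int i = 1; i < BLOCKS; i++)
        if (erase_count[i] < erase_count[best])
            best = i;
    return best;
}

static void rewrite_logical_0(void)
{
    int target = least_worn(); /* pick the least-used physical block */
    erase_count[target]++;     /* erase-before-program wears it a bit */
    map0 = target;             /* logical block 0 now lives there */
}

int main(void)
{
    for (int i = 0; i < 100; i++)
        rewrite_logical_0();

    for (int i = 0; i < BLOCKS; i++)
        printf("physical block %d erased %d times\n", i, erase_count[i]);
    return 0;
}

After 100 rewrites of "block 0" the erase counts come out nearly equal
across all eight physical blocks, which is exactly the behaviour the
wear-leveling logic is there to produce.
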
When you've considered these (and doubtless other) issues, it should
become clear why it makes good sense to build the type of flash memory
we currently have available into something resembling a disk. It really
looks nothing like DRAM. I'm sure that in time new flash technologies
will appear, and it may then make sense to present them through
different interfaces.
--
Andrew
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss