On May 26, 2010, at 8:38 AM, Neil Perrin wrote:

> On 05/26/10 07:10, sensille wrote:
>> Recently, I've been reading through the ZIL/slog discussion and
>> have the impression that a lot of folks here are (like me)
>> interested in getting a viable solution for a cheap, fast and
>> reliable ZIL device.
>> I think I can provide such a solution for about $200, but it
>> involves a lot of development work.
>> The basic idea: the main problem when using an HDD as a ZIL device
>> is the cache flushes in combination with the linear write pattern
>> of the ZIL. These lead to a whole rotation of the platter after
>> each write, because by the time the first write returns, the head is
>> already past the sector that will be written next.
>> My idea goes as follows: don't write linearly. Track the rotation
>> and write to the position the head will hit next. This might be done
>> by a re-mapping layer or integrated into ZFS. This works only because
>> ZIL devices are basically write-only. Reads from this device will be
>> horribly slow.
>> 
>> I have done some testing and am quite enthusiastic. If I take a
>> decent SAS disk (like the Hitachi Ultrastar C10K300), I can raise
>> the synchronous write performance from 166 writes/s to about
>> 2000 writes/s (!). 2000 IOPS is more than sufficient for our
>> production environment.
>> 
>> Currently I'm implementing a re-mapping driver for this. The
>> reason I'm writing to this list is that I'd like to find support
>> from the zfs team, find sparring partners to discuss implementation
>> details and algorithms and, most important, find testers!
>> 
>> If there is interest it would be great to build an official project
>> around it. I'd be willing to contribute most of the code, but any
>> help will be more than welcome.
>> 
>> So, anyone interested? :)
>> 
>> --
>> Arne Jansen
>> 
>>  
> 
> Yes, I agree this seems very appealing. I have investigated and
> observed similar results: just allocating larger intent log blocks but
> writing only, say, the first half of them shows the same effect.
> Despite the impressive results, we have not pursued this further, mainly
> because of its maintainability. There is quite a variance between
> drives, so, as mentioned, feedback profiling of the device is needed
> in the working system. The layering of the Solaris IO subsystem doesn't
> provide the necessary feedback, and the ZIL code is layered on the SPA/DMU.
> Still, it should be possible. Good luck!
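
For concreteness, the arithmetic behind the numbers in the proposal can be sketched roughly as below. The 10,000 RPM figure matches the Hitachi Ultrastar C10K300 mentioned in the thread; the per-write overhead for the rotation-aware case is an illustrative assumption, not a measured value.

```python
# Sketch: why a linear, flush-after-every-write ZIL on a 10k RPM disk
# caps at roughly 166 synchronous writes/s, and what rotation-aware
# placement could recover. The 0.5 ms overhead figure is assumed.

RPM = 10_000
rotation_s = 60.0 / RPM            # one full platter rotation: 6 ms

# Linear writes with a cache flush after each: by the time a write
# completes, the head has just passed the next sector, so every write
# waits approximately one full rotation.
linear_iops = 1.0 / rotation_s
print(f"linear ZIL writes/s: {linear_iops:.0f}")        # ~167

# Rotation-aware remapping: write to whichever sector will next pass
# under the head, so only per-command overhead remains, not a full
# rotational delay. Assume ~0.5 ms of overhead per write (illustrative).
overhead_s = 0.0005
remapped_iops = 1.0 / overhead_s
print(f"remapped writes/s (0.5 ms assumed overhead): {remapped_iops:.0f}")  # 2000
```

Under these assumptions the ~12x gain reported in the thread (166 to about 2000 writes/s) falls out directly from removing the rotational wait.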

I agree.  If you search the literature, you will find many cases where
people have tried to optimize file systems based on device geometry,
and all have ended up as roadkill.  File systems last much longer than
the hardware, and writing hardware-specific optimizations into the file
system just doesn't make good sense.

Meanwhile, though there are doubters, Intel's datasheet for the X25-V
clearly states support for the ATA FLUSH CACHE feature.  These can
be bought for around $120 and can do 2,500 random write IOPS.
http://download.intel.com/design/flash/nand/value/datashts/322736.pdf
Similarly, for the X25-E:
http://download.intel.com/design/flash/nand/extreme/319984.pdf

I think the effort is better spent making sure the SSD vendors do the
right thing.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss