Good observations, Eric, more below...

Eric D. Mudama wrote:
> On Mon, Jan 19 at 23:14, Greg Mason wrote:
>> So, what we're looking for is a way to improve performance, without  
>> disabling the ZIL, as it's my understanding that disabling the ZIL  
>> isn't exactly a safe thing to do.
>>
>> We're looking for the best way to improve performance, without  
>> sacrificing too much of the safety of the data.
>>
>> The current solution we are considering is disabling the cache  
>> flushing (as per a previous response in this thread), and adding one  
>> or two SSD log devices, as this is similar to the Sun storage  
>> appliances based on the Thor. Thoughts?
> 
> As a general principle, the evil tuning guide states that the ZIL
> should be able to handle 10 seconds of expected synchronous write
> workload.
> 
> To me, this implies that it improves burst behavior, but
> potentially at the expense of sustained throughput, as would be
> measured in benchmark-style runs.

Yes.  Workloads that tend to be latency sensitive also tend
to be bursty. Or, perhaps that is just how it feels to a user.
Similar observations are made in the GUI design business where
user interactions are bursty, but latency sensitive.

> If you have a big JBOD array with, say, 8+ mirror vdevs on multiple
> controllers, then in theory each vdev can commit 60-80 MB/s to disk.
> Unless you attach a separate ZIL device that can match the aggregate
> throughput of that pool, wouldn't it be better just to keep the
> default behavior, with the ZIL contents inside the pool itself?

The problem is that ZIL writes must be committed to disk, and
magnetic disks rotate.  So the time to commit to media is, on
average and disregarding seeks, 1/2 the rotational period.  This
ranges from 2 ms (15k rpm) to 5.5 ms (5,400 rpm).  If the
workload is something like a tar -x of small files (source code),
with one synchronous commit per file, then a 4.17 ms (7,200 rpm)
disk would limit my extraction to a maximum of 240 files/s.  If
these are 4 kByte files, the bandwidth would peak at about
1 MByte/s.  Upgrading to a 15k rpm disk would move the peak to
about 500 files/s, or roughly 2 MBytes/s.  Using a decent SSD
would change this to about 5,000 files/s, or 20 MBytes/s.
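
If you want to play with the arithmetic, here is a quick
back-of-the-envelope sketch in Python.  The disk figures are the
ones above; the ~0.2 ms SSD commit latency is just an assumed
round number, not a measurement.

  # Each small synchronous write waits, on average, half a rotation
  # before it is on stable media (seeks and transfer time ignored).
  def max_sync_files_per_sec(rpm):
      half_rotation_s = 0.5 * 60.0 / rpm   # average rotational latency
      return 1.0 / half_rotation_s

  for rpm in (5400, 7200, 15000):
      rate = max_sync_files_per_sec(rpm)
      mbytes = rate * 4096 / 1e6           # 4 kByte files
      print("%6d rpm: %4.0f files/s, %4.1f MBytes/s" % (rpm, rate, mbytes))

  # An SSD log device with ~0.2 ms commit latency (assumed) gives
  # roughly 5,000 files/s, or about 20 MBytes/s for 4 kByte files.
  print("   ssd: %4.0f files/s" % (1.0 / 0.0002))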

> The best practices guide states that the max ZIL device size should be
> roughly 50% of main system memory, because that's approximately the
> most data that can be in-flight at any given instant.

There is some ongoing discussion about this point, because it
really speaks to the ARC in general.  Look for the guide to be
clarified soon.  Also note that this is much more of a problem
for small-memory machines.

> "For a target throughput of X MB/sec and given that ZFS pushes
> transaction groups every 5 seconds (and have 2 outstanding), we also
> expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service
> 100MB/sec of synchronous writes, 1 GBytes of log device should be
> sufficient."

It is a little more complicated than that: if the size of a ZIL
write is > 32 kBytes, then the data will be written directly to
the main pool rather than to the ZIL log.  The reasoning is that
with lots of large synchronous writes the system becomes
bandwidth limited rather than latency limited, and the way to
solve a bandwidth problem is to reduce the bandwidth demand.
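
As a rough sketch of the sizing rule quoted above (and only a
sketch -- remember that sync writes above the ~32 kByte cut-over,
the zfs_immediate_write_sz tunable if I recall correctly, bypass
the log device, so only the small-write portion of the workload
counts):

  # Rule of thumb: the log device should hold ~10 seconds of small
  # synchronous writes (2 outstanding txgs * 5 seconds per txg).
  def slog_size_mbytes(small_sync_mbytes_per_s, txg_s=5, txgs_outstanding=2):
      return small_sync_mbytes_per_s * txg_s * txgs_outstanding

  # 100 MBytes/s of small sync writes -> ~1000 MBytes (1 GByte) of slog
  print(slog_size_mbytes(100))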

> But, no comments are made on the performance requirements of the ZIL
> device(s) relative to the main pool devices.  Clicking around finds
> this entry:
> 
> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
> 
> ...which appears to indicate cases where a significant number of
> separate log devices were required to match the bandwidth of just
> leaving the ZIL in the pool itself.

Yes.  And I think there are many more use cases which have not
yet been characterized.  What we do know is that using an SSD for
the separate ZIL log works very well for a large number of cases.
It is not clear to me that the effort to characterize many more
of them is worthwhile when we can simply throw an SSD at the
problem and solve it.
  -- richard
