Eric D. Mudama writes:

> On Mon, Jan 19 at 23:14, Greg Mason wrote:
> >So, what we're looking for is a way to improve performance, without
> >disabling the ZIL, as it's my understanding that disabling the ZIL
> >isn't exactly a safe thing to do.
> >
> >We're looking for the best way to improve performance, without
> >sacrificing too much of the safety of the data.
> >
> >The current solution we are considering is disabling the cache
> >flushing (as per a previous response in this thread), and adding one
> >or two SSD log devices, as this is similar to the Sun storage
> >appliances based on the Thor. Thoughts?
>
> In general principles, the evil tuning guide states that the ZIL
> should be able to handle 10 seconds of expected synchronous write
> workload.
>
> To me, this implies that it's improving burst behavior, but
> potentially at the expense of sustained throughput, like would be
> measured in benchmarking type runs.
>
> If you have a big JBOD array with say 8+ mirror vdevs on multiple
> controllers, in theory, each vdev can commit 60-80MB/s to disk.
> Unless you are attaching a separate ZIL device that can match the
> aggregate throughput of that pool, wouldn't it just be better to have
> the default behavior of the ZIL contents being inside the pool itself?
>
> The best practices guide states that the max ZIL device size should be
> roughly 50% of main system memory, because that's approximately the
> most data that can be in-flight at any given instant.
>
> "For a target throughput of X MB/sec and given that ZFS pushes
> transaction groups every 5 seconds (and have 2 outstanding), we also
> expect the ZIL to not grow beyond X MB/sec * 10 sec. So to service
> 100MB/sec of synchronous writes, 1 GBytes of log device should be
> sufficient."
>
> But, no comments are made on the performance requirements of the ZIL
> device(s) relative to the main pool devices. Clicking around finds
> this entry:
>
> http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
>
> ...which appears to indicate cases where a significant number of ZILs
> were required to match the bandwidth of just throwing them in the pool
> itself.
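To put numbers on the sizing rule quoted above, here is a minimal sketch of the arithmetic, assuming the 5-second transaction-group interval and the two outstanding transaction groups from the quote (the helper name is illustrative, not anything from ZFS itself):

    # Back-of-the-envelope slog sizing, per the best practices guide:
    # ZFS pushes a transaction group every 5 seconds and may have two
    # outstanding, so the ZIL holds at most ~10 seconds of sync writes.
    TXG_INTERVAL_SEC = 5      # seconds between transaction group pushes
    OUTSTANDING_TXGS = 2      # transaction groups in flight at once

    def slog_size_mb(sync_write_mb_per_sec):
        """Upper bound on ZIL growth for a given synchronous write rate."""
        return sync_write_mb_per_sec * TXG_INTERVAL_SEC * OUTSTANDING_TXGS

    print(slog_size_mb(100))  # 1000 MB, i.e. the ~1 GB the guide cites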
Big topic. Some write requests are synchronous and some are not, and some start out as non-synchronous but end up being synced. For non-synchronous loads, ZFS does not commit data to the slog; the presence of the slog is transparent and won't hinder performance.

For synchronous loads, performance is normally governed by fewer threads committing more modest amounts of data, so performance here is dominated by latency effects, not disk throughput, and this is where a slog greatly helps (10X). Now, you're right to point out that some workloads might end up synchronous while still managing large quantities of data. The Storage 7000 line was tweaked to handle some of those cases: when committing more than, say, 10 MB in a single operation, the first MB will go to the SSD, but the rest will actually be sent to the main storage pool, with all of these I/Os issued concurrently (sketched below). The latency response of a 1 MB write to our SSD is expected to be similar to the response of regular disks.

-r

> --eric
>
> --
> Eric D. Mudama
> edmud...@mail.bounceswoosh.org
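The split behavior described above might be modeled roughly as follows. This is an illustrative sketch only, assuming a 1 MB slog cutoff as in the description; the device names and helpers are stand-ins, not the actual ZFS implementation:

    from concurrent.futures import ThreadPoolExecutor

    CUTOFF = 1 << 20  # assumed 1 MB slog cutoff, per the description above

    def write_to(device, buf):
        # Stand-in for issuing a device write; real code would do the I/O.
        print(f"{device}: {len(buf)} bytes")

    def log_sync_write(data):
        """Large synchronous commit: the first MB goes to the low-latency
        slog, the remainder goes straight to the main pool, and both I/Os
        are issued concurrently."""
        head, tail = data[:CUTOFF], data[CUTOFF:]
        with ThreadPoolExecutor() as ex:
            ios = [ex.submit(write_to, "slog ssd", head)]
            if tail:
                ios.append(ex.submit(write_to, "main pool", tail))
            for io in ios:
                io.result()  # acknowledge only once all I/Os have landed

    log_sync_write(b"x" * (10 << 20))  # 10 MB commit: 1 MB -> slog, 9 MB -> pool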