The specific question in this thread is just one instance of the more general question of how I/O performance is determined, how you can understand it, and how you can plan for and achieve it deterministically. I will attempt to answer this general question as simply as possible.
I/O throughput is a function of load level and response time. Response time, in turn, is a function of load level for each I/O size and access type combination in the workload, in its relative proportion, against a given resource. This is much simpler than it sounds. For example, "50 threads of 80% 8 KB Random Read, 20% 64 KB Sequential Write" is a complete specification of an I/O workload and its composition (complete because the proportions sum to 100%). If, when applied to a specific resource, the response time for 50 threads of 8 KB random read is 30 ms and the response time for 50 threads of 64 KB sequential write is 15 ms, then the combined workload on that resource will have a response time of 0.80 * 30 + 0.20 * 15 = 27 ms.

Note that the workload description is independent of response time until the resource has been specified, and that the total load level is divided across the number of resource units. So while the composition in terms of I/O size and access type is a given, the load level is managed by how many resource units you configure. If 50 threads are spread over 5 units of resource, the response time for the given composition is based on 10 threads per unit instead of 50, and will be lower in direct proportion to the reduced load level at each unit.

The response time of each I/O size and access type combination, weighted by its relative portion at a given load level, determines the overall response time. That in turn determines the instantaneous throughput, which by Little's Law is load level divided by response time: in this example, 50 / 0.027 = 1851 IOPS @ 27 ms. In all cases, you can decompose the I/O workload into proportions of I/O size and access type combinations that sum to 100%; the composite response time is the weighted average of their response times, and the instantaneous throughput is the total load level divided by that composite response time.
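To make the arithmetic concrete, here is a minimal Python sketch of the weighted-composite calculation and Little's Law. The 30 ms and 15 ms component response times are the measured values from the example above, not derived by the code:

```python
def composite_response_time(components):
    """Weighted average response time over (fraction, response_time_s) pairs.
    Fractions must sum to 1.0 (i.e., 100% of the workload composition)."""
    assert abs(sum(f for f, _ in components) - 1.0) < 1e-9
    return sum(f * rt for f, rt in components)

def instantaneous_throughput(load_level, response_time_s):
    """Little's Law: throughput (IOPS) = concurrency / response time."""
    return load_level / response_time_s

# 50 threads of 80% 8 KB Random Read / 20% 64 KB Sequential Write.
mix = [(0.80, 0.030),   # 8 KB Random Read at 30 ms (measured)
       (0.20, 0.015)]   # 64 KB Sequential Write at 15 ms (measured)

rt = composite_response_time(mix)        # 0.027 s, i.e. 27 ms
iops = instantaneous_throughput(50, rt)  # 50 / 0.027, about 1851 IOPS

print(f"{rt * 1000:.0f} ms, {int(iops)} IOPS")
```

Spreading the same composition over 5 resource units just means calling `instantaneous_throughput` with the per-unit load of 10 threads and the (lower) per-unit response times measured at that load.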
The instantaneous throughput per load level is then weighted by the probability density function of load level and integrated over the range of load level. You can obtain the probability density function empirically, or estimate it with a Poisson distribution. For most cases, however, a simple calculation of the mean is all you need to see whether you are where you need to be relative to the target SLA; the distribution is only needed if you want to estimate variation and other modes.

As a side note, this analysis is based on the gradient field of response time. It therefore represents a conservative field, and the above-mentioned integral, which defines Work in the formal sense, depends only on the start and end positions of load level; it is path independent. Intuitively, this means the ebb and flow of the workload within the boundaries defined by the proportions of I/O size and access type combinations need not concern us. All we need are the relative portions of the composition and the range of load level, and we can define the expected throughput with great accuracy, at least for any one system state.

To design a configuration that delivers a specific response time for a given load level and composition, divide the load level by the number of resources needed to bring the per-resource load level into the desired range for that composition. The total aggregate load level divided by the resulting response time determines the instantaneous throughput of the system. The total throughput is then determined by how sustained each load level is, which is where the distribution comes in. But again, you do not need the distribution to set an expectation for the mean: just use the average load level divided by the average response time, and that is the mean expected capability, in IOPS, of the configuration. How much of that capability is used depends on the arrival rate and defines capability utilization.
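A small sketch of the mean-based estimate described above. The sampled load levels, the response-time curve, and the observed arrival rate are all hypothetical placeholders standing in for measured data, not values from this thread:

```python
from statistics import mean

def response_time_at(load_level):
    """Hypothetical measured response-time curve (seconds) vs. load level.
    In practice this comes from measurement, not a formula."""
    return 0.005 + 0.0004 * load_level  # placeholder: grows with load

# Hypothetical sampled concurrency (load level) over an observation window.
load_samples = [10, 20, 30, 40, 50]

# Mean expected capability: average load level / average response time.
avg_load = mean(load_samples)                                  # 30
avg_rt = mean(response_time_at(n) for n in load_samples)       # 0.017 s
capability_iops = avg_load / avg_rt

# Capability utilization: observed arrival rate relative to capability.
arrival_rate = 1200.0  # hypothetical observed IOPS
utilization = arrival_rate / capability_iops

print(f"capability {capability_iops:.0f} IOPS, utilization {utilization:.0%}")
```

Only when you want variation and other modes do you replace the simple means with the full load-level distribution (empirical, or a Poisson estimate) and weight the per-load throughput by it.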
A Lamborghini going 1 MPH is 100% busy but far from 100% capability. To understand I/O performance is to know how many threads of what I/O size and access type combination are being serviced by what kind, and how many, of a given resource. Filesystems, volume managers, and HW RAID devices all transform the workload, so the load level and composition issued by the application are very different from the load level and composition issued to the resource. This is the general answer to any I/O performance question, of which the current thread is one specific example.

As for ZFS: ZFS changes everything ;-) I think of ZFS first as a masterpiece of functionality and ease of use. From a performance-predictability standpoint, however, it is extremely challenging, as the workload issued to the pool is not a proper transformation of the workload issued to the filesystems. There is a ton of speculative prefetch going on; in fact, the actual requested I/O contained in any given read at the pool level is very small compared to the amount that is speculative. Writes are always converted to full stripes of the filesystem recsize divided by the number of drives in a vdev, then coalesced vertically into 128 KB to 32 KB per disk, depending on the recsize, and written sequentially.

The same rules for I/O throughput discussed above still apply to ZFS, but with ZFS it is not clear how much of any given I/O is the requested I/O and how much comes from copy-on-write on the write side or speculative prefetch on the read side. It will take some time to fully understand it, but I think it will be well worth the effort; it is a remarkable advancement in filesystem technology.

Regards,
The ORtera man ;-)

This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org