Bob Friesenhahn wrote:
> On Wed, 17 Jun 2009, Haudy Kazemi wrote:
>> [...] usable with very little CPU consumed.
>> If the system is dedicated to serving files rather than also being
>> used interactively, it should not matter much what the CPU usage is.
>> CPU cycles can't be stored for later use. Ultimately, it (mostly*)
>> does not matter if [...]
>
> Clearly you have not heard of the software flywheel:
> http://www.simplesystems.org/users/bfriesen/software_flywheel.html
I had not heard of such a device; however, from the description it
appears to be made from virtual unobtanium... :)
My line of reasoning is that unused CPU cycles are to some extent a
wasted resource, paralleling the idea that having system RAM sitting
empty/unused is also a waste and that it should be used for caching
until the system needs that RAM for other purposes (which is how the
ZFS cache is supposed to work). This isn't a perfect parallel, as CPU
power consumption and heat output vary with load much more than RAM's
do. I'm sure someone could come up with a formula for the optimal CPU
loading to maximize energy efficiency. There has been work along these
lines; see the paper 'Dynamic Data Compression in Multi-hop Wireless
Networks' at http://enl.usc.edu/~abhishek/sigmpf03-sharma.pdf .
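To make that concrete, here is a toy model of the kind of formula I
have in mind (a sketch only: the power and throughput figures are
invented placeholders, not measurements of any real system), treating
energy cost per gigabyte written as power draw at a given CPU load
divided by the write throughput achieved at that load:

# Toy model (not a measurement): energy cost per gigabyte written, as a
# function of CPU load and the write throughput achieved at that load.
# Power draw is assumed to scale linearly between idle and full load;
# every number below is an invented placeholder.

IDLE_WATTS = 150.0   # assumed idle power draw of the server
MAX_WATTS = 250.0    # assumed power draw at 100% CPU load

def power_at_load(cpu_load):
    """Linear power model; cpu_load ranges from 0.0 to 1.0."""
    return IDLE_WATTS + cpu_load * (MAX_WATTS - IDLE_WATTS)

def joules_per_gb(cpu_load, write_mb_per_s):
    """Energy (in joules) spent per gigabyte of logical data written."""
    seconds_per_gb = 1024.0 / write_mb_per_s
    return power_at_load(cpu_load) * seconds_per_gb

# Three hypothetical operating points: no compression, lzjb, gzip-9.
for name, load, mb_s in (("off", 0.10, 100.0),
                         ("lzjb", 0.25, 95.0),
                         ("gzip-9", 0.90, 40.0)):
    print("%-7s %7.0f J per logical GB" % (name, joules_per_gb(load, mb_s)))

A fuller model would also divide by the compression ratio (fewer
physical bytes per logical gigabyte) and account for the energy the
drives themselves burn during the longer transfer.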
> If I understand the blog entry correctly, for text data the task took
> up to 3.5X longer to complete, and for media data, the task took about
> 2.2X longer to complete with a maximum storage compression ratio of
> 2.52X.
>
> For my backup drive using lzjb compression I see a compression ratio
> of only 1.53x.
I linked to several blog posts. It sounds like you are referring to
http://blogs.sun.com/dap/entry/zfs_compression#comments ?

This blog's test results show the following on their quad-core
platform (the Sun 7410 has quad-core 2.3 GHz AMD Opteron CPUs*):

* http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/7410/spec
For text data, LZJB compression had a negligible performance impact
(task times were unchanged or marginally better) and less storage
space was consumed (1.47:1).

For media data, LZJB compression had a negligible performance impact
(task times were unchanged or marginally worse) and the storage space
consumed was unchanged (1:1).

Take-away message: as currently configured, their system has nothing
to lose from enabling LZJB.
For text data, GZIP compression at any setting had a significant
negative impact on write times (CPU bound), no performance impact on
read times, and a significant improvement in compression ratio.

For media data, GZIP compression at any setting had a significant
negative impact on write times (CPU bound), no performance impact on
read times, and only a marginal improvement in compression ratio.

Take-away message: with GZIP, as their system is currently configured,
write performance would suffer in exchange for a higher compression
ratio. This may be acceptable if the system fills a role with a
read-heavy usage profile and compressible content (an archive.org
backend would be one example). This is similar to the tradeoff made
when choosing between RAID1 or RAID10 and RAID5.
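Since compression is a per-dataset property, the two profiles can even
coexist on one pool. A minimal sketch of what that might look like
(the dataset names tank/share and tank/archive are hypothetical; the
commands are just the standard zfs set/get interface wrapped in
Python):

# Sketch: match the compression algorithm to each dataset's usage
# profile, then check what the choice actually bought us.
# Dataset names are examples only.
import subprocess

def zfs(*args):
    """Run a zfs(1M) subcommand and return its output as text."""
    return subprocess.check_output(["zfs"] + list(args)).decode()

# General-purpose file share: LZJB is cheap enough to leave on.
zfs("set", "compression=lzjb", "tank/share")

# Read-heavy archive of compressible text: trade write speed for space.
zfs("set", "compression=gzip-9", "tank/archive")

# After data has been written, report the achieved compression ratios.
for dataset in ("tank/share", "tank/archive"):
    ratio = zfs("get", "-H", "-o", "value", "compressratio", dataset).strip()
    print("%s compressratio = %s" % (dataset, ratio))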
Automatic benchmarks could be used to detect and select the optimal
compression settings for best performance, with the basic case
assuming the system is a dedicated file server and more advanced cases
accounting for the CPU needs of other processes running on the same
platform. Another approach would be to ask the administrator what the
usage profile of the machine will be and preconfigure compression
settings suitable for that use case.
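A very rough sketch of the benchmarking idea, under stated
assumptions: a scratch dataset (tank/comptest is a made-up name)
exists and is mounted, a representative sample file is available at
/var/tmp/sample.dat, and only raw write throughput and space savings
are scored. A real tool would need multiple runs, read tests, and
awareness of other load on the box:

# Rough sketch only: write the same sample data under each candidate
# compression setting, time it, and report throughput vs. space savings.
# "tank/comptest" and the sample path are placeholders.
import os, subprocess, time

DATASET = "tank/comptest"
MOUNTPOINT = "/tank/comptest"
SAMPLE = open("/var/tmp/sample.dat", "rb").read()   # representative data

def zfs_set(prop_value):
    subprocess.check_call(["zfs", "set", prop_value, DATASET])

def zfs_get(prop):
    out = subprocess.check_output(["zfs", "get", "-H", "-o", "value",
                                   prop, DATASET])
    return out.decode().strip()

results = {}
for setting in ("off", "lzjb", "gzip-2", "gzip-9"):
    zfs_set("compression=" + setting)
    path = os.path.join(MOUNTPOINT, "bench.dat")
    start = time.time()
    f = open(path, "wb")
    f.write(SAMPLE)
    f.flush()
    os.fsync(f.fileno())             # push the data out of the ARC to disk
    f.close()
    elapsed = time.time() - start
    # compressratio is dataset-wide, so remove the file between runs;
    # the numbers are only rough indicators, not a proper benchmark.
    ratio = zfs_get("compressratio")
    os.remove(path)
    results[setting] = (len(SAMPLE) / elapsed / 1e6, ratio)

for setting in ("off", "lzjb", "gzip-2", "gzip-9"):
    mb_s, ratio = results[setting]
    print("%-7s %7.1f MB/s  compressratio %s" % (setting, mb_s, ratio))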
Single- and dual-core systems are more likely to become CPU bound from
enabling compression than quad-core systems are.
All systems have bottlenecks somewhere by virtue of design decisions.
One or more of these bottlenecks will be the rate-limiting factor for
any given workload, such that even if you speed up the rest of the
system, the process will still take the same amount of time to
complete. The LZJB benchmarks on the quad-core system above
demonstrate that LZJB is not the rate limiter for either writes or
reads. The GZIP benchmarks show that GZIP is a rate limiter, but only
during writes. On a more powerful platform (6x faster CPU), GZIP
writes may no longer be the bottleneck (assuming that the network
bandwidth and drive I/O bandwidth remain unchanged).
System component balancing also plays a role. If the server is
connected via a 100 Mbps CAT5e link, and all I/O activity comes from
client computers on that link, does it make any difference whether the
server is actually capable of GZIP writes at 200 Mbps, 500 Mbps, or
1500 Mbps? If the network link is later upgraded to Gigabit Ethernet,
then only the system capable of GZIPing at 1500 Mbps can keep up. The
rate-limiting factor changes as different components are upgraded.
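The arithmetic behind this is just "the slowest stage wins". A toy
calculation (all rates below are hypothetical) shows how upgrading the
network moves the bottleneck:

# Toy model of the rate-limiting factor: end-to-end throughput is
# bounded by the slowest component in the path. All rates are
# hypothetical.

def effective_write_mbps(network, gzip_cpu, disk):
    """Clients can only push data as fast as the slowest stage accepts it."""
    return min(network, gzip_cpu, disk)

disk = 2000.0                                # aggregate drive bandwidth, Mbps
for network in (100.0, 1000.0):              # 100 Mbps link vs Gigabit Ethernet
    for gzip_cpu in (200.0, 500.0, 1500.0):  # how fast the CPU can gzip, Mbps
        limit = effective_write_mbps(network, gzip_cpu, disk)
        print("net %4.0f Mbps, gzip %4.0f Mbps -> clients see %4.0f Mbps"
              % (network, gzip_cpu, limit))

On the 100 Mbps link all three GZIP speeds look identical; only after
the network upgrade does the difference between them show up.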
In many systems and for many workloads, hard drive I/O bandwidth is
the rate-limiting factor with the most significant performance impact,
such that a 20% boost in drive I/O is more noticeable than a 20% boost
in CPU performance (or even a doubling of CPU performance). Many
systems are now becoming quite unbalanced in terms of I/O bandwidth
versus CPU performance. Trading CPU cycles for I/O bandwidth is one
way of compensating for that imbalance, provided the task is not
already CPU-bound.
(A CPU-bound process has the CPU as its rate-limiting factor. A common
characteristic of CPU-bound processes is that they run the CPU at 100%
and would benefit from a faster processor. Non-CPU-bound processes
have a different rate-limiting factor, which remains unchanged even if
a faster CPU is used. An example of a non-CPU-bound process is MP3
decoding for live playback. An example of balancing a system is to
compare a recent netbook to a stock-configuration Pentium 3 laptop
from 2002. They both have CPUs of similar capability, but the netbooks
come with more RAM, and some with flash memory rather than hard
drives. The performance boost from the extra RAM and flash storage
helps compensate for what, by 2009 standards, are slow CPUs. As a
result, the netbooks tend to have a better balance of CPU, RAM, and
permanent storage capacity and performance than the stock-configuration
Pentium 3 laptops; an upgraded ultraportable Pentium 3 laptop can
match a netbook quite well.)
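As a small illustration of trading CPU cycles for I/O bandwidth (zlib
stands in for the filesystem's compressor here, and the sample data is
artificial; real savings depend entirely on how compressible your data
is):

# Spend CPU compressing a buffer so fewer bytes cross the (slower)
# I/O path. The sample data is synthetic, repetitive log-like text.
import time, zlib

sample = b"2009-06-17 12:00:00 GET /index.html 200 1234\n" * 200000

start = time.time()
compressed = zlib.compress(sample, 1)    # cheap setting, playing the LZJB role
cpu_seconds = time.time() - start

ratio = len(sample) / float(len(compressed))
print("raw bytes:        %d" % len(sample))
print("compressed bytes: %d (ratio %.2fx)" % (len(compressed), ratio))
print("CPU time spent:   %.3f s" % cpu_seconds)

# If the disk or network path moves about 100 MB/s, the bytes we avoided
# writing translate directly into I/O time saved:
saved = (len(sample) - len(compressed)) / (100.0 * 1024 * 1024)
print("I/O time saved at a hypothetical 100 MB/s: %.3f s" % saved)

Whenever the CPU time spent is smaller than the I/O time saved, the
trade pays off; when the workload is already CPU-bound, it does not.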
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss