On 12/7/06, Andrew Miller <[EMAIL PROTECTED]> wrote:
> Quick question about the interaction of ZFS filesystem compression and the filesystem cache. We have an OpenSolaris (actually Nexenta alpha-6) box running RRD collection. These files seem to be quite compressible. A test filesystem containing about 3,000 of these files shows a compressratio of 12.5x.
Be careful here. If your files have no data in them yet, you will get much better compression than you will later in their life. Judging by the fact that you got only 12.5x, I suspect that your files are at least partially populated. Expect the compression to get worse over time.

Looking at some RRD files that come from very active servers (i.e. the numbers vary frequently), with data filling about 2/3 of the configured time periods, I see the following ratios:

     1.8  mpstat.rrd
     1.8  vmstat.rrd
     1.9  exacct_PROJECT_user.oracle.rrd
     2.0  net-ce2.rrd
     2.1  iostat-c14.rrd
     2.1  iostat-c15.rrd
     2.1  iostat-c16.rrd
     . . .
     7.6  net-ce912005.rrd
     7.7  net-ce912016.rrd
     9.1  exacct_PROJECT_user.gemsadm.rrd
    12.2  exacct_PROJECT_exacct_interval.rrd
    18.1  exacct_PROJECT_user.patrol.rrd
    18.1  exacct_PROJECT_user.precise.rrd
    18.1  exacct_PROJECT_user.precise6.rrd
    31.8  net-ce8.rrd
    39.6  net-eri3.rrd
    45.1  net-eri2.rrd

The first column is the compression ratio. The net-eri{2,3} files are almost empty.
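To illustrate why a sparsely populated file compresses so much better, here is a rough sketch in Python. It is not how ZFS or rrdtool work internally: it uses zlib as a stand-in for the ZFS compressor, and it assumes a made-up payload of 8-byte doubles where unfilled slots hold NaN. The function names and the payload layout are hypothetical, and the ratios it prints will not match the real ones above; only the trend is the point.

```python
import random
import struct
import zlib


def compress_ratio(data: bytes) -> float:
    """Uncompressed size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


def fake_rrd_payload(slots: int, filled_fraction: float, seed: int = 0) -> bytes:
    """Build `slots` doubles: varying values first, NaN for unfilled slots.

    This layout is an assumption for illustration, not the real RRD format.
    """
    rng = random.Random(seed)
    filled = int(slots * filled_fraction)
    values = [rng.uniform(0.0, 1000.0) for _ in range(filled)]
    values += [float("nan")] * (slots - filled)
    return struct.pack(f"<{slots}d", *values)


if __name__ == "__main__":
    # The ratio should drop as more of the slots hold real, varying data.
    for fraction in (0.0, 0.33, 0.66, 1.0):
        ratio = compress_ratio(fake_rrd_payload(100_000, fraction))
        print(f"{fraction:4.0%} filled: {ratio:8.1f}x")
```

The ratio falls steadily as the filled fraction grows, which is the same effect as a compressratio that looks great on a fresh set of RRD files and then degrades as they fill up.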
> My question is about how the filesystem cache works with compressed files. Does the fscache keep a copy of the compressed data, or the uncompressed blocks? To update one of these RRD files, I believe the whole contents are read into memory, modified, and then written back out. If the filesystem cache maintained a copy of the compressed data, a lot more, maybe more than 10x more, of these files could be maintained in the cache. That would mean we could have a lot more data files without ever needing to do a physical read.
Here is an insert of a value:

    25450:  open("/opt/perfstat/rrd/somehost/iostat-c4.rrd", O_RDWR) = 3
    25450:  fstat64(3, 0xFFBFF5E0) = 0
    25450:  fstat64(3, 0xFFBFF640) = 0
    25450:  fstat64(3, 0xFFBFF4E8) = 0
    25450:  ioctl(3, TCGETA, 0xFFBFF5CC) Err#25 ENOTTY
    25450:  read(3, " R R D\0 0 0 0 1\0\0\0\0".., 8192) = 8192
    25450:  llseek(3, 0, SEEK_CUR) = 8192
    25450:  lseek(3, 0xFFFFFC68, SEEK_CUR) = 7272
    25450:  fcntl(3, F_SETLK, 0xFFBFF7D0) = 0
    25450:  llseek(3, 0, SEEK_CUR) = 7272
    25450:  lseek(3, 2230952, SEEK_SET) = 2230952
    25450:  write(3, " @ x S = pA3D7\v ?E6 f f".., 64) = 64
    25450:  lseek(3, 1864, SEEK_SET) = 1864
    25450:  write(3, " E xA0 # U N K N\0\0\0\0".., 5408) = 5408
    25450:  close(3) = 0

Notice that it does the following:

    1. Open the file
    2. Read the first 8K
    3. Seek to a particular spot
    4. Take a lock
    5. Seek
    6. Write 64 bytes
    7. Seek
    8. Write 5408 bytes
    9. Close

The rrd file in question is 8.6 MB, yet the update took only 8 KB of reads and 5472 bytes of writes, so the whole file is not read into memory and written back out. This is one of the big wins of the current binary rrd format over the original ASCII version that came with MRTG.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
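P.S. For anyone who wants to reproduce the access pattern from the trace above without rrdtool, here is a minimal sketch in Python. It is not rrdtool's code: the function name, its parameters, and the zero-filled payloads are made up for illustration, and the offsets in the demo are simply the ones that appear in the trace for that one file. It assumes a Unix-like system, since it uses the fcntl module.

```python
import fcntl
import os
import tempfile

HEADER_READ = 8192  # size of the single read seen in the trace


def update_in_place(path, sample_offset, sample, meta_offset, meta):
    """Mimic the traced pattern: one header read, a lock, two small writes.

    Returns (bytes_read, bytes_written). Names and offsets are hypothetical.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        header = os.read(fd, HEADER_READ)               # read the first 8K only
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking lock, like F_SETLK
        os.lseek(fd, sample_offset, os.SEEK_SET)        # seek to the slot for the new sample
        written = os.write(fd, sample)                  # small data write
        os.lseek(fd, meta_offset, os.SEEK_SET)          # seek back into the header area
        written += os.write(fd, meta)                   # rewrite the bookkeeping region
        return len(header), written
    finally:
        os.close(fd)  # closing the descriptor also releases the lock


if __name__ == "__main__":
    # A sparse 3,000,000-byte file stands in for a multi-megabyte RRD file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.truncate(3_000_000)
    print(update_in_place(tmp.name, 2230952, b"\0" * 64, 1864, b"\0" * 5408))
    os.unlink(tmp.name)
```

The demo reports 8192 bytes read and 5472 bytes written regardless of how large the file is, which is the point: the cost of an update depends on the header and the slots touched, not on the file size.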