Andrew Deason wrote:
As I'm sure you're all aware, file size in ZFS can differ greatly from
actual disk usage, depending on access patterns; e.g. truncating a 1M
file down to 1 byte still uses up about 130k on disk when
recordsize=128k. I'm aware that this is a result of ZFS's rather
different internals, and that it works well for normal usage, but it
can make things difficult for applications that wish to constrain their
own disk usage.

The particular application I'm working on that has this problem is the
OpenAFS <http://www.openafs.org/> client, when it uses ZFS as the disk
cache partition. The disk cache is constrained to a user-configurable
size, and the amount of cache used is tracked by counters internal to
the OpenAFS client. Normally cache usage is tracked by just taking the
file length of a particular file in the cache and rounding it up to the
next frsize boundary of the cache filesystem. This is obviously wrong
when ZFS is used, and so our cache usage tracking can become quite
inaccurate. So, I have two questions which would help us fix this:

  1. Is there any interface to ZFS (or a configuration knob or
  something) that we can use from a kernel module to explicitly return a
  file to the more predictable size? In the above example, truncating a
  1M file (call it 'A') to 1 byte makes it take up 130k, but if we create
  a new file (call it 'B') with that 1 byte in it, it only takes up about
  1k. Is there any operation we can perform on file 'A' to make it take
  up less space without having to create a new file 'B'?

  The cache files are often truncated and overwritten with new data,
  which is why this can become a problem. If there were some way to
  explicitly signal to ZFS that we want a particular file to be put in a
  smaller block or something, that would be helpful. (I am mostly
  ignorant of ZFS internals; if there's somewhere that would have told
  me this information, let me know.)

  2. Lacking 1., can anyone give an equation relating file length, max
  size on disk, and recordsize? (and any additional parameters needed).
  If we just have a way of knowing in advance how much disk space we're
  going to take up by writing a certain amount of data, we should be
  okay.

Or, if anyone has any other ideas on how to overcome this, it would be
welcomed.


When creating a new file, ZFS will set its block size to be no larger than the current value of recordsize. If there is at least recordsize worth of data to be written, then the blocksize will equal recordsize. From then on the file's blocksize is "frozen" - that's why, when you truncate it, it keeps its original blocksize. It also means that if the file was smaller than recordsize (so its blocksize was smaller too), then when you truncate it to 1B it will keep that smaller blocksize. IMHO you won't be able to lower a file's blocksize other than by creating a new file. For example:

mi...@r600:~/progs$ mkfile 10m file1
mi...@r600:~/progs$ ./stat file1
size: 10485760    blksize: 131072
mi...@r600:~/progs$ truncate -s 1 file1
mi...@r600:~/progs$ ./stat file1
size: 1    blksize: 131072
mi...@r600:~/progs$
mi...@r600:~/progs$ rm file1
mi...@r600:~/progs$
mi...@r600:~/progs$ mkfile 10000 file1
mi...@r600:~/progs$ ./stat file1
size: 10000    blksize: 10240
mi...@r600:~/progs$ truncate -s 1 file1
mi...@r600:~/progs$ ./stat file1
size: 1    blksize: 10240
mi...@r600:~/progs$

If you are not worried about this extra overhead and you are mostly concerned with proper accounting of used disk space, then instead of relying on the file size alone you should take into account its blocksize and round the file size up to a multiple of the blocksize (the actual file size on disk, not counting metadata, is N*blocksize). However, IIRC there is an open bug/RFE asking for special treatment of a file's tail block so it can be smaller than the file's blocksize. Once that is integrated, your math could be wrong again.
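
As a rough sketch of that rounding (a hypothetical helper for illustration, not anything from OpenAFS or ZFS itself), using the blocksize reported by stat():

#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: estimate the on-disk usage of a file's data blocks
 * by rounding its logical size up to a multiple of its blocksize.
 * Metadata, compression and dedup effects are not accounted for. */
static off_t estimate_ondisk_size(const struct stat *buf)
{
  off_t blksize = buf->st_blksize;

  if (blksize == 0)
  {
    return buf->st_size;
  }

  return ((buf->st_size + blksize - 1) / blksize) * blksize;
}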

Please also note that relying on the logical file size could be even more misleading if compression is enabled in ZFS (or dedup in the future). Relying on the blocksize will give you more accurate estimates.

You can get a file's blocksize by using stat() and reading the value of buf.st_blksize,
or you can get a good estimate of the used disk space with buf.st_blocks*512

mi...@r600:~/progs$ cat stat.c

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
  struct stat buf;

  if (argc < 2)
  {
    fprintf(stderr, "usage: %s <file>\n", argv[0]);
    exit(1);
  }

  if (!stat(argv[1], &buf))
  {
    /* st_size is the logical file length, st_blksize the file's blocksize */
    printf("size: %lld\tblksize: %ld\n",
        (long long)buf.st_size, (long)buf.st_blksize);
  }
  else
  {
    printf("ERROR: stat(), errno: %d\n", errno);
    exit(1);
  }

  return 0;
}

mi...@r600:~/progs$
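
If you want the allocated size directly rather than an estimate, a small variation of the program above (just a sketch, same caveats about metadata) can also print st_blocks*512:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
  struct stat buf;

  if (argc < 2)
  {
    fprintf(stderr, "usage: %s <file>\n", argv[0]);
    exit(1);
  }

  if (stat(argv[1], &buf) != 0)
  {
    printf("ERROR: stat(), errno: %d\n", errno);
    exit(1);
  }

  /* st_blocks is always reported in 512-byte units, independent of blocksize */
  printf("size: %lld\tblksize: %ld\tallocated: %lld\n",
      (long long)buf.st_size, (long)buf.st_blksize,
      (long long)buf.st_blocks * 512);

  return 0;
}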


--
Robert Milkowski
http://milek.blogspot.com





