On Wed, 30 Apr 2014, Matthew Fleming wrote:

On Wed, Apr 30, 2014 at 7:48 AM, Ian Lepore <i...@freebsd.org> wrote:

For some reason this reminded me of something I've been wanting for a
while but never get around to writing... /dev/ones, it's just
like /dev/zero except it returns 0xff bytes.  Useful for dd'ing to wipe
out flash-based media.

dd if=/dev/zero | tr "\000" "\377" | dd of=<xxx>

Why all these processes and i/o's?

tr </dev/zero "\000" "\377"

The dd's may be needed for controlling the block sizes.

But it's not quite the same.

It is better, since it is not limited to 0xff bytes :-).

Oops, perhaps not.  tr not only uses stdio to pessimize the i/o; it uses
wide characters 1 at a time.  It used to use only single-byte
characters 1 at a time.

yes(1) is limited to newline bytes, or newlines mixed with strings.  It
also uses stdio to pessimize the i/o, but not wide characters yet.
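
A single process with an explicit block size avoids both the extra
processes and stdio.  A minimal sketch of the flash-wiping case (the
device name and the 64K block size here are only examples):

    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        static unsigned char buf[64 * 1024];    /* example block size */
        ssize_t n;
        int fd;

        if ((fd = open("/dev/da0", O_WRONLY)) == -1)    /* example device */
            err(1, "open");
        memset(buf, 0xff, sizeof(buf));
        for (;;) {
            n = write(fd, buf, sizeof(buf));
            if (n <= 0)
                break;          /* error or end of the medium */
        }
        if (n == -1)
            warn("write (expected near the end of the medium)");
        close(fd);
        return (0);
    }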

stdio's pessimizations begin with naively believing that st_blksize gives
a good i/o size.  For most non-regular files, including all (?) devices
and all (?) pipes, st_blksize is PAGE_SIZE.  For disks, this has been
broken significantly since FreeBSD-4 where it was the disk's si_bsize_best
(usually 64K).  For pipes, this has been broken significantly since
FreeBSD-4 where it was pipe_buffer.size (either PIPE_SIZE = 16K or
BIG_PIPE_SIZE = 64K).
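
This is easy to check with a trivial program that just reports what
stdio will see for stdin (a sketch; the program name and the device in
the usage line below are only examples):

    #include <sys/stat.h>

    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct stat sb;

        if (fstat(STDIN_FILENO, &sb) == -1)
            err(1, "fstat");
        /* This is the size stdio picks for its buffer. */
        printf("st_blksize = %ld\n", (long)sb.st_blksize);
        return (0);
    }

Run it as "blksize </dev/ada0", "blksize <somefile" and
"cat somefile | blksize" to see the 3 cases.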

So standard utilities tend to be too slow to use on disks.  You have to
use dd and relatively complicated pipelines to get adequate block sizes.
Sometimes dd or a special utility is needed to get adequate control and
error handling.  I have such a special utility for copying disks
with bad sectors, but prefer to use just cp for copying disks.  cp
doesn't use stdio, and doesn't use mmap() above a certain small size; it
uses read/write() with a fixed block size of 64K or maybe larger in
-current, so it works OK for copying disks.

The most broken utilities that I use often for disk devices are:

- md5.  This (really libmd/mdXhl.c) has been broken on all devices (really
  on all non-regular files) since ~2001.  It is broken by misusing
  st_size instead of by trusting st_blksize.  st_size is only valid
  for regular files, but is used on other file types to break them.
  For example:

    pts/21:bde@freefall:~> md5 /dev/null
    MD5 (/dev/null) = d41d8cd98f00b204e9800998ecf8427e
    pts/21:bde@freefall:~> md5 /dev/zero
    MD5 (/dev/zero) = d41d8cd98f00b204e9800998ecf8427e

  Similarly for disk devices.  All devices are seen as empty by md5.

  The workaround is to use a pipeline, or just stdin.  "cat /dev/zero | md5"
  and even "md5 </dev/zero" confuse md5 into using a different input method
  that works.  OTOH, "md5 /dev/fd/0" sees an empty device file, and
  "cat /dev/zero | md5 /dev/fd/0" fails immediately with a seek error.
  Pipes have st_size == 0 too, so the input method that stats the file
  would see an empty file too, so it must not be reached in the working
  case.  "md5 /dev/fd/0" apparently just stats the device file, and this
  appears to be empty.  I'm not sure if it is the tty device file or
  /dev/fd/0 that is seen.  "cat /dev/zero | md5 /dev/fd/0" apparently
  reaches the buggy code, but somehow gets further and fails trying to
  seek.

  To get adequate block sizes for disks, use dd in the pipeline that must
  be used for other reasons.
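  For example, something like "dd if=/dev/da0 bs=1m | md5" (the device
  name is just an example) gets both the working stdin input method and
  an adequate block size for the reads.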

  I only recently noticed that pipes have st_blksize = PAGE_SIZE, so that
  if you pipe to stdio utilities then the i/o will be pessimized, and
  reblocking using another dd in the pipeline is needed to get back to
  an adequate size.  PAGE_SIZE is large enough to not be very pessimal
  for some uses.

- cmp.  cmp uses mmap() excessively for regular files, but for device files
  it uses per-char stdio excessively.

  (
  More on md5.  The i/o routine for the working case is in the application
  (md5/md5.c).  This uses fread() with the bad block size BUFSIZ.  This
  is still 1024.  It is more broken than st_blksize.  However, fread()
  is not per-char, so it is reasonably efficient.  stdio uses st_blksize
  for read() from the file.  When the file is regular, the block size
  is again relatively unimportant provided the file system has a large
  enough block size or does clustering.  For device files, clustering
  might occur at levels below the file system, but usually doesn't for
  disks.  Instead, small i/o's get relatively slower with time except
  on high-end SSDs with high transactions per second, because clustering
  at low levels takes too many transactions.

  The i/o routine for the non-working case is in the library
  (libmd/mdXhl.c).  It uses read(), but with the silly stdio block
  size of BUFSIZ.  libmd files have several includes of <stdio.h>, but
  don't seem to use stdio except for bugs like this.  The result is that
  the i/o is especially pessimized for the usual regular file case.
  Buffering in the kernel limits this pessimization.
  )
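
  The shape of the fix for the mdXhl.c loop is just a decent buffer
  size.  A sketch (not the actual libmd code; the 64K size and the
  program name are only examples, and it needs -lmd):

    #include <sys/types.h>

    #include <err.h>
    #include <fcntl.h>
    #include <md5.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
        static unsigned char buf[64 * 1024];    /* instead of BUFSIZ */
        char digest[33];
        MD5_CTX ctx;
        ssize_t n;
        int fd;

        if (argc != 2)
            errx(1, "usage: md5big file");
        if ((fd = open(argv[1], O_RDONLY)) == -1)
            err(1, "%s", argv[1]);
        MD5Init(&ctx);
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            MD5Update(&ctx, buf, (unsigned int)n);
        if (n == -1)
            err(1, "read");
        printf("MD5 (%s) = %s\n", argv[1], MD5End(&ctx, digest));
        close(fd);
        return (0);
    }

  Since it never looks at st_size, it also works on device files.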

  The device file case for cmp just uses getc()/putc().  This first
  gets the st_blksize pessimization.  Then it gets the slow per-char
  i/o from using getc()/putc().  For disks, the first pessimization
  tends to dominate but the second one is noticeable.  For fast
  input devices it is very noticeable.  On freefall now:
  "dd if=/dev/zero bs=1m count=4k of=/dev/null": speed is 21GB/sec;
  "dd if=/dev/zero bs=1m count=4k | cmp - /dev/zero": speed is 187MB/sec.
  The overhead is a factor of 110.  With iron disks, the overhead would
  be a factor of about 1/2.

  The loop in cmp for regular files is slow too, but only in comparison
  with the memcpy() that is (essentially) used for reading /dev/zero
  and with the memcmp() that should be used by cmp.  It just compares
  bytewise and has mounds of bookkeeping to count characters and lines
  for the rare cases that fail.  The usual case should just use mmap()
  of the whole file (if not read()) and memcmp() on that.
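
  A sketch of that usual case (not the real cmp: no byte/line counting,
  no fallback for files too large to map, and the name is only an
  example; exit statuses follow cmp's convention):

    #include <sys/mman.h>
    #include <sys/stat.h>

    #include <err.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(int argc, char **argv)
    {
        struct stat sb1, sb2;
        size_t len;
        void *p1, *p2;
        int fd1, fd2;

        if (argc != 3)
            errx(2, "usage: mmapcmp file1 file2");
        if ((fd1 = open(argv[1], O_RDONLY)) == -1)
            err(2, "%s", argv[1]);
        if ((fd2 = open(argv[2], O_RDONLY)) == -1)
            err(2, "%s", argv[2]);
        if (fstat(fd1, &sb1) == -1 || fstat(fd2, &sb2) == -1)
            err(2, "fstat");
        if (sb1.st_size != sb2.st_size)
            return (1);         /* different sizes differ trivially */
        if (sb1.st_size == 0)
            return (0);
        len = (size_t)sb1.st_size;
        p1 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd1, 0);
        p2 = mmap(NULL, len, PROT_READ, MAP_SHARED, fd2, 0);
        if (p1 == MAP_FAILED || p2 == MAP_FAILED)
            err(2, "mmap");
        return (memcmp(p1, p2, len) != 0 ? 1 : 0);
    }

  The i/o for the faults is still whatever vm generates, which is the
  problem in the next paragraph.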

  I recently noticed a very bad case for cmp on regular files too.  I
  was comparing large files on a cd9660 file system on a DVD, under
  an old version of FreeBSD.  cmp mmap()s the whole file.  The i/o
  for this is done by vm, and vm generated only minimal i/o's with
  the cd9660 block size of 2K.  read() would have done clustering
  to a block size of 64K.  Perhaps vm is better now, but it is hard
  to see how it could do as well as read() without doing the same
  clustering as read().

  One workaround for this is to prefetch files into the buffer (vmio)
  cache using read().  It is hard to avoid thrashing of the cache
  with this, so I used workarounds like diff'ing the files instead
  of cmp'ing them.  diff is much heavier weight, but it runs faster
  since it doesn't use mmap() (gnu diff seems to use fread() and
  suffers from stdio using st_blksize).
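  (The prefetch itself can be something as simple as
  "dd if=file of=/dev/null bs=1m" on each file, but the thrashing
  problem remains.)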

Bruce