bug#45648: `dd` seek/skip which way is up?

Paul Eggert Tue, 22 Feb 2022 09:13:19 -0800

On 1/4/21 20:08, Paul Eggert wrote:
> On 1/4/21 7:44 PM, Bela Lubkin wrote:
>> TLDR: *huge* existing presence of 'iseek' and 'oseek'; most OSes document
>> them as pure synonyms for 'skip' and 'seek'.
>
> Thanks for doing all that research. It's compelling, and I think your
> patch (or something like it) should go in. I'll wait for a bit to hear
> other opinions.

After thinking about the patch a bit more, let's omit the part about
adding new conversions iseek_bytes etc., as I think there's a better way
to address that issue. I proposed something in <https://bugs.gnu.org/54112>.

So instead of your patch, I installed the attached patches. The first
one adds the iseek and oseek operands that you suggested; the second one
clarifies dd documentation, as I found several things were confusing
when rereading it carefully. Something like these patches should appear
in the next coreutils release.
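
For readers who want to see the direction of each operand concretely, here is a rough sketch of the scenario exercised by the new sk-seek5 test in the first patch. It deliberately uses the POSIX spellings skip= and seek= so that it also runs on a dd that lacks the aliases; on a dd built with the first patch, iseek=1 oseek=2 should behave identically. The variable names (tmp, result_notrunc, result_trunc) are illustrative and not part of the patches, and the second half (without conv=notrunc) is an extra illustration of the "before truncating or copying" wording rather than something the test suite checks here.

```shell
#!/bin/sh
# Sketch only: same data as the sk-seek5 test, POSIX operand spellings.
tmp=$(mktemp) || exit 1

# Case 1: conv=notrunc keeps the tail of the existing output file.
printf '%s' 'zyxwvutsrqponmlkji' > "$tmp"
# skip=1 drops one 1-byte input block; seek=2 starts writing at output offset 2.
# With the aliases from the patch this would be: iseek=1 oseek=2
printf '%s' '0123456789abcdef' |
  dd bs=1 skip=1 seek=2 conv=notrunc count=3 of="$tmp" 2>/dev/null
result_notrunc=$(cat "$tmp")

# Case 2: without conv=notrunc, the output is truncated at the seek offset
# before copying, so everything after the copied bytes is gone.
printf '%s' 'zyxwvutsrqponmlkji' > "$tmp"
printf '%s' '0123456789abcdef' |
  dd bs=1 skip=1 seek=2 count=3 of="$tmp" 2>/dev/null
result_trunc=$(cat "$tmp")

rm -f "$tmp"
echo "notrunc: $result_notrunc"   # zy123utsrqponmlkji
echo "trunc:   $result_trunc"     # zy123
```

In both cases skip= (iseek=) positions the input and seek= (oseek=) positions the output, which is the mnemonic the aliases are meant to make obvious.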

From 6ad981900cc170258d4914197e2796fc94a37863 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Mon, 21 Feb 2022 11:23:02 -0800
Subject: [PATCH 1/2] dd: support iseek= and oseek=

Alias iseek=N to skip=N, oseek=N to seek=N (Bug#45648).
* src/dd.c (scanargs): Parse iseek= and oseek=.
* tests/dd/skip-seek.pl (sk-seek5): New test case.
---
 NEWS                  |  3 +++
 doc/coreutils.texi    | 16 ++++++++++------
 src/dd.c              |  8 ++++----
 tests/dd/skip-seek.pl | 10 ++++++++++
 4 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index ef65b4ab8..de03f0d47 100644
--- a/NEWS
+++ b/NEWS
@@ -57,6 +57,9 @@ GNU coreutils NEWS                                    -*- outline -*-
   dd conv=fsync now synchronizes output even after a write error,
   and similarly for dd conv=fdatasync.
 
+  dd now supports the aliases iseek=N for skip=N, and oseek=N for seek=N,
+  like FreeBSD and other operating systems.
+
   timeout --foreground --kill-after=... will now exit with status 137
   if the kill signal was sent, which is consistent with the behavior
   when the --foreground option is not specified.  This allows users to
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 8d2974bde..4ec998802 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9189,8 +9189,7 @@ Read from @var{file} instead of standard input.
 @item of=@var{file}
 @opindex of
 Write to @var{file} instead of standard output.  Unless
-@samp{conv=notrunc} is given, @command{dd} truncates @var{file} to zero
-bytes (or the size specified with @samp{seek=}).
+@samp{conv=notrunc} is given, truncate @var{file} before writing it.
 
 @item ibs=@var{bytes}
 @opindex ibs
@@ -9230,15 +9229,20 @@ When converting variable-length records to fixed-length ones
 use @var{bytes} as the fixed record length.
 
 @item skip=@var{n}
+@itemx iseek=@var{n}
 @opindex skip
+@opindex iseek
 Skip @var{n} @samp{ibs}-byte blocks in the input file before copying.
 If @samp{iflag=skip_bytes} is specified, @var{n} is interpreted
 as a byte count rather than a block count.
 
 @item seek=@var{n}
+@itemx oseek=@var{n}
 @opindex seek
-Skip @var{n} @samp{obs}-byte blocks in the output file before copying.
-if @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
+@opindex oseek
+Skip @var{n} @samp{obs}-byte blocks in the output file before
+truncating or copying.
+If @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
 as a byte count rather than a block count.
 
 @item count=@var{n}
@@ -9588,14 +9592,14 @@ This flag can be used only with @code{iflag}.
 
 @item skip_bytes
 @opindex skip_bytes
-Interpret the @samp{skip=} operand as a byte count,
+Interpret the @samp{skip=} or @samp{iseek=} operand as a byte count,
 rather than a block count, which allows specifying
 an offset that is not a multiple of the I/O block size.
 This flag can be used only with @code{iflag}.
 
 @item seek_bytes
 @opindex seek_bytes
-Interpret the @samp{seek=} operand as a byte count,
+Interpret the @samp{seek=} or @samp{oseek=} operand as a byte count,
 rather than a block count, which allows specifying
 an offset that is not a multiple of the I/O block size.
 This flag can be used only with @code{oflag}.
diff --git a/src/dd.c b/src/dd.c
index 7360a4973..1c30e414d 100644
--- a/src/dd.c
+++ b/src/dd.c
@@ -562,8 +562,8 @@ Copy a file, converting and formatting according to the operands.\n\
   obs=BYTES       write BYTES bytes at a time (default: 512)\n\
   of=FILE         write to FILE instead of stdout\n\
   oflag=FLAGS     write as per the comma separated symbol list\n\
-  seek=N          skip N obs-sized blocks at start of output\n\
-  skip=N          skip N ibs-sized blocks at start of input\n\
+  seek=N          (or oseek=N) skip N obs-sized output blocks\n\
+  skip=N          (or iseek=N) skip N ibs-sized input blocks\n\
   status=LEVEL    The LEVEL of information to print to stderr;\n\
                   'none' suppresses everything but error messages,\n\
                   'noxfer' suppresses the final transfer statistics,\n\
@@ -1564,9 +1564,9 @@ scanargs (int argc, char *const *argv)
               n_max = MIN (SIZE_MAX, IDX_MAX);
               converted_idx = &conversion_blocksize;
             }
-          else if (operand_is (name, "skip"))
+          else if (operand_is (name, "skip") || operand_is (name, "iseek"))
             skip = n;
-          else if (operand_is (name, "seek"))
+          else if (operand_is (name + (*name == 'o'), "seek"))
             seek = n;
           else if (operand_is (name, "count"))
             count = n;
diff --git a/tests/dd/skip-seek.pl b/tests/dd/skip-seek.pl
index 41639cc71..0fcb1cf25 100755
--- a/tests/dd/skip-seek.pl
+++ b/tests/dd/skip-seek.pl
@@ -68,6 +68,16 @@ my @Tests =
       {OUT=> "bc\n"},
       {ERR=> "3+0 records in\n3+0 records out\n"},
      ],
+     [
+      # Check that iseek and oseek aliases work too.
+      'sk-seek5',
+      qw (bs=1 iseek=1 oseek=2 conv=notrunc count=3 status=noxfer of=@AUX@ < ),
+      {IN=> '0123456789abcdef'},
+      {AUX=> 'zyxwvutsrqponmlkji'},
+      {OUT=> ''},
+      {ERR=> "3+0 records in\n3+0 records out\n"},
+      {CMP=> ['zy123utsrqponmlkji', {'@AUX@'=> undef}]},
+     ],
     );
 
 my $save_temps = $ENV{DEBUG};
-- 
2.32.0

From a2d0ad6c6de032acadec32532afc22e47da4b617 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Tue, 22 Feb 2022 08:55:53 -0800
Subject: [PATCH 2/2] dd: improve doc relative to POSIX

* doc/coreutils.texi (dd invocation): Improve documentation,
clarifying whether features are extensions to POSIX.
---
 doc/coreutils.texi | 90 ++++++++++++++++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 18 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index 4ec998802..5419c61ef 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -9166,9 +9166,8 @@ option, and overrides the @option{--preserve=all} and @option{-a} options.
 @pindex dd
 @cindex converting while copying a file
 
-@command{dd} copies a file (from standard input to standard output, by
-default) with a changeable I/O block size, while optionally performing
-conversions on it.  Synopses:
+@command{dd} copies input to output with a changeable I/O block size,
+while optionally performing conversions on the data.  Synopses:
 
 @example
 dd [@var{operand}]@dots{}
@@ -9176,7 +9175,43 @@ dd @var{option}
 @end example
 
 The only options are @option{--help} and @option{--version}.
-@xref{Common options}.  @command{dd} accepts the following operands,
+@xref{Common options}.
+
+By default, @command{dd} copies standard input to standard output.
+To copy, @command{dd} repeatedly does the following steps in order:
+
+@enumerate
+@item
+Read an input block.
+
+@item
+If converting via @samp{sync}, pad as needed to meet the input block size.
+Pad with spaces if converting via @samp{block} or @samp{unblock}, NUL
+bytes otherwise.
+
+@item
+If @samp{bs=} is given and no conversion mentioned in steps (4) or (5)
+is given, output the data as a single block and skip all remaining steps.
+
+@item
+If the @samp{swab} conversion is given, swap each pair of input bytes.
+If the input data length is odd, preserve the last input byte
+(since there is nothing to swap it with).
+
+@item
+If any of the conversions @samp{swab}, @samp{block}, @samp{unblock},
+@samp{lcase}, @samp{ucase}, @samp{ascii}, @samp{ebcdic} and @samp{ibm}
+are given, do these conversions.  These conversions operate
+independently of input blocking, and might deal with records that span
+block boundaries.
+
+@item
+Aggregate the resulting data into output blocks of the specified size,
+and output each output block in turn.  Do not pad the last output block;
+it can be shorter than usual.
+@end enumerate
+
+@command{dd} accepts the following operands,
 whose syntax was inspired by the DD (data definition) statement of
 OS/360 JCL.
 
@@ -9233,8 +9268,9 @@ use @var{bytes} as the fixed record length.
 @opindex skip
 @opindex iseek
 Skip @var{n} @samp{ibs}-byte blocks in the input file before copying.
-If @samp{iflag=skip_bytes} is specified, @var{n} is interpreted
+With @samp{iflag=skip_bytes}, interpret @var{n}
 as a byte count rather than a block count.
+(The @samp{iseek=} spelling is an extension to POSIX.)
 
 @item seek=@var{n}
 @itemx oseek=@var{n}
@@ -9242,20 +9278,22 @@ as a byte count rather than a block count.
 @opindex oseek
 Skip @var{n} @samp{obs}-byte blocks in the output file before
 truncating or copying.
-If @samp{oflag=seek_bytes} is specified, @var{n} is interpreted
+With @samp{oflag=seek_bytes}, interpret @var{n}
 as a byte count rather than a block count.
+(The @samp{oseek=} spelling is an extension to POSIX.)
 
 @item count=@var{n}
 @opindex count
 Copy @var{n} @samp{ibs}-byte blocks from the input file, instead
 of everything until the end of the file.
-if @samp{iflag=count_bytes} is specified, @var{n} is interpreted
+With @samp{iflag=count_bytes}, interpret @var{n}
 as a byte count rather than a block count.
-Note if the input may return short reads as could be the case
+If short reads occur, as could be the case
 when reading from a pipe for example, @samp{iflag=fullblock}
-will ensure that @samp{count=} corresponds to complete input blocks
-rather than the traditional POSIX specified behavior of counting
-input read operations.
+ensures that @samp{count=} counts complete input blocks
+rather than input read operations.
+As an extension to POSIX, @samp{count=0} copies zero blocks
+instead of copying all blocks.
 
 @item status=@var{level}
 @opindex status
@@ -9301,6 +9339,8 @@ An additional line like @samp{1 truncated record} or @samp{10
 truncated records} is output after the @samp{records out} line if
 @samp{conv=block} processing truncated one or more input records.
 
+The @samp{status=} operand is a GNU extension to POSIX.
+
 @item conv=@var{conversion}[,@var{conversion}]@dots{}
 @opindex conv
 Convert the file as specified by the @var{conversion} argument(s).
@@ -9348,6 +9388,8 @@ Remove any trailing spaces in each @samp{cbs}-sized input block,
 and append a newline.
 
 The @samp{block} and @samp{unblock} conversions are mutually exclusive.
+If you use either of these conversions, you should also use the
+@samp{cbs=} operand.
 
 @item lcase
 @opindex lcase@r{, converting to}
@@ -9373,12 +9415,12 @@ Similarly, when the output is a device rather than a file,
 NUL input blocks are not copied, and therefore this conversion
 is most useful with virtual or pre zeroed devices.
 
+The @samp{sparse} conversion is a GNU extension to POSIX.
+
 @item swab
 @opindex swab @r{(byte-swapping)}
 @cindex byte-swapping
-Swap every pair of input bytes.  GNU @command{dd}, unlike others, works
-when an odd number of bytes are read---the last byte is simply copied
-(since there is nothing to swap it with).
+Swap every pair of input bytes.
 
 @item sync
 @opindex sync @r{(padding with ASCII NULs)}
@@ -9403,7 +9445,8 @@ output file itself.
 @cindex creating output file, avoiding
 Do not create the output file; the output file must already exist.
 
-The @samp{excl} and @samp{nocreat} conversions are mutually exclusive.
+The @samp{excl} and @samp{nocreat} conversions are mutually exclusive,
+and are GNU extensions to POSIX.
 
 @item notrunc
 @opindex notrunc
@@ -9421,6 +9464,7 @@ Continue after read errors.
 Synchronize output data just before finishing,
 even if there were write errors.
 This forces a physical write of output data.
+This conversion is a GNU extension to POSIX.
 
 @item fsync
 @opindex fsync
@@ -9428,6 +9472,7 @@ This forces a physical write of output data.
 Synchronize output data and metadata just before finishing,
 even if there were write errors.
 This forces a physical write of output data and metadata.
+This conversion is a GNU extension to POSIX.
 
 @end table
 
@@ -9441,8 +9486,7 @@ argument(s).  (No spaces around any comma(s).)
 Access the output file using the flags specified by the @var{flag}
 argument(s).  (No spaces around any comma(s).)
 
-Here are the flags.  Not every flag is supported on every operating
-system.
+Here are the flags.
 
 @table @samp
 
@@ -9606,7 +9650,8 @@ This flag can be used only with @code{oflag}.
 
 @end table
 
-These flags are not supported on all systems, and @samp{dd} rejects
+These flags are all GNU extensions to POSIX.
+They are not supported on all systems, and @samp{dd} rejects
 attempts to use them when they are not supported.  When reading from
 standard input or writing to standard output, the @samp{nofollow} and
 @samp{noctty} flags should not be specified, and the other flags
@@ -9615,11 +9660,20 @@ affected file descriptors, even after @command{dd} exits.
 
 @end table
 
+The behavior of @command{dd} is unspecified if operands other than
+@samp{conv=}, @samp{iflag=}, @samp{oflag=}, and @samp{status=} are
+specified more than once.
+
 @cindex multipliers after numbers
 The numeric-valued strings above (@var{n} and @var{bytes})
+are unsigned decimal integers that
 can be followed by a multiplier: @samp{b}=512, @samp{c}=1,
 @samp{w}=2, @samp{x@var{m}}=@var{m}, or any of the
 standard block size suffixes like @samp{k}=1024 (@pxref{Block size}).
+These multipliers are GNU extensions to POSIX, except that
+POSIX allows @var{bytes} to be followed by @samp{k}, @samp{b}, and
+@samp{x@var{m}}.
+Block sizes (i.e., specified by @var{bytes} strings) must be nonzero.
 
 Any block size you specify via @samp{bs=}, @samp{ibs=}, @samp{obs=}, @samp{cbs=}
 should not be too large---values larger than a few megabytes
-- 
2.32.0
