Re: Patch to document base64 encoding

Karl O. Pinc Wed, 06 Mar 2019 14:38:07 -0800

On Wed, 6 Mar 2019 19:30:16 +0100 (CET)
Fabien COELHO <[email protected]> wrote:


> "... section 6.8" -> "... Section 6.8" (capital S).

Fixed.

> "The string and the binary encode and decode functions..." sentence
> looks strange to me, especially with the English article that I do
> not really master, so maybe it is ok. I'd have written something more 
> straightforward, eg: "Functions encode and decode support the
> following encodings:",

It is an atypical construction because I want to draw attention to the
fact that this is documentation not only for the encode() and decode() in
section 9.4. String Functions and Operators but also for
the encode() and decode() in section 9.5. Binary String Functions
and Operators.  Although I can't think of a better approach,
it makes me uncomfortable that documentation written in
one section applies equally to functions in a different section.

Do you think it would be useful to hyperlink the word "binary"
to section 9.5?

The idiomatic phrasing would be "Both the string and the binary
encode and decode functions..." but the word "both" adds
no information.  Shorter is better.

> and also I'd use a direct "Function
> <...>decode</...> ..." rather than "The <function>decode</function>
> function ..." (twice).

The straightforward English would be "Decode accepts...".  The problem
is that this begins the sentence with the name of a function.
This does not work very well when the function name is all lower case,
and can have other problems where clarity is lost depending 
on documentation output formatting.

I don't see a better approach.

> Maybe I'd use the exact same grammatical structure for all 3 cases, 
> starting with "The <>whatever</> encoding converts bla bla bla"
> instead of varying the sentences.

Agreed.  Good idea.  The first paragraph of each term has to 
do with encoding and the second with decoding.  
Uniformity in starting the second paragraphs helps make 
this clear, even though the first paragraphs are not uniform.
With this I am not concerned that the first paragraphs
do not have a common phrasing that's very explicit about
being about encoding.

Adjusted.

> Otherwise, all explanations look both precise and useful to me.

When writing I was slightly concerned about being overly precise;
permanently committing to behavior that might (possibly) be an artifact
of implementation.  E.g., that hex decoding accepts both
upper and lower case A-F characters, what input is ignored
and what raises an error, etc.  But it seems best
to document existing behavior, all of which has existed so long
anyway that changing it would be disruptive.  If anybody cares
they can object.
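
To make the hex rules concrete, here is a small Python model of the
behavior as the patch documents it: lower-case output, upper or lower
case accepted on input, and an error on an odd number of characters.
This is only an illustrative sketch, not the PostgreSQL code; the names
hex_encode and hex_decode are made up for the example.

```python
HEX_DIGITS = "0123456789abcdef"


def hex_encode(data: bytes) -> str:
    """Model of encode(data, 'hex'): two lower-case digits per byte."""
    # Each byte is 8 bits, i.e. two 4-bit digits, so the length is always even.
    return "".join(HEX_DIGITS[b >> 4] + HEX_DIGITS[b & 0x0F] for b in data)


def hex_decode(text: str) -> bytes:
    """Model of decode(text, 'hex') as described in the patch (assumption:
    only the documented rules are modelled, nothing else)."""
    if len(text) % 2 != 0:
        raise ValueError("invalid hex data: odd number of characters")
    out = bytearray()
    for i in range(0, len(text), 2):
        pair = text[i:i + 2].lower()  # accept A-F as well as a-f
        if len(pair) != 2 or any(c not in HEX_DIGITS for c in pair):
            raise ValueError("invalid hex data: %r" % text[i:i + 2])
        out.append(HEX_DIGITS.index(pair[0]) * 16 + HEX_DIGITS.index(pair[1]))
    return bytes(out)
```

With this model, hex_decode("DEADbeef") and hex_decode("deadbeef") give
the same four bytes, and hex_decode("abc") raises an error.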

I wrote the docs by reading the code and did only a little
actual testing to be sure that what I wrote is correct.
I also did not check for regression tests which confirm
the behavior I'm documenting.  (It wouldn't hurt to have
such regression tests, if they don't already exist.
But writing regression tests is more than I want to take on 
with this patch.  Feel free to come up with tests.  :-)

I'm confident that the behavior I documented is how PG behaves
but you should know what I did in case you want further
validation.
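
As one possible aid to such validation, the escape encoding rules as
worded in the patch (zero bytes and high-bit-set bytes become \nnn octal
sequences, backslashes are doubled) can be modelled in a few lines of
Python.  This is a sketch of the documented rules only, under the
assumption that every other byte is passed through unchanged; the
function name escape_encode is hypothetical and this is not the
PostgreSQL implementation.

```python
def escape_encode(data: bytes) -> str:
    """Model of encode(data, 'escape') per the wording in the patch."""
    parts = []
    for b in data:
        if b == 0 or b >= 0x80:
            # Zero and high-bit-set bytes: backslash plus three octal digits,
            # so always 4 characters per such byte.
            parts.append("\\%03o" % b)
        elif b == 0x5C:
            # A backslash is doubled.
            parts.append("\\\\")
        else:
            # Assumption: all remaining bytes are emitted as-is.
            parts.append(chr(b))
    return "".join(parts)
```

Fed the bytes of the example in the bytea function table ('123', a zero
byte, then '456'), the model returns 123\000456, matching the result
shown there.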

Attached: doc_base64_v8.patch

Regards,

Karl <[email protected]>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..e756bf53ba 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+          <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,16 +1772,25 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>.
        </entry>
        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
        <entry><literal>MTIzAAE=</literal></entry>
@@ -2365,6 +2377,90 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+    <primary>escape encoding</primary>
+   </indexterm>
+
+   <para>
+     The string and the binary <function>encode</function>
+     and <function>decode</function> functions support the following
+     encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data &mdash; including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             <literal>hex</literal> represents each 4 bits of data as a single
+             hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             &mdash; including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             <literal>escape</literal> converts zero bytes and high-bit-set
+             bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3577,16 +3673,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation.  Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
+       formats are:
+       <link linkend="base64-encoding"><literal>base64</literal></link>,
+       <link linkend="hex-encoding"><literal>hex</literal></link>,
+       <link linkend="escape-encoding"><literal>escape</literal></link>.
       </entry>
       <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
       <entry><literal>123\000456</literal></entry>
