On Wed, 6 Mar 2019 19:30:16 +0100 (CET) Fabien COELHO <coe...@cri.ensmp.fr> wrote:
> "... section 6.8" -> "... Section 6.8" (capital S).

Fixed.

> "The string and the binary encode and decode functions..." sentence
> looks strange to me, especially with the English article that I do
> not really master, so maybe it is ok. I'd have written something more
> straightforward, eg: "Functions encode and decode support the
> following encodings:",

It is an atypical construction because I want to draw attention that
this is documentation not only for the encode() and decode() in
section 9.4. String Functions and Operators but also for the encode()
and decode in section 9.5. Binary String Functions and Operators.
Although I can't think of a better approach it makes me uncomfortable
that documentation written in one section applies equally to functions
in a different section.

Do you think it would be useful to hyperlink the word "binary" to
section 9.5?

The idiomatic phrasing would be "Both the string and the binary encode
and decode functions..." but the word "both" adds no information.
Shorter is better.

> and also I'd use a direct "Function
> <...>decode</...> ..." rather than "The <function>decode</function>
> function ..." (twice).

The straightforward English would be "Decode accepts...". The problem
is that this begins the sentence with the name of a function. This
does not work very well when the function name is all lower case, and
can have other problems where clarity is lost depending on
documentation output formatting.

I don't see a better approach.

> Maybe I'd use the exact same grammatical structure for all 3 cases,
> starting with "The <>whatever</> encoding converts bla bla bla"
> instead of varying the sentences.

Agreed. Good idea. The first paragraph of each term has to do with
encoding and the second with decoding. Uniformity in starting the
second paragraphs helps make this clear, even though the first
paragraphs are not uniform. With this I am not concerned that the
first paragraphs do not have a common phrasing that's very explicit
about being about encoding.

Adjusted.

> Otherwise, all explanations look both precise and useful to me.

When writing I was slightly concerned about being overly precise;
permanently committing to behavior that might (possibly) be an
artifact of implementation. E.g., that hex decoding accepts both
upper and lower case A-F characters, what input is ignored and what
raises an error, etc. But it seems best to document existing
behavior, all of which has existed so long anyway that changing it
would be disruptive. If anybody cares they can object.

I wrote the docs by reading the code and did only a little actual
testing to be sure that what I wrote is correct. I also did not
check for regression tests which confirm the behavior I'm documenting.
(It wouldn't hurt to have such regression tests, if they don't already
exist. But writing regression tests is more than I want to take on
with this patch. Feel free to come up with tests. :-)

I'm confident that the behavior I documented is how PG behaves but you
should know what I did in case you want further validation.

Attached: doc_base64_v8.patch

Regards,

Karl <k...@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6765b0d584..e756bf53ba 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1752,6 +1752,9 @@
         <indexterm>
          <primary>decode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
         <literal><function>decode(<parameter>string</parameter> <type>text</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
@@ -1769,16 +1772,25 @@
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>.
        </entry>
        <entry><literal>encode('123\000\001', 'base64')</literal></entry>
        <entry><literal>MTIzAAE=</literal></entry>
@@ -2365,6 +2377,90 @@
     <function>format</function> treats a NULL as a zero-element array.
    </para>
 
+   <indexterm>
+     <primary>encode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>decode</primary>
+   </indexterm>
+   <indexterm>
+     <primary>base64 encoding</primary>
+   </indexterm>
+   <indexterm>
+     <primary>hex encoding</primary>
+   </indexterm>
+   <indexterm>
+     <primary>escape encoding</primary>
+   </indexterm>
+
+   <para>
+     The string and the binary <function>encode</function>
+     and <function>decode</function> functions support the following
+     encodings:
+
+     <variablelist>
+       <varlistentry id="base64-encoding">
+         <term>base64</term>
+         <listitem>
+           <para>
+             The <literal>base64</literal> encoding is that
+             of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+             2045 Section 6.8</ulink>.  As per the RFC, encoded lines are
+             broken at 76 characters.  However instead of the MIME CRLF
+             end-of-line marker, only a newline is used for end-of-line.
+           </para>
+           <para>
+             The <function>decode</function> function ignores carriage-return,
+             newline, space, and tab characters.  Otherwise, an error is
+             raised when <function>decode</function> is supplied invalid
+             base64 data — including when trailing padding is incorrect.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="hex-encoding">
+         <term>hex</term>
+         <listitem>
+           <para>
+             <literal>hex</literal> represents each 4 bits of data as a single
+             hexadecimal digit, <literal>0</literal>
+             through <literal>f</literal>.  Encoding outputs
+             the <literal>a</literal>-<literal>f</literal> hex digits in lower
+             case.  Because the smallest unit of data is 8 bits there are
+             always an even number of characters returned
+             by <function>encode</function>.
+           </para>
+           <para>
+             The <function>decode</function> function
+             accepts <literal>a</literal>-<literal>f</literal> characters in
+             either upper or lower case.  An error is raised
+             when <function>decode</function> is supplied invalid hex data
+             — including when given an odd number of characters.
+           </para>
+         </listitem>
+       </varlistentry>
+
+       <varlistentry id="escape-encoding">
+         <term>escape</term>
+         <listitem>
+           <para>
+             <literal>escape</literal> converts zero bytes and high-bit-set
+             bytes to octal sequences
+             (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+             backslashes.  Encoding always produces 4 characters for each
+             high-bit-set input byte.
+           </para>
+           <para>
+             The <function>decode</function> function accepts fewer than three
+             octal digits after a <literal>\</literal> character.  An error is
+             raised when <function>decode</function> is supplied a
+             single <literal>\</literal> not followed by an octal digit.
+           </para>
+         </listitem>
+       </varlistentry>
+     </variablelist>
+   </para>
+
    <para>
     See also the aggregate function <function>string_agg</function> in
     <xref linkend="functions-aggregate"/>.
@@ -3577,16 +3673,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
         <indexterm>
          <primary>encode</primary>
         </indexterm>
+        <indexterm>
+         <primary>base64 encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>hex encoding</primary>
+        </indexterm>
+        <indexterm>
+         <primary>escape encoding</primary>
+        </indexterm>
         <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
         <parameter>format</parameter> <type>text</type>)</function></literal>
        </entry>
        <entry><type>text</type></entry>
        <entry>
         Encode binary data into a textual representation.  Supported
-        formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-        <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-        octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-        doubles backslashes.
+        formats are:
+        <link linkend="base64-encoding"><literal>base64</literal></link>,
+        <link linkend="hex-encoding"><literal>hex</literal></link>,
+        <link linkend="escape-encoding"><literal>escape</literal></link>.
        </entry>
        <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
        <entry><literal>123\000456</literal></entry>
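
For anyone who wants to sanity-check the wording of the patch without a
server at hand, the following is a rough Python sketch that mimics the
behavior the new text describes. It is not PostgreSQL's implementation.
The helper names (b64_encode_wrapped, b64_decode_strict, escape_encode)
are made up for illustration, and since the docs do not say whether a
newline follows the final base64 line, the sketch assumes none does.

```python
import base64
import binascii


def b64_encode_wrapped(data: bytes, width: int = 76) -> str:
    """Base64-encode, breaking lines at `width` characters with a bare newline."""
    flat = base64.b64encode(data).decode("ascii")
    # Assumption: no newline is emitted after the final line.
    return "\n".join(flat[i:i + width] for i in range(0, len(flat), width))


def b64_decode_strict(text: str) -> bytes:
    """Ignore CR, LF, space and tab; raise binascii.Error on anything else invalid."""
    cleaned = text.translate({ord(c): None for c in "\r\n \t"})
    # validate=True rejects characters outside the alphabet and bad padding.
    return base64.b64decode(cleaned, validate=True)


def escape_encode(data: bytes) -> str:
    """Zero and high-bit-set bytes become \\nnn octal; backslashes are doubled."""
    out = []
    for b in data:
        if b == 0 or b >= 0x80:
            out.append("\\%03o" % b)  # always backslash plus three octal digits
        elif b == 0x5C:
            out.append("\\\\")
        else:
            out.append(chr(b))
    return "".join(out)


if __name__ == "__main__":
    print(b64_encode_wrapped(b"123\x00\x01"))  # MTIzAAE=
    print(escape_encode(b"123\x00456"))        # 123\000456
    # Hex: output is lower case, input may be either case.
    print(bytes([0xAB, 0x01]).hex())           # ab01
    print(bytes.fromhex("AB01") == bytes.fromhex("ab01"))  # True
```

The two table examples from the patch (MTIzAAE= and 123\000456) come out
the same here, which is only a consistency check on the documentation
text, not a substitute for regression tests against the server.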