Hi Fabien,

On Sun, 10 Mar 2019 08:15:35 +0100 (CET)
Fabien COELHO <coel

> I registered as a reviewer in the CF app.

Thanks.

What's causing problems here is that the encode and decode
functions are listed in both the string functions section
and the binary functions section. A related but not-relevant
problem is that there are functions listed in the string
function section which take binary input.

I asked about this on IRC and the brief reply was
unflattering to the existing documentation.

So I'm going to fix this also. 3 patches attached:

doc_base64_part1_v9.patch

  This moves functions taking bytea and other non-string
  input into the binary string section, and vice versa.
  Eliminates duplicate encode() and decode() documentation.

  Affects:
    convert(bytea, name, name)
    convert_from(bytea, name)
    encode(bytea, text)
    length(bytea, name)
    quote_nullable(anytype)
    to_hex(int or bigint)
    decode(text, text)

  Only moves, eliminates duplicates, and adjusts indentation.

doc_base64_part2_v9.patch

  Cleanup wording after moving functions between sections.

doc_base64_part3_v9.patch

  Documents base64, hex, and escape encode() and decode()
  formats.

> >> "The string and the binary encode and decode functions..." sentence
> >> looks strange to me, especially with the English article that I do
> >> not really master, so maybe it is ok. I'd have written something
> >> more straightforward, eg: "Functions encode and decode support the
> >> following encodings:",
> >
> > It is an atypical construction because I want to draw attention
> > that this is documentation not only for the encode() and decode()
> > in section 9.4. String Functions and Operators but also for the
> > encode() and decode in section 9.5. Binary String Functions and
> > Operators. Although I can't think of a better approach it makes me
> > uncomfortable that documentation written in one section applies
> > equally to functions in a different section.
>
> People coming from the binary doc would have no reason to look at the
> string paragraph anyway.
>
> > Do you think it would be useful to hyperlink the word "binary"
> > to section 9.5?
>
> Hmmm... I think that the link is needed in the other direction.

I'm not sure what you mean here or if it's still relevant.

> I'd suggest (1) to use a simpler and direct sentence in the string
> section, (2) to simplify/shorten the in cell description in the
> binary section, and (3) to add an hyperlink from the binary section
> which would point to the expanded explanation in the string section.
>
> > The idiomatic phrasing would be "Both the string and the binary
> > encode and decode functions..." but the word "both" adds
> > no information. Shorter is better.
>
> Possibly, although "Both" would insist on the fact that it applies to
> the two variants, which was your intention.

I think this is no longer relevant. Although I'm not sure what
you mean by 3. The format names already hyperlink back to the
string docs.

> >> and also I'd use a direct "Function
> >> <...>decode</...> ..." rather than "The <function>decode</function>
> >> function ..." (twice).
> >
> > The straightforward English would be "Decode accepts...". The
> > problem is that this begins the sentence with the name of a
> > function. This does not work very well when the function name is
> > all lower case, and can have other problems where clarity is lost
> > depending on documentation output formatting.
>
> Yep.
>
> > I don't see a better approach.
>
> I suggested "Function <>decode</> ...", which is the kind of thing we
> do in academic writing to improve precision, because I thought it
> could be better:-)

"Function <>decode</> ..." just does not work in English.

> >> Maybe I'd use the exact same grammatical structure for all 3 cases,
> >> starting with "The <>whatever</> encoding converts bla bla bla"
> >> instead of varying the sentences.
> >
> > Agreed. Good idea. The first paragraph of each term has to
> > do with encoding and the second with decoding.
> >
> > Uniformity in starting the second paragraphs helps make
> > this clear, even though the first paragraphs are not uniform.
> > With this I am not concerned that the first paragraphs
> > do not have a common phrasing that's very explicit about
> > being about encoding.
> >
> > Adjusted.
>
> Cannot see it fully in the v8 patch:
>
> - The <literal>base64</literal> encoding is
> - <literal>hex</literal> represents
> - <literal>escape</literal> converts

I did only the decode paras. I guess no reason not to make
the first paras uniform as well. Done.

I also alphabetized by format name.

I hope that 3 patches will make review easier.

Regards,

Karl <k...@meme.com>
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 03859a78ea..3d748b660f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1692,48 +1692,6 @@
      <row>
       <entry>
        <indexterm>
-        <primary>convert</primary>
-       </indexterm>
-       <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
-       <parameter>src_encoding</parameter> <type>name</type>,
-       <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
-      </entry>
-      <entry><type>bytea</type></entry>
-      <entry>
-       Convert string to <parameter>dest_encoding</parameter>. The
-       original encoding is specified by
-       <parameter>src_encoding</parameter>. The
-       <parameter>string</parameter> must be valid in this encoding.
-       Conversions can be defined by <command>CREATE CONVERSION</command>.
-       Also there are some predefined conversions. See <xref
-       linkend="conversion-names"/> for available conversions.
-      </entry>
-      <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
-      <entry><literal>text_in_utf8</literal> represented in Latin-1
-       encoding (ISO 8859-1)</entry>
-     </row>
-
-     <row>
-      <entry>
-       <indexterm>
-        <primary>convert_from</primary>
-       </indexterm>
-       <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
-       <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Convert string to the database encoding. The original encoding
-       is specified by <parameter>src_encoding</parameter>. The
-       <parameter>string</parameter> must be valid in this encoding.
-      </entry>
-      <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
-      <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
-     </row>
-
-     <row>
-      <entry>
-       <indexterm>
         <primary>convert_to</primary>
        </indexterm>
        <literal><function>convert_to(<parameter>string</parameter> <type>text</type>,
@@ -1765,26 +1723,6 @@
      </row>
 
      <row>
-      <entry>
-       <indexterm>
-        <primary>encode</primary>
-       </indexterm>
-       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-       <parameter>format</parameter> <type>text</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Encode binary data into a textual representation. Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
-      </entry>
-      <entry><literal>encode('123\000\001', 'base64')</literal></entry>
-      <entry><literal>MTIzAAE=</literal></entry>
-     </row>
-
-     <row>
       <entry id="format">
        <indexterm>
         <primary>format</primary>
@@ -1852,19 +1790,6 @@
      </row>
 
      <row>
-      <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
-       <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
-      <entry><type>int</type></entry>
-      <entry>
-       Number of characters in <parameter>string</parameter> in the given
-       <parameter>encoding</parameter>. The <parameter>string</parameter>
-       must be valid in this encoding.
-      </entry>
-      <entry><literal>length('jose', 'UTF8')</literal></entry>
-      <entry><literal>4</literal></entry>
-     </row>
-
-     <row>
       <entry>
        <indexterm>
         <primary>lpad</primary>
@@ -2030,18 +1955,6 @@
      </row>
 
      <row>
-      <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry>
-      <entry><type>text</type></entry>
-      <entry>
-       Coerce the given value to text and then quote it as a literal;
-       or, if the argument is null, return <literal>NULL</literal>.
-       Embedded single-quotes and backslashes are properly doubled.
-      </entry>
-      <entry><literal>quote_nullable(42.5)</literal></entry>
-      <entry><literal>'42.5'</literal></entry>
-     </row>
-
-     <row>
       <entry>
        <indexterm>
         <primary>regexp_match</primary>
@@ -2314,22 +2227,6 @@
      <row>
       <entry>
        <indexterm>
-        <primary>to_hex</primary>
-       </indexterm>
-       <literal><function>to_hex(<parameter>number</parameter> <type>int</type>
-       or <type>bigint</type>)</function></literal>
-      </entry>
-      <entry><type>text</type></entry>
-      <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal
-       representation
-      </entry>
-      <entry><literal>to_hex(2147483647)</literal></entry>
-      <entry><literal>7fffffff</literal></entry>
-     </row>
-
-     <row>
-      <entry>
-       <indexterm>
         <primary>translate</primary>
        </indexterm>
        <literal><function>translate(<parameter>string</parameter> <type>text</type>,
@@ -3550,47 +3447,72 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        Remove the longest string containing only bytes appearing in
        <parameter>bytes</parameter> from the start and end of
        <parameter>string</parameter>
-     </entry>
+      </entry>
       <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry>
       <entry><literal>trim</literal></entry>
      </row>
 
-    <row>
-     <entry>
+     <row>
+      <entry>
        <indexterm>
-       <primary>decode</primary>
+        <primary>convert</primary>
        </indexterm>
-      <literal><function>decode(<parameter>string</parameter> <type>text</type>,
-      <parameter>format</parameter> <type>text</type>)</function></literal>
-     </entry>
-     <entry><type>bytea</type></entry>
-     <entry>
-      Decode binary data from textual representation in <parameter>string</parameter>.
-      Options for <parameter>format</parameter> are same as in <function>encode</function>.
-     </entry>
-     <entry><literal>decode('123\000456', 'escape')</literal></entry>
-     <entry><literal>123\000456</literal></entry>
-    </row>
+       <literal><function>convert(<parameter>string</parameter> <type>bytea</type>,
+       <parameter>src_encoding</parameter> <type>name</type>,
+       <parameter>dest_encoding</parameter> <type>name</type>)</function></literal>
+      </entry>
+      <entry><type>bytea</type></entry>
+      <entry>
+       Convert string to <parameter>dest_encoding</parameter>. The
+       original encoding is specified by
+       <parameter>src_encoding</parameter>. The
+       <parameter>string</parameter> must be valid in this encoding.
+       Conversions can be defined by <command>CREATE CONVERSION</command>.
+       Also there are some predefined conversions. See <xref
+       linkend="conversion-names"/> for available conversions.
+      </entry>
+      <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry>
+      <entry><literal>text_in_utf8</literal> represented in Latin-1
+       encoding (ISO 8859-1)</entry>
+     </row>
 
-    <row>
-     <entry>
+     <row>
+      <entry>
        <indexterm>
-       <primary>encode</primary>
+        <primary>convert_from</primary>
        </indexterm>
-      <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
-      <parameter>format</parameter> <type>text</type>)</function></literal>
-     </entry>
-     <entry><type>text</type></entry>
-     <entry>
-      Encode binary data into a textual representation. Supported
-      formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-      <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-      octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-      doubles backslashes.
-     </entry>
-     <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-     <entry><literal>123\000456</literal></entry>
-    </row>
+       <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>,
+       <parameter>src_encoding</parameter> <type>name</type>)</function></literal>
+      </entry>
+      <entry><type>text</type></entry>
+      <entry>
+       Convert string to the database encoding. The original encoding
+       is specified by <parameter>src_encoding</parameter>. The
+       <parameter>string</parameter> must be valid in this encoding.
+      </entry>
+      <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
+      <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
+     </row>
+
+     <row>
+      <entry>
+       <indexterm>
+        <primary>encode</primary>
+       </indexterm>
+       <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
+       <parameter>format</parameter> <type>text</type>)</function></literal>
+      </entry>
+      <entry><type>text</type></entry>
+      <entry>
+       Encode binary data into a textual representation. Supported
+       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
+       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
+       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
+       doubles backslashes.
+      </entry>
+      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
+      <entry><literal>123\000456</literal></entry>
+     </row>
 
      <row>
       <entry>
@@ -3622,45 +3544,70 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
       <entry><literal>109</literal></entry>
      </row>
 
-    <row>
-     <entry>
-      <indexterm>
-       <primary>length</primary>
-      </indexterm>
-      <literal><function>length(<parameter>string</parameter>)</function></literal>
-     </entry>
-     <entry><type>int</type></entry>
-     <entry>
-      Length of binary string
-      <indexterm>
-       <primary>binary string</primary>
-       <secondary>length</secondary>
-      </indexterm>
-      <indexterm>
-       <primary>length</primary>
-       <secondary sortas="binary string">of a binary string</secondary>
-       <see>binary strings, length</see>
-      </indexterm>
-     </entry>
-     <entry><literal>length('jo\000se'::bytea)</literal></entry>
-     <entry><literal>5</literal></entry>
-    </row>
+     <row>
+      <entry>
+       <indexterm>
+        <primary>length</primary>
+       </indexterm>
+       <literal><function>length(<parameter>string</parameter>)</function></literal>
+      </entry>
+      <entry><type>int</type></entry>
+      <entry>
+       Length of binary string
+       <indexterm>
+        <primary>binary string</primary>
+        <secondary>length</secondary>
+       </indexterm>
+       <indexterm>
+        <primary>length</primary>
+        <secondary sortas="binary string">of a binary string</secondary>
+        <see>binary strings, length</see>
+       </indexterm>
+      </entry>
+      <entry><literal>length('jo\000se'::bytea)</literal></entry>
+      <entry><literal>5</literal></entry>
+     </row>
 
-    <row>
-     <entry>
-      <indexterm>
-       <primary>md5</primary>
-      </indexterm>
-      <literal><function>md5(<parameter>string</parameter>)</function></literal>
-     </entry>
-     <entry><type>text</type></entry>
-     <entry>
-      Calculates the MD5 hash of <parameter>string</parameter>,
-      returning the result in hexadecimal
-     </entry>
-     <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
-     <entry><literal>8ab2d3c9689aaf18&zwsp;b4958c334c82d8b1</literal></entry>
-    </row>
+     <row>
+      <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>,
+       <parameter>encoding</parameter> <type>name</type> )</function></literal></entry>
+      <entry><type>int</type></entry>
+      <entry>
+       Number of characters in <parameter>string</parameter> in the given
+       <parameter>encoding</parameter>. The <parameter>string</parameter>
+       must be valid in this encoding.
+      </entry>
+      <entry><literal>length('jose', 'UTF8')</literal></entry>
+      <entry><literal>4</literal></entry>
+     </row>
+
+     <row>
+      <entry>
+       <indexterm>
+        <primary>md5</primary>
+       </indexterm>
+       <literal><function>md5(<parameter>string</parameter>)</function></literal>
+      </entry>
+      <entry><type>text</type></entry>
+      <entry>
+       Calculates the MD5 hash of <parameter>string</parameter>,
+       returning the result in hexadecimal
+      </entry>
+      <entry><literal>md5('Th\000omas'::bytea)</literal></entry>
+      <entry><literal>8ab2d3c9689aaf18&zwsp;b4958c334c82d8b1</literal></entry>
+     </row>
+
+     <row>
+      <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry>
+      <entry><type>text</type></entry>
+      <entry>
+       Coerce the given value to text and then quote it as a literal;
+       or, if the argument is null, return <literal>NULL</literal>.
+       Embedded single-quotes and backslashes are properly doubled.
+      </entry>
+      <entry><literal>quote_nullable(42.5)</literal></entry>
+      <entry><literal>'42.5'</literal></entry>
+     </row>
 
      <row>
       <entry>
@@ -3753,6 +3700,22 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
       <entry><literal>sha512('abc')</literal></entry>
       <entry><literal>\xddaf35a193617abacc417349ae204131&zwsp;12e6fa4e89a97ea20a9eeee64b55d39a&zwsp;2192992a274fc1a836ba3c23a3feebbd&zwsp;454d4423643ce80e2a9ac94fa54ca49f</literal></entry>
      </row>
+
+     <row>
+      <entry>
+       <indexterm>
+        <primary>to_hex</primary>
+       </indexterm>
+       <literal><function>to_hex(<parameter>number</parameter> <type>int</type>
+       or <type>bigint</type>)</function></literal>
+      </entry>
+      <entry><type>text</type></entry>
+      <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal
+       representation
+      </entry>
+      <entry><literal>to_hex(2147483647)</literal></entry>
+      <entry><literal>7fffffff</literal></entry>
+     </row>
     </tbody>
    </tgroup>
   </table>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3d748b660f..22769e5031 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1699,7 +1699,8 @@
       </entry>
       <entry><type>bytea</type></entry>
       <entry>
-       Convert string to <parameter>dest_encoding</parameter>.
+       Convert string to <parameter>dest_encoding</parameter>. See <xref
+       linkend="conversion-names"/> for available conversions.
       </entry>
       <entry><literal>convert_to('some text', 'UTF8')</literal></entry>
       <entry><literal>some text</literal> represented in the UTF8 encoding</entry>
@@ -3284,7 +3285,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
    <para>
     This section describes functions and operators for examining and
-    manipulating values of type <type>bytea</type>.
+    manipulating values of type <type>bytea</type>, and a few functions
+    which produce strings from other binary inputs.
    </para>
 
    <para>
@@ -3488,7 +3490,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
       <entry>
        Convert string to the database encoding. The original encoding
        is specified by <parameter>src_encoding</parameter>. The
-       <parameter>string</parameter> must be valid in this encoding.
+       <parameter>string</parameter> must be valid in this encoding. See
+       <xref linkend="conversion-names"/> for available conversions.
       </entry>
       <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry>
       <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 22769e5031..14a587c281 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1711,13 +1711,18 @@
        <indexterm>
         <primary>decode</primary>
        </indexterm>
+       <indexterm>
+        <primary>base64 encoding</primary>
+       </indexterm>
        <literal><function>decode(<parameter>string</parameter> <type>text</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>bytea</type></entry>
       <entry>
        Decode binary data from textual representation in <parameter>string</parameter>.
-       Options for <parameter>format</parameter> are same as in <function>encode</function>.
+       <link linkend="encoding-options">Options
+       for <parameter>format</parameter></link> are same as
+       in <function>encode</function>.
       </entry>
       <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry>
       <entry><literal>\x3132330001</literal></entry>
@@ -2263,6 +2268,89 @@
    <function>format</function> treats a NULL as a zero-element array.
   </para>
 
+  <indexterm>
+   <primary>encode</primary>
+  </indexterm>
+  <indexterm>
+   <primary>decode</primary>
+  </indexterm>
+  <indexterm>
+   <primary>base64 encoding</primary>
+  </indexterm>
+  <indexterm>
+   <primary>hex encoding</primary>
+  </indexterm>
+  <indexterm>
+   <primary>escape encoding</primary>
+  </indexterm>
+
+  <para id="encoding-options">
+   The <function>encode</function> and <function>decode</function> functions
+   support the following encodings:
+
+   <variablelist>
+    <varlistentry id="base64-encoding">
+     <term>base64</term>
+     <listitem>
+      <para>
+       The <literal>base64</literal> encoding is that
+       of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC
+       2045 Section 6.8</ulink>. As per the RFC, encoded lines are
+       broken at 76 characters. However instead of the MIME CRLF
+       end-of-line marker, only a newline is used for end-of-line.
+      </para>
+      <para>
+       The <function>decode</function> function ignores carriage-return,
+       newline, space, and tab characters. Otherwise, an error is
+       raised when <function>decode</function> is supplied invalid
+       base64 data &mdash; including when trailing padding is incorrect.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry id="escape-encoding">
+     <term>escape</term>
+     <listitem>
+      <para>
+       The <literal>escape</literal> encoding converts zero bytes and
+       high-bit-set bytes to octal sequences
+       (<literal>\</literal><replaceable>nnn</replaceable>) and doubles
+       backslashes. Encoding always produces 4 characters for each
+       high-bit-set input byte.
+      </para>
+      <para>
+       The <function>decode</function> function accepts fewer than three
+       octal digits after a <literal>\</literal> character. An error is
+       raised when <function>decode</function> is supplied a
+       single <literal>\</literal> not followed by an octal digit.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry id="hex-encoding">
+     <term>hex</term>
+     <listitem>
+      <para>
+       The <literal>hex</literal> encoding represents each 4 bits of
+       data as a single hexadecimal digit, <literal>0</literal>
+       through <literal>f</literal>. Encoding outputs
+       the <literal>a</literal>-<literal>f</literal> hex digits in lower
+       case. Because the smallest unit of data is 8 bits there are
+       always an even number of characters returned
+       by <function>encode</function>.
+      </para>
+      <para>
+       The <function>decode</function> function
+       accepts <literal>a</literal>-<literal>f</literal> characters in
+       either upper or lower case. An error is raised
+       when <function>decode</function> is supplied invalid hex data
+       &mdash; including when given an odd number of characters.
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
+
   <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/>.
@@ -3499,22 +3587,31 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
 
      <row>
       <entry>
-      <indexterm>
-       <primary>encode</primary>
-      </indexterm>
+       <indexterm>
+        <primary>encode</primary>
+       </indexterm>
+       <indexterm>
+        <primary>base64 encoding</primary>
+       </indexterm>
+       <indexterm>
+        <primary>hex encoding</primary>
+       </indexterm>
+       <indexterm>
+        <primary>escape encoding</primary>
+       </indexterm>
        <literal><function>encode(<parameter>data</parameter> <type>bytea</type>,
        <parameter>format</parameter> <type>text</type>)</function></literal>
       </entry>
       <entry><type>text</type></entry>
       <entry>
        Encode binary data into a textual representation. Supported
-       formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>.
-       <literal>escape</literal> converts zero bytes and high-bit-set bytes to
-       octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and
-       doubles backslashes.
+       formats are:
+       <link linkend="base64-encoding"><literal>base64</literal></link>,
+       <link linkend="escape-encoding"><literal>escape</literal></link>,
+       <link linkend="hex-encoding"><literal>hex</literal></link>.
       </entry>
-      <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry>
-      <entry><literal>123\000456</literal></entry>
+      <entry><literal>encode('123\000\001', 'base64')</literal></entry>
+      <entry><literal>MTIzAAE=</literal></entry>
      </row>
 
      <row>
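
For anyone reviewing part 3 who wants a quick mental model of the three formats it documents, here is a rough Python sketch. It is not PostgreSQL's implementation. The helper name pg_escape_encode is hypothetical and only mirrors the rule stated in the patch for the escape format (zero bytes and high-bit-set bytes become \nnn octal sequences, backslashes are doubled). The base64 and hex lines use Python's standard library, which is assumed to agree with encode() for short inputs like this one but does not reproduce the 76-character line breaking or decode()'s whitespace handling.

```python
import base64


def pg_escape_encode(data: bytes) -> str:
    """Hypothetical model of the documented 'escape' format, not PostgreSQL code."""
    out = []
    for b in data:
        if b == 0 or b >= 0x80:
            # Zero bytes and high-bit-set bytes become a backslash
            # followed by three octal digits (4 characters in total).
            out.append("\\%03o" % b)
        elif b == 0x5C:
            # A backslash in the input is doubled.
            out.append("\\\\")
        else:
            # Every other byte is copied through unchanged.
            out.append(chr(b))
    return "".join(out)


# Same five bytes as the patch's encode('123\000\001', 'base64') example.
sample = b"123\x00\x01"

b64 = base64.b64encode(sample).decode("ascii")
hexed = sample.hex()  # lower-case digits, two per input byte
escaped = pg_escape_encode(b"a\x00\xff\\")

print(b64)
print(hexed)
print(escaped)
```

The base64 result for the sample bytes is MTIzAAE=, the same value the patch shows for encode(), and the hex string is the digits of the \x3132330001 value shown for decode().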