Er, ping. Nobody has reviewed the latest patches. They still apply to master...
I am re-attaching the patches. See descriptions below.

On Mon, 11 Mar 2019 15:32:14 -0500
"Karl O. Pinc" <k...@meme.com> wrote:

> On Sun, 10 Mar 2019 08:15:35 +0100 (CET)
> Fabien COELHO <coel
> What's causing problems here is that the encode and decode
> functions are listed in both the string functions section
> and the binary functions section. A related but not-relevant
> problem is that there are functions listed in the string
> function section which take binary input.
>
> I asked about this on IRC and the brief reply was
> unflattering to the existing documentation.
>
> So I'm going to fix this also. 3 patches attached:
>
> doc_base64_part1_v9.patch
>
>   This moves functions taking bytea and other non-string
>   input into the binary string section, and vice versa.
>   Eliminates duplicate encode() and decode() documentation.
>
>   Affects: convert(bytea, name, name)
>            convert_from(bytea, name)
>            encode(bytea, text)
>            length(bytea, name)
>            quote_nullable(anytype)
>            to_hex(int or bigint)
>            decode(text, text)
>
>   Only moves, eliminates duplicates, and adjusts indentation.
>
>
> doc_base64_part2_v9.patch
>
>   Cleanup wording after moving functions between sections.
>
>
> doc_base64_part3_v9.patch
>
>   Documents base64, hex, and escape encode() and decode()
>   formats.
>
> > >> "The string and the binary encode and decode functions..."
> > >> sentence looks strange to me, especially with the English
> > >> article that I do not really master, so maybe it is ok. I'd have
> > >> written something more straightforward, eg: "Functions encode
> > >> and decode support the following encodings:",
> > >
> > > It is an atypical construction because I want to draw attention
> > > that this is documentation not only for the encode() and decode()
> > > in section 9.4. String Functions and Operators but also for the
> > > encode() and decode in section 9.5. Binary String Functions and
> > > Operators. Although I can't think of a better approach it makes me
> > > uncomfortable that documentation written in one section applies
> > > equally to functions in a different section.
> >
> > People coming from the binary doc would have no reason to look at
> > the string paragraph anyway.
> >
> > > Do you think it would be useful to hyperlink the word "binary"
> > > to section 9.5?
> >
> > Hmmm... I think that the link is needed in the other direction.
>
> I'm not sure what you mean here or if it's still relevant.
>
> > I'd suggest (1) to use a simpler and direct sentence in the string
> > section, (2) to simplify/shorten the in cell description in the
> > binary section, and (3) to add an hyperlink from the binary section
> > which would point to the expanded explanation in the string section.
> >
> > > The idiomatic phrasing would be "Both the string and the binary
> > > encode and decode functions..." but the word "both" adds
> > > no information. Shorter is better.
> >
> > Possibly, although "Both" would insist on the fact that it applies
> > to the two variants, which was your intention.
>
> I think this is no longer relevant. Although I'm not sure what
> you mean by 3. The format names already hyperlink back to the
> string docs.
>
> > >> and also I'd use a direct "Function
> > >> <...>decode</...> ..." rather than "The
> > >> <function>decode</function> function ..." (twice).
> > >
> > > The straightforward English would be "Decode accepts...". The
> > > problem is that this begins the sentence with the name of a
> > > function. This does not work very well when the function name is
> > > all lower case, and can have other problems where clarity is lost
> > > depending on documentation output formatting.
> >
> > Yep.
> >
> > > I don't see a better approach.
> >
> > I suggested "Function <>decode</> ...", which is the kind of thing
> > we do in academic writing to improve precision, because I thought it
> > could be better:-)
>
> "Function <>decode</> ..." just does not work in English.
>
> > >> Maybe I'd use the exact same grammatical structure for all 3
> > >> cases, starting with "The <>whatever</> encoding converts bla
> > >> bla bla" instead of varying the sentences.
> > >
> > > Agreed. Good idea. The first paragraph of each term has to
> > > do with encoding and the second with decoding.
> > >
> > > Uniformity in starting the second paragraphs helps make
> > > this clear, even though the first paragraphs are not uniform.
> > > With this I am not concerned that the first paragraphs
> > > do not have a common phrasing that's very explicit about
> > > being about encoding.
> > >
> > > Adjusted.
> >
> > Cannot see it fully in the v8 patch:
> >
> > - The <literal>base64</literal> encoding is
> > - <literal>hex</literal> represents
> > - <literal>escape</literal> converts
>
> I did only the decode paras. I guess no reason not to make
> the first paras uniform as well. Done.
>
> I also alphabetized by format name.
>
> I hope that 3 patches will make review easier.

Karl <k...@meme.com>
Free Software: "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index da0f305981..5b3bc2496e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1795,48 +1795,6 @@ <row> <entry> <indexterm> - <primary>convert</primary> - </indexterm> - <literal><function>convert(<parameter>string</parameter> <type>bytea</type>, - <parameter>src_encoding</parameter> <type>name</type>, - <parameter>dest_encoding</parameter> <type>name</type>)</function></literal> - </entry> - <entry><type>bytea</type></entry> - <entry> - Convert string to <parameter>dest_encoding</parameter>. The - original encoding is specified by - <parameter>src_encoding</parameter>. The - <parameter>string</parameter> must be valid in this encoding. - Conversions can be defined by <command>CREATE CONVERSION</command>. - Also there are some predefined conversions. See <xref - linkend="conversion-names"/> for available conversions. - </entry> - <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry> - <entry><literal>text_in_utf8</literal> represented in Latin-1 - encoding (ISO 8859-1)</entry> - </row> - - <row> - <entry> - <indexterm> - <primary>convert_from</primary> - </indexterm> - <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>, - <parameter>src_encoding</parameter> <type>name</type>)</function></literal> - </entry> - <entry><type>text</type></entry> - <entry> - Convert string to the database encoding. The original encoding - is specified by <parameter>src_encoding</parameter>. The - <parameter>string</parameter> must be valid in this encoding. 
- </entry> - <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry> - <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry> - </row> - - <row> - <entry> - <indexterm> <primary>convert_to</primary> </indexterm> <literal><function>convert_to(<parameter>string</parameter> <type>text</type>, @@ -1868,26 +1826,6 @@ </row> <row> - <entry> - <indexterm> - <primary>encode</primary> - </indexterm> - <literal><function>encode(<parameter>data</parameter> <type>bytea</type>, - <parameter>format</parameter> <type>text</type>)</function></literal> - </entry> - <entry><type>text</type></entry> - <entry> - Encode binary data into a textual representation. Supported - formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>. - <literal>escape</literal> converts zero bytes and high-bit-set bytes to - octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and - doubles backslashes. - </entry> - <entry><literal>encode('123\000\001', 'base64')</literal></entry> - <entry><literal>MTIzAAE=</literal></entry> - </row> - - <row> <entry id="format"> <indexterm> <primary>format</primary> @@ -1955,19 +1893,6 @@ </row> <row> - <entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>, - <parameter>encoding</parameter> <type>name</type> )</function></literal></entry> - <entry><type>int</type></entry> - <entry> - Number of characters in <parameter>string</parameter> in the given - <parameter>encoding</parameter>. The <parameter>string</parameter> - must be valid in this encoding. 
- </entry> - <entry><literal>length('jose', 'UTF8')</literal></entry> - <entry><literal>4</literal></entry> - </row> - - <row> <entry> <indexterm> <primary>lpad</primary> @@ -2133,18 +2058,6 @@ </row> <row> - <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry> - <entry><type>text</type></entry> - <entry> - Coerce the given value to text and then quote it as a literal; - or, if the argument is null, return <literal>NULL</literal>. - Embedded single-quotes and backslashes are properly doubled. - </entry> - <entry><literal>quote_nullable(42.5)</literal></entry> - <entry><literal>'42.5'</literal></entry> - </row> - - <row> <entry> <indexterm> <primary>regexp_match</primary> @@ -2417,22 +2330,6 @@ <row> <entry> <indexterm> - <primary>to_hex</primary> - </indexterm> - <literal><function>to_hex(<parameter>number</parameter> <type>int</type> - or <type>bigint</type>)</function></literal> - </entry> - <entry><type>text</type></entry> - <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal - representation - </entry> - <entry><literal>to_hex(2147483647)</literal></entry> - <entry><literal>7fffffff</literal></entry> - </row> - - <row> - <entry> - <indexterm> <primary>translate</primary> </indexterm> <literal><function>translate(<parameter>string</parameter> <type>text</type>, @@ -3653,47 +3550,72 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); Remove the longest string containing only bytes appearing in <parameter>bytes</parameter> from the start and end of <parameter>string</parameter> - </entry> + </entry> <entry><literal>btrim('\000trim\001'::bytea, '\000\001'::bytea)</literal></entry> <entry><literal>trim</literal></entry> </row> - <row> - <entry> + <row> + <entry> <indexterm> - <primary>decode</primary> + <primary>convert</primary> </indexterm> - <literal><function>decode(<parameter>string</parameter> <type>text</type>, - <parameter>format</parameter> 
<type>text</type>)</function></literal> - </entry> - <entry><type>bytea</type></entry> - <entry> - Decode binary data from textual representation in <parameter>string</parameter>. - Options for <parameter>format</parameter> are same as in <function>encode</function>. - </entry> - <entry><literal>decode('123\000456', 'escape')</literal></entry> - <entry><literal>123\000456</literal></entry> - </row> + <literal><function>convert(<parameter>string</parameter> <type>bytea</type>, + <parameter>src_encoding</parameter> <type>name</type>, + <parameter>dest_encoding</parameter> <type>name</type>)</function></literal> + </entry> + <entry><type>bytea</type></entry> + <entry> + Convert string to <parameter>dest_encoding</parameter>. The + original encoding is specified by + <parameter>src_encoding</parameter>. The + <parameter>string</parameter> must be valid in this encoding. + Conversions can be defined by <command>CREATE CONVERSION</command>. + Also there are some predefined conversions. See <xref + linkend="conversion-names"/> for available conversions. + </entry> + <entry><literal>convert('text_in_utf8', 'UTF8', 'LATIN1')</literal></entry> + <entry><literal>text_in_utf8</literal> represented in Latin-1 + encoding (ISO 8859-1)</entry> + </row> - <row> - <entry> + <row> + <entry> <indexterm> - <primary>encode</primary> + <primary>convert_from</primary> </indexterm> - <literal><function>encode(<parameter>data</parameter> <type>bytea</type>, - <parameter>format</parameter> <type>text</type>)</function></literal> - </entry> - <entry><type>text</type></entry> - <entry> - Encode binary data into a textual representation. Supported - formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>. - <literal>escape</literal> converts zero bytes and high-bit-set bytes to - octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and - doubles backslashes. 
- </entry> - <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry> - <entry><literal>123\000456</literal></entry> - </row> + <literal><function>convert_from(<parameter>string</parameter> <type>bytea</type>, + <parameter>src_encoding</parameter> <type>name</type>)</function></literal> + </entry> + <entry><type>text</type></entry> + <entry> + Convert string to the database encoding. The original encoding + is specified by <parameter>src_encoding</parameter>. The + <parameter>string</parameter> must be valid in this encoding. + </entry> + <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry> + <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry> + </row> + + <row> + <entry> + <indexterm> + <primary>encode</primary> + </indexterm> + <literal><function>encode(<parameter>data</parameter> <type>bytea</type>, + <parameter>format</parameter> <type>text</type>)</function></literal> + </entry> + <entry><type>text</type></entry> + <entry> + Encode binary data into a textual representation. Supported + formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>. + <literal>escape</literal> converts zero bytes and high-bit-set bytes to + octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and + doubles backslashes. 
+ </entry> + <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry> + <entry><literal>123\000456</literal></entry> + </row> <row> <entry> @@ -3725,45 +3647,70 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); <entry><literal>109</literal></entry> </row> - <row> - <entry> - <indexterm> - <primary>length</primary> - </indexterm> - <literal><function>length(<parameter>string</parameter>)</function></literal> - </entry> - <entry><type>int</type></entry> - <entry> - Length of binary string - <indexterm> - <primary>binary string</primary> - <secondary>length</secondary> - </indexterm> - <indexterm> - <primary>length</primary> - <secondary sortas="binary string">of a binary string</secondary> - <see>binary strings, length</see> - </indexterm> - </entry> - <entry><literal>length('jo\000se'::bytea)</literal></entry> - <entry><literal>5</literal></entry> - </row> + <row> + <entry> + <indexterm> + <primary>length</primary> + </indexterm> + <literal><function>length(<parameter>string</parameter>)</function></literal> + </entry> + <entry><type>int</type></entry> + <entry> + Length of binary string + <indexterm> + <primary>binary string</primary> + <secondary>length</secondary> + </indexterm> + <indexterm> + <primary>length</primary> + <secondary sortas="binary string">of a binary string</secondary> + <see>binary strings, length</see> + </indexterm> + </entry> + <entry><literal>length('jo\000se'::bytea)</literal></entry> + <entry><literal>5</literal></entry> + </row> - <row> - <entry> - <indexterm> - <primary>md5</primary> - </indexterm> - <literal><function>md5(<parameter>string</parameter>)</function></literal> - </entry> - <entry><type>text</type></entry> - <entry> - Calculates the MD5 hash of <parameter>string</parameter>, - returning the result in hexadecimal - </entry> - <entry><literal>md5('Th\000omas'::bytea)</literal></entry> - <entry><literal>8ab2d3c9689aaf18​b4958c334c82d8b1</literal></entry> - </row> + <row> + 
<entry><literal><function>length(<parameter>string</parameter> <type>bytea</type>, + <parameter>encoding</parameter> <type>name</type> )</function></literal></entry> + <entry><type>int</type></entry> + <entry> + Number of characters in <parameter>string</parameter> in the given + <parameter>encoding</parameter>. The <parameter>string</parameter> + must be valid in this encoding. + </entry> + <entry><literal>length('jose', 'UTF8')</literal></entry> + <entry><literal>4</literal></entry> + </row> + + <row> + <entry> + <indexterm> + <primary>md5</primary> + </indexterm> + <literal><function>md5(<parameter>string</parameter>)</function></literal> + </entry> + <entry><type>text</type></entry> + <entry> + Calculates the MD5 hash of <parameter>string</parameter>, + returning the result in hexadecimal + </entry> + <entry><literal>md5('Th\000omas'::bytea)</literal></entry> + <entry><literal>8ab2d3c9689aaf18​b4958c334c82d8b1</literal></entry> + </row> + + <row> + <entry><literal><function>quote_nullable(<parameter>value</parameter> <type>anyelement</type>)</function></literal></entry> + <entry><type>text</type></entry> + <entry> + Coerce the given value to text and then quote it as a literal; + or, if the argument is null, return <literal>NULL</literal>. + Embedded single-quotes and backslashes are properly doubled. 
+ </entry> + <entry><literal>quote_nullable(42.5)</literal></entry> + <entry><literal>'42.5'</literal></entry> + </row> <row> <entry> @@ -3856,6 +3803,22 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); <entry><literal>sha512('abc')</literal></entry> <entry><literal>\xddaf35a193617abacc417349ae204131​12e6fa4e89a97ea20a9eeee64b55d39a​2192992a274fc1a836ba3c23a3feebbd​454d4423643ce80e2a9ac94fa54ca49f</literal></entry> </row> + + <row> + <entry> + <indexterm> + <primary>to_hex</primary> + </indexterm> + <literal><function>to_hex(<parameter>number</parameter> <type>int</type> + or <type>bigint</type>)</function></literal> + </entry> + <entry><type>text</type></entry> + <entry>Convert <parameter>number</parameter> to its equivalent hexadecimal + representation + </entry> + <entry><literal>to_hex(2147483647)</literal></entry> + <entry><literal>7fffffff</literal></entry> + </row> </tbody> </tgroup> </table>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5b3bc2496e..af7b8284dc 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1802,7 +1802,8 @@ </entry> <entry><type>bytea</type></entry> <entry> - Convert string to <parameter>dest_encoding</parameter>. + Convert string to <parameter>dest_encoding</parameter>. See <xref + linkend="conversion-names"/> for available conversions. </entry> <entry><literal>convert_to('some text', 'UTF8')</literal></entry> <entry><literal>some text</literal> represented in the UTF8 encoding</entry> @@ -3387,7 +3388,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); <para> This section describes functions and operators for examining and - manipulating values of type <type>bytea</type>. + manipulating values of type <type>bytea</type>, and a few functions + which produce strings from other binary inputs. </para> <para> @@ -3591,7 +3593,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); <entry> Convert string to the database encoding. The original encoding is specified by <parameter>src_encoding</parameter>. The - <parameter>string</parameter> must be valid in this encoding. + <parameter>string</parameter> must be valid in this encoding. See + <xref linkend="conversion-names"/> for available conversions. </entry> <entry><literal>convert_from('text_in_utf8', 'UTF8')</literal></entry> <entry><literal>text_in_utf8</literal> represented in the current database encoding</entry>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index af7b8284dc..f81539e341 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1814,13 +1814,18 @@ <indexterm> <primary>decode</primary> </indexterm> + <indexterm> + <primary>base64 encoding</primary> + </indexterm> <literal><function>decode(<parameter>string</parameter> <type>text</type>, <parameter>format</parameter> <type>text</type>)</function></literal> </entry> <entry><type>bytea</type></entry> <entry> Decode binary data from textual representation in <parameter>string</parameter>. - Options for <parameter>format</parameter> are same as in <function>encode</function>. + <link linkend="encoding-options">Options + for <parameter>format</parameter></link> are same as + in <function>encode</function>. </entry> <entry><literal>decode('MTIzAAE=', 'base64')</literal></entry> <entry><literal>\x3132330001</literal></entry> @@ -2366,6 +2371,89 @@ <function>format</function> treats a NULL as a zero-element array. </para> + <indexterm> + <primary>encode</primary> + </indexterm> + <indexterm> + <primary>decode</primary> + </indexterm> + <indexterm> + <primary>base64 encoding</primary> + </indexterm> + <indexterm> + <primary>hex encoding</primary> + </indexterm> + <indexterm> + <primary>escape encoding</primary> + </indexterm> + + <para id="encoding-options"> + The <function>encode</function> and <function>decode</function> functions + support the following encodings: + + <variablelist> + <varlistentry id="base64-encoding"> + <term>base64</term> + <listitem> + <para> + The <literal>base64</literal> encoding is that + of <ulink url="https://tools.ietf.org/html/rfc2045#section-6.8">RFC + 2045 Section 6.8</ulink>. As per the RFC, encoded lines are + broken at 76 characters. However instead of the MIME CRLF + end-of-line marker, only a newline is used for end-of-line. + </para> + <para> + The <function>decode</function> function ignores carriage-return, + newline, space, and tab characters. 
Otherwise, an error is + raised when <function>decode</function> is supplied invalid + base64 data — including when trailing padding is incorrect. + </para> + </listitem> + </varlistentry> + + <varlistentry id="escape-encoding"> + <term>escape</term> + <listitem> + <para> + The <literal>escape</literal> encoding converts zero bytes and + high-bit-set bytes to octal sequences + (<literal>\</literal><replaceable>nnn</replaceable>) and doubles + backslashes. Encoding always produces 4 characters for each + high-bit-set input byte. + </para> + <para> + The <function>decode</function> function accepts fewer than three + octal digits after a <literal>\</literal> character. An error is + raised when <function>decode</function> is supplied a + single <literal>\</literal> not followed by an octal digit. + </para> + </listitem> + </varlistentry> + + <varlistentry id="hex-encoding"> + <term>hex</term> + <listitem> + <para> + The <literal>hex</literal> encoding represents each 4 bits of + data as a single hexadecimal digit, <literal>0</literal> + through <literal>f</literal>. Encoding outputs + the <literal>a</literal>-<literal>f</literal> hex digits in lower + case. Because the smallest unit of data is 8 bits there are + always an even number of characters returned + by <function>encode</function>. + </para> + <para> + The <function>decode</function> function + accepts <literal>a</literal>-<literal>f</literal> characters in + either upper or lower case. An error is raised + when <function>decode</function> is supplied invalid hex data + — including when given an odd number of characters. + </para> + </listitem> + </varlistentry> + </variablelist> + </para> + <para> See also the aggregate function <function>string_agg</function> in <xref linkend="functions-aggregate"/>. 
@@ -3602,22 +3690,31 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); <row> <entry> - <indexterm> - <primary>encode</primary> - </indexterm> + <indexterm> + <primary>encode</primary> + </indexterm> + <indexterm> + <primary>base64 encoding</primary> + </indexterm> + <indexterm> + <primary>hex encoding</primary> + </indexterm> + <indexterm> + <primary>escape encoding</primary> + </indexterm> <literal><function>encode(<parameter>data</parameter> <type>bytea</type>, <parameter>format</parameter> <type>text</type>)</function></literal> </entry> <entry><type>text</type></entry> <entry> Encode binary data into a textual representation. Supported - formats are: <literal>base64</literal>, <literal>hex</literal>, <literal>escape</literal>. - <literal>escape</literal> converts zero bytes and high-bit-set bytes to - octal sequences (<literal>\</literal><replaceable>nnn</replaceable>) and - doubles backslashes. + formats are: + <link linkend="base64-encoding"><literal>base64</literal></link>, + <link linkend="escape-encoding"><literal>escape</literal></link>, + <link linkend="hex-encoding"><literal>hex</literal></link>. </entry> - <entry><literal>encode('123\000456'::bytea, 'escape')</literal></entry> - <entry><literal>123\000456</literal></entry> + <entry><literal>encode('123\000\001', 'base64')</literal></entry> + <entry><literal>MTIzAAE=</literal></entry> </row> <row>
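
The format rules documented in part 3 can be tried out without a PostgreSQL build. The following is only a rough sketch using Python's standard library, not PostgreSQL's implementation: the helper names (`base64_lines`, `escape_encode`, `hex_encode`, `hex_decode`) are hypothetical, and `escape_encode` is written from the wording of the patch (octal sequences for zero and high-bit-set bytes, doubled backslashes) rather than from the server source.

```python
import base64


def base64_lines(data: bytes) -> list:
    """Base64-encode data and return the output lines.

    base64.encodebytes() breaks its output every 76 characters and ends
    each line with a bare newline, which is the line layout the patch
    describes. The trailing newline Python adds is dropped by splitlines().
    """
    return base64.encodebytes(data).decode("ascii").splitlines()


def escape_encode(data: bytes) -> str:
    """Sketch of the 'escape' format as worded in the patch (an assumption,
    not a port of the server code)."""
    out = []
    for byte in data:
        if byte == 0 or byte >= 0x80:
            # Zero and high-bit-set bytes become a backslash plus three
            # octal digits, so always 4 characters per such byte.
            out.append("\\%03o" % byte)
        elif byte == 0x5C:
            # A backslash is doubled.
            out.append("\\\\")
        else:
            out.append(chr(byte))
    return "".join(out)


def hex_encode(data: bytes) -> str:
    """Two lower-case hex digits per byte, so the length is always even."""
    return data.hex()


def hex_decode(text: str) -> bytes:
    """Accepts a-f in either case; raises ValueError on an odd digit count."""
    return bytes.fromhex(text)


if __name__ == "__main__":
    print(base64_lines(b"123\x00\x01"))
    print([len(line) for line in base64_lines(bytes(100))])
    print(escape_encode(b"123\x00456"))
    print(hex_encode(b"123\x00\x01"))
```

The first and last inputs mirror the `encode('123\000\001', 'base64')` and `decode('MTIzAAE=', 'base64')` examples in the patches, so the sketch can be compared against the documented results; any disagreement with a real server should be read as a flaw in the sketch, not in the documentation.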