Dear Thomas,

Thank you for your proofreading!

Please find the updated patch attached. It also contains the missing escaping.

On Mon, Jul 8, 2019 at 10:39 AM Thomas Munro <thomas.mu...@gmail.com> wrote:

> On Wed, Apr 17, 2019 at 5:29 AM Dmitry Belyavsky <beld...@gmail.com> wrote:
> > I've applied your patch.
> > From my point of view, there is no major difference between case and chain if here.
> > Neither case nor ifs allow extracting the common code to separate function - just because there seem to be no identical pieces of code.
>
> Hi Dmitry,
>
> The documentation doesn't build[1], due to invalid XML.  Since I'm
> here, here is some proof-reading of the English in the documentation:
>
>    <para>
> -   A <firstterm>label</firstterm> is a sequence of alphanumeric characters
> -   and underscores (for example, in C locale the characters
> -   <literal>A-Za-z0-9_</literal> are allowed).  Labels must be less than 256 bytes
> -   long.
> +   A <firstterm>label</firstterm> is a sequence of characters. Labels must be
> +   less than 256 symbols long. Label may contain any character supported by Postgres
>
> "fewer than 256 characters in length", and
> "<productname>PostgreSQL</productname>"
>
> +   except <literal>\0</literal>. If label contains spaces, dots, lquery modifiers,
>
> "spaces, dots or lquery modifiers,"
>
> +   they may be <firstterm>escaped</firstterm>. Escaping can be done either by preceeding
> +   backslash (<literal>\\</literal>) symbol, or by wrapping the label in whole in double
> +   quotes (<literal>"</literal>). Initial and final unescaped whitespace is stripped.
>
> "Escaping can be done with either a preceding backslash [...] or by
> wrapping the whole label in double quotes [...]."
>
>    </para>
>
> +   During converting text into internal representations, wrapping double quotes
>
> "During conversion to internal representation, "
>
> +   and escaping backslashes are removed. During converting internal
> +   representations into text, if the label does not contain any special
>
> "During conversion from internal representation to text, "
>
> +   symbols, it is printed as is. Otherwise, it is wrapped in quotes and, if
> +   there are internal quotes, they are escaped with backslash. The list of special
>
> "escaped with backslashes."
>
> +  <para>
> +   Examples: <literal>42</literal>, <literal>"\\42"</literal>,
> +   <literal>\\4\\2</literal>, <literal> 42 </literal> and <literal> "42"
> +   </literal> will have the similar internal representation and, being
>
> "will all have the same internal representation and,"
>
> +   converted from internal representation, will become <literal>42</literal>.
> +   Literal <literal>abc def</literal> will turn into <literal>"abc
> +   def"</literal>.
>    </para>
>
> [1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/555571856
>
> --
> Thomas Munro
> https://enterprisedb.com

--
SY, Dmitry Belyavsky
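
As a rough illustration of the output rule quoted above (a label with no special symbols is printed as is; otherwise it is wrapped in double quotes and any inner quotes are escaped with a backslash), here is a minimal standalone C sketch. It is not the patch's code: the name `quote_label`, its buffer handling, and the `special` set supplied by the caller are assumptions made for this example, and it treats the input as single-byte characters, whereas the patch walks multibyte characters with `pg_mblen`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: write the text form of one
 * label into out (capacity out_size, including the terminating NUL).
 * Returns the number of bytes written, or -1 if out is too small.
 * Assumption: single-byte characters only; "special" lists the characters
 * that force quoting and is chosen by the caller for this example.
 */
static int
quote_label(const char *label, const char *special, char *out, size_t out_size)
{
	size_t		len = strlen(label);
	size_t		quotes = 0;
	int			need_quoting = (len == 0);	/* an empty label is quoted */
	size_t		needed;
	size_t		i;
	size_t		o = 0;

	/* First pass: count inner quotes and look for special characters. */
	for (i = 0; i < len; i++)
	{
		if (label[i] == '"')
			quotes++;
		else if (strchr(special, label[i]) != NULL)
			need_quoting = 1;
	}
	if (quotes > 0)
		need_quoting = 1;

	/* One backslash per inner quote, two wrapping quotes, one NUL. */
	needed = len + quotes + (need_quoting ? 2 : 0) + 1;
	if (needed > out_size)
		return -1;

	/* Second pass: copy, escaping only the inner double quotes. */
	if (need_quoting)
		out[o++] = '"';
	for (i = 0; i < len; i++)
	{
		if (label[i] == '"')
			out[o++] = '\\';
		out[o++] = label[i];
	}
	if (need_quoting)
		out[o++] = '"';
	out[o] = '\0';

	return (int) o;
}
```

With `special` set to space, dot and backslash, `42` comes back unchanged, `abc def` becomes `"abc def"`, and `a"b` becomes `"a\"b"`. Counting the extra bytes in a first pass, as the sketch does, lets the caller size the output before any copying starts.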
diff --git a/contrib/ltree/expected/ltree.out b/contrib/ltree/expected/ltree.out index 8226930905..5f45726229 100644 --- a/contrib/ltree/expected/ltree.out +++ b/contrib/ltree/expected/ltree.out @@ -1,4 +1,5 @@ CREATE EXTENSION ltree; +SET standard_conforming_strings=on; -- Check whether any of our opclasses fail amvalidate SELECT amname, opcname FROM pg_opclass opc LEFT JOIN pg_am am ON am.oid = opcmethod @@ -7679,3 +7680,1587 @@ SELECT count(*) FROM _ltreetest WHERE t ? '{23.*.1,23.*.2}' ; 15 (1 row) +-- Extended syntax, escaping, quoting etc +-- success +SELECT E'\\.'::ltree; + ltree +------- + "." +(1 row) + +SELECT E'\\ '::ltree; + ltree +------- + " " +(1 row) + +SELECT E'\\\\'::ltree; + ltree +------- + "\" +(1 row) + +SELECT E'\\a'::ltree; + ltree +------- + a +(1 row) + +SELECT E'\\n'::ltree; + ltree +------- + n +(1 row) + +SELECT E'x\\\\'::ltree; + ltree +------- + "x\" +(1 row) + +SELECT E'x\\ '::ltree; + ltree +------- + "x " +(1 row) + +SELECT E'x\\.'::ltree; + ltree +------- + "x." +(1 row) + +SELECT E'x\\a'::ltree; + ltree +------- + xa +(1 row) + +SELECT E'x\\n'::ltree; + ltree +------- + xn +(1 row) + +SELECT 'a b.с d'::ltree; + ltree +------------- + "a b"."с d" +(1 row) + +SELECT ' e . f '::ltree; + ltree +------- + e.f +(1 row) + +SELECT ' '::ltree; + ltree +------- + +(1 row) + +SELECT E'\\ g . h\\ '::ltree; + ltree +----------- + " g"."h " +(1 row) + +SELECT E'\\ g'::ltree; + ltree +------- + " g" +(1 row) + +SELECT E' h\\ '::ltree; + ltree +------- + "h " +(1 row) + +SELECT '" g "." h "'::ltree; + ltree +-------------- + " g "." h " +(1 row) + +SELECT '" g " '::ltree; + ltree +-------- + " g " +(1 row) + +SELECT '" g " ." h " '::ltree; + ltree +-------------- + " g "." h " +(1 row) + +SELECT nlevel(E'Bottom\\.Test'::ltree); + nlevel +-------- + 1 +(1 row) + +SELECT subpath(E'Bottom\\.'::ltree, 0, 1); + subpath +----------- + "Bottom." 
+(1 row) + +SELECT subpath(E'a\\.b', 0, 1); + subpath +--------- + "a.b" +(1 row) + +SELECT subpath(E'a\\..b', 1, 1); + subpath +--------- + b +(1 row) + +SELECT subpath(E'a\\..\\b', 1, 1); + subpath +--------- + b +(1 row) + +SELECT subpath(E'a b.с d'::ltree, 1, 1); + subpath +--------- + "с d" +(1 row) + +SELECT( +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z')::ltree; + ltreezzzzz +(1 row) + +SELECT(' ' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\a\b\c\d\e ')::ltree; + ltreeabcde +(1 row) + +SELECT 'abc\|d'::lquery; + lquery +--------- + "abc|d" +(1 row) + +SELECT 'abc\|d'::ltree ~ 'abc\|d'::lquery; + ?column? +---------- + t +(1 row) + +SELECT 'abc|d'::ltree ~ 'abc*'::lquery; --true + ?column? +---------- + t +(1 row) + +SELECT 'abc|d'::ltree ~ 'abc\*'::lquery; --false + ?column? +---------- + f +(1 row) + +SELECT E'abc|\\.'::ltree ~ 'abc\|*'::lquery; --true + ?column? +---------- + t +(1 row) + +SELECT E'"\\""'::ltree; + ltree +------- + "\"" +(1 row) + +SELECT '\"'::ltree; + ltree +------- + "\"" +(1 row) + +SELECT E'\\"'::ltree; + ltree +------- + "\"" +(1 row) + +SELECT 'a\"b'::ltree; + ltree +-------- + "a\"b" +(1 row) + +SELECT '"ab"'::ltree; + ltree +------- + ab +(1 row) + +SELECT '"."'::ltree; + ltree +------- + "." 
+(1 row) + +SELECT E'".\\""'::ltree; + ltree +------- + ".\"" +(1 row) + +SELECT( +'"01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z"')::ltree; + ltreezzzzz +(1 row) + +SELECT E'"\\""'::lquery; + lquery +-------- + "\"" +(1 row) + +SELECT '\"'::lquery; + lquery +-------- + "\"" +(1 row) + +SELECT E'\\"'::lquery; + lquery +-------- + "\"" +(1 row) + +SELECT 'a\"b'::lquery; + lquery +-------- + "a\"b" +(1 row) + +SELECT '"ab"'::lquery; + lquery +-------- + ab +(1 row) + +SELECT '"."'::lquery; + lquery +-------- + "." +(1 row) + +SELECT E'".\\""'::lquery; + lquery +-------- + ".\"" +(1 row) + +SELECT( +'"01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z"')::lquery; + lqueryzzzzz +(1 row) + +SELECT ' e . f '::lquery; + lquery +-------- + e.f +(1 row) + +SELECT ' e | f '::lquery; + lquery +-------- + e|f +(1 row) + +SELECT E'\\ g . h\\ '::lquery; + lquery +----------- + " g"."h " +(1 row) + +SELECT E'\\ g'::lquery; + lquery +-------- + " g" +(1 row) + +SELECT E' h\\ '::lquery; + lquery +-------- + "h " +(1 row) + +SELECT E'"\\ g"'::lquery; + lquery +-------- + " g" +(1 row) + +SELECT E' "h\\ "'::lquery; + lquery +-------- + "h " +(1 row) + +SELECT '" g "." h "'::lquery; + lquery +-------------- + " g "." h " +(1 row) + +SELECT E'\\ g | h\\ '::lquery; + lquery +----------- + " g"|"h " +(1 row) + +SELECT '" g "|" h "'::lquery; + lquery +-------------- + " g "|" h " +(1 row) + +SELECT '" g " '::lquery; + lquery +-------- + " g " +(1 row) + +SELECT '" g " ." 
h " '::lquery; + lquery +-------------- + " g "." h " +(1 row) + +SELECT '" g " | " h " '::lquery; + lquery +-------------- + " g "|" h " +(1 row) + +SELECT(' ' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\a\b\c\d\e ')::lquery; + lquery +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- + 0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789abcde +(1 row) + +SELECT E'"a\\"b"'::lquery; + lquery +-------- + "a\"b" +(1 row) + +SELECT '"a!b"'::lquery; + lquery +-------- + "a!b" +(1 row) + +SELECT '"a%b"'::lquery; + lquery +-------- + "a%b" +(1 row) + +SELECT '"a*b"'::lquery; + lquery +-------- + "a*b" +(1 row) + +SELECT '"a@b"'::lquery; + lquery +-------- + "a@b" +(1 row) + +SELECT '"a{b"'::lquery; + lquery +-------- + "a{b" +(1 row) + +SELECT '"a}b"'::lquery; + lquery +-------- + "a}b" +(1 row) + +SELECT '"a|b"'::lquery; + lquery +-------- + "a|b" +(1 row) + +SELECT E'a\\"b'::lquery; + lquery +-------- + "a\"b" +(1 row) + +SELECT E'a\\!b'::lquery; + lquery +-------- + "a!b" +(1 row) + +SELECT E'a\\%b'::lquery; + lquery +-------- + "a%b" +(1 row) + +SELECT E'a\\*b'::lquery; + lquery +-------- + "a*b" +(1 row) + +SELECT E'a\\@b'::lquery; + lquery +-------- + "a@b" +(1 row) + +SELECT E'a\\{b'::lquery; + lquery +-------- + "a{b" +(1 row) + +SELECT E'a\\}b'::lquery; + lquery +-------- + "a}b" +(1 row) + +SELECT E'a\\|b'::lquery; + 
lquery +-------- + "a|b" +(1 row) + +SELECT '!"!b"'::lquery; + lquery +-------- + !"!b" +(1 row) + +SELECT '!"%b"'::lquery; + lquery +-------- + !"%b" +(1 row) + +SELECT '!"*b"'::lquery; + lquery +-------- + !"*b" +(1 row) + +SELECT '!"@b"'::lquery; + lquery +-------- + !"@b" +(1 row) + +SELECT '!"{b"'::lquery; + lquery +-------- + !"{b" +(1 row) + +SELECT '!"}b"'::lquery; + lquery +-------- + !"}b" +(1 row) + +SELECT E'!\\!b'::lquery; + lquery +-------- + !"!b" +(1 row) + +SELECT E'!\\%b'::lquery; + lquery +-------- + !"%b" +(1 row) + +SELECT E'!\\*b'::lquery; + lquery +-------- + !"*b" +(1 row) + +SELECT E'!\\@b'::lquery; + lquery +-------- + !"@b" +(1 row) + +SELECT E'!\\{b'::lquery; + lquery +-------- + !"{b" +(1 row) + +SELECT E'!\\}b'::lquery; + lquery +-------- + !"}b" +(1 row) + +SELECT '"1"'::lquery; + lquery +-------- + 1 +(1 row) + +SELECT '"2.*"'::lquery; + lquery +-------- + "2.*" +(1 row) + +SELECT '!"1"'::lquery; + lquery +-------- + !1 +(1 row) + +SELECT '!"1|"'::lquery; + lquery +-------- + !"1|" +(1 row) + +SELECT '4|3|"2"'::lquery; + lquery +-------- + 4|3|2 +(1 row) + +SELECT '"1".2'::lquery; + lquery +-------- + 1.2 +(1 row) + +SELECT '"1.4"|"3"|2'::lquery; + lquery +----------- + "1.4"|3|2 +(1 row) + +SELECT '"1"."4"|"3"|"2"'::lquery; + lquery +--------- + 1.4|3|2 +(1 row) + +SELECT '"1"."0"'::lquery; + lquery +-------- + 1.0 +(1 row) + +SELECT '"1".0'::lquery; + lquery +-------- + 1.0 +(1 row) + +SELECT '"1".*'::lquery; + lquery +-------- + 1.* +(1 row) + +SELECT '4|"3"|2.*'::lquery; + lquery +--------- + 4|3|2.* +(1 row) + +SELECT '4|"3"|"2.*"'::lquery; + lquery +----------- + 4|3|"2.*" +(1 row) + +SELECT '2."*"'::lquery; + lquery +-------- + 2."*" +(1 row) + +SELECT '"*".1."*"'::lquery; + lquery +----------- + "*".1."*" +(1 row) + +SELECT '"*.4"|3|2.*'::lquery; + lquery +------------- + "*.4"|3|2.* +(1 row) + +SELECT '"*.4"|3|"2.*"'::lquery; + lquery +--------------- + "*.4"|3|"2.*" +(1 row) + +SELECT '1.*.4|3|2.*{,4}'::lquery; + lquery 
+----------------- + 1.*.4|3|2.*{,4} +(1 row) + +SELECT '1.*.4|3|2.*{1,}'::lquery; + lquery +----------------- + 1.*.4|3|2.*{1,} +(1 row) + +SELECT '1.*.4|3|2.*{1}'::lquery; + lquery +---------------- + 1.*.4|3|2.*{1} +(1 row) + +SELECT '"qwerty"%@*.tu'::lquery; + lquery +-------------- + qwerty%@*.tu +(1 row) + +SELECT '1.*.4|3|"2".*{1,4}'::lquery; + lquery +------------------ + 1.*.4|3|2.*{1,4} +(1 row) + +SELECT '1."*".4|3|"2".*{1,4}'::lquery; + lquery +-------------------- + 1."*".4|3|2.*{1,4} +(1 row) + +SELECT '\% \@'::lquery; + lquery +-------- + "% @" +(1 row) + +SELECT '"\% \@"'::lquery; + lquery +-------- + "% @" +(1 row) + +SELECT E'\\aa.b.c.d.e'::ltree ~ 'A@.b.c.d.e'; + ?column? +---------- + f +(1 row) + +SELECT E'a\\a.b.c.\\d.e'::ltree ~ 'A*.b.c.d.e'; + ?column? +---------- + f +(1 row) + +SELECT E'a\\a.b.c.\\d.e'::ltree ~ E'A*@.b.c.d.\\e'; + ?column? +---------- + t +(1 row) + +SELECT E'a\\a.b.c.\\d.e'::ltree ~ E'A*@|\\g.b.c.d.e'; + ?column? +---------- + t +(1 row) + +--ltxtquery +SELECT '!"tree" & aWdf@*'::ltxtquery; + ltxtquery +---------------- + !tree & aWdf@* +(1 row) + +SELECT '"!tree" & aWdf@*'::ltxtquery; + ltxtquery +------------------ + "!tree" & aWdf@* +(1 row) + +SELECT E'tr\\ee'::ltree @ E'\\t\\r\\e\\e'::ltxtquery; + ?column? +---------- + t +(1 row) + +SELECT E'tr\\ee.awd\\fg'::ltree @ E'tre\\e & a\\Wdf@*'::ltxtquery; + ?column? +---------- + t +(1 row) + +SELECT 'tree & aw_qw%*'::ltxtquery; + ltxtquery +---------------- + tree & aw_qw%* +(1 row) + +SELECT 'tree."awdfg"'::ltree @ E'tree & a\\Wdf@*'::ltxtquery; + ?column? +---------- + t +(1 row) + +SELECT 'tree."awdfg"'::ltree @ E'tree & "a\\Wdf"@*'::ltxtquery; + ?column? +---------- + t +(1 row) + +SELECT 'tree.awdfg_qwerty'::ltree @ 'tree & aw_qw%*'::ltxtquery; + ?column? +---------- + t +(1 row) + +SELECT 'tree.awdfg_qwerty'::ltree @ 'tree & "aw_rw"%*'::ltxtquery; + ?column? +---------- + f +(1 row) + +SELECT 'tree.awdfg_qwerty'::ltree @ E'tree & "aw\\_qw"%*'::ltxtquery; + ?column? 
+---------- + t +(1 row) + +SELECT 'tree.awdfg_qwerty'::ltree @ E'tree & aw\\_qw%*'::ltxtquery; + ?column? +---------- + t +(1 row) + +SELECT E'"a\\"b"'::ltxtquery; + ltxtquery +----------- + "a\"b" +(1 row) + +SELECT '"a!b"'::ltxtquery; + ltxtquery +----------- + "a!b" +(1 row) + +SELECT '"a%b"'::ltxtquery; + ltxtquery +----------- + "a%b" +(1 row) + +SELECT '"a*b"'::ltxtquery; + ltxtquery +----------- + "a*b" +(1 row) + +SELECT '"a@b"'::ltxtquery; + ltxtquery +----------- + "a@b" +(1 row) + +SELECT '"a{b"'::ltxtquery; + ltxtquery +----------- + "a{b" +(1 row) + +SELECT '"a}b"'::ltxtquery; + ltxtquery +----------- + "a}b" +(1 row) + +SELECT '"a|b"'::ltxtquery; + ltxtquery +----------- + "a|b" +(1 row) + +SELECT '"a&b"'::ltxtquery; + ltxtquery +----------- + "a&b" +(1 row) + +SELECT '"a(b"'::ltxtquery; + ltxtquery +----------- + "a(b" +(1 row) + +SELECT '"a)b"'::ltxtquery; + ltxtquery +----------- + "a)b" +(1 row) + +SELECT E'a\\"b'::ltxtquery; + ltxtquery +----------- + "a\"b" +(1 row) + +SELECT E'a\\!b'::ltxtquery; + ltxtquery +----------- + "a!b" +(1 row) + +SELECT E'a\\%b'::ltxtquery; + ltxtquery +----------- + "a%b" +(1 row) + +SELECT E'a\\*b'::ltxtquery; + ltxtquery +----------- + "a*b" +(1 row) + +SELECT E'a\\@b'::ltxtquery; + ltxtquery +----------- + "a@b" +(1 row) + +SELECT E'a\\{b'::ltxtquery; + ltxtquery +----------- + "a{b" +(1 row) + +SELECT E'a\\}b'::ltxtquery; + ltxtquery +----------- + "a}b" +(1 row) + +SELECT E'a\\|b'::ltxtquery; + ltxtquery +----------- + "a|b" +(1 row) + +SELECT E'a\\&b'::ltxtquery; + ltxtquery +----------- + "a&b" +(1 row) + +SELECT E'a\\(b'::ltxtquery; + ltxtquery +----------- + "a(b" +(1 row) + +SELECT E'a\\)b'::ltxtquery; + ltxtquery +----------- + "a)b" +(1 row) + +SELECT E'"\\"b"'::ltxtquery; + ltxtquery +----------- + "\"b" +(1 row) + +SELECT '"!b"'::ltxtquery; + ltxtquery +----------- + "!b" +(1 row) + +SELECT '"%b"'::ltxtquery; + ltxtquery +----------- + "%b" +(1 row) + +SELECT '"*b"'::ltxtquery; + ltxtquery +----------- 
+ "*b" +(1 row) + +SELECT '"@b"'::ltxtquery; + ltxtquery +----------- + "@b" +(1 row) + +SELECT '"{b"'::ltxtquery; + ltxtquery +----------- + "{b" +(1 row) + +SELECT '"}b"'::ltxtquery; + ltxtquery +----------- + "}b" +(1 row) + +SELECT '"|b"'::ltxtquery; + ltxtquery +----------- + "|b" +(1 row) + +SELECT '"&b"'::ltxtquery; + ltxtquery +----------- + "&b" +(1 row) + +SELECT '"(b"'::ltxtquery; + ltxtquery +----------- + "(b" +(1 row) + +SELECT '")b"'::ltxtquery; + ltxtquery +----------- + ")b" +(1 row) + +SELECT E'\\"b'::ltxtquery; + ltxtquery +----------- + "\"b" +(1 row) + +SELECT E'\\!b'::ltxtquery; + ltxtquery +----------- + "!b" +(1 row) + +SELECT E'\\%b'::ltxtquery; + ltxtquery +----------- + "%b" +(1 row) + +SELECT E'\\*b'::ltxtquery; + ltxtquery +----------- + "*b" +(1 row) + +SELECT E'\\@b'::ltxtquery; + ltxtquery +----------- + "@b" +(1 row) + +SELECT E'\\{b'::ltxtquery; + ltxtquery +----------- + "{b" +(1 row) + +SELECT E'\\}b'::ltxtquery; + ltxtquery +----------- + "}b" +(1 row) + +SELECT E'\\|b'::ltxtquery; + ltxtquery +----------- + "|b" +(1 row) + +SELECT E'\\&b'::ltxtquery; + ltxtquery +----------- + "&b" +(1 row) + +SELECT E'\\(b'::ltxtquery; + ltxtquery +----------- + "(b" +(1 row) + +SELECT E'\\)b'::ltxtquery; + ltxtquery +----------- + ")b" +(1 row) + +SELECT E'"a\\""'::ltxtquery; + ltxtquery +----------- + "a\"" +(1 row) + +SELECT '"a!"'::ltxtquery; + ltxtquery +----------- + "a!" 
+(1 row) + +SELECT '"a%"'::ltxtquery; + ltxtquery +----------- + "a%" +(1 row) + +SELECT '"a*"'::ltxtquery; + ltxtquery +----------- + "a*" +(1 row) + +SELECT '"a@"'::ltxtquery; + ltxtquery +----------- + "a@" +(1 row) + +SELECT '"a{"'::ltxtquery; + ltxtquery +----------- + "a{" +(1 row) + +SELECT '"a}"'::ltxtquery; + ltxtquery +----------- + "a}" +(1 row) + +SELECT '"a|"'::ltxtquery; + ltxtquery +----------- + "a|" +(1 row) + +SELECT '"a&"'::ltxtquery; + ltxtquery +----------- + "a&" +(1 row) + +SELECT '"a("'::ltxtquery; + ltxtquery +----------- + "a(" +(1 row) + +SELECT '"a)"'::ltxtquery; + ltxtquery +----------- + "a)" +(1 row) + +SELECT E'a\\"'::ltxtquery; + ltxtquery +----------- + "a\"" +(1 row) + +SELECT E'a\\!'::ltxtquery; + ltxtquery +----------- + "a!" +(1 row) + +SELECT E'a\\%'::ltxtquery; + ltxtquery +----------- + "a%" +(1 row) + +SELECT E'a\\*'::ltxtquery; + ltxtquery +----------- + "a*" +(1 row) + +SELECT E'a\\@'::ltxtquery; + ltxtquery +----------- + "a@" +(1 row) + +SELECT E'a\\{'::ltxtquery; + ltxtquery +----------- + "a{" +(1 row) + +SELECT E'a\\}'::ltxtquery; + ltxtquery +----------- + "a}" +(1 row) + +SELECT E'a\\|'::ltxtquery; + ltxtquery +----------- + "a|" +(1 row) + +SELECT E'a\\&'::ltxtquery; + ltxtquery +----------- + "a&" +(1 row) + +SELECT E'a\\('::ltxtquery; + ltxtquery +----------- + "a(" +(1 row) + +SELECT E'a\\)'::ltxtquery; + ltxtquery +----------- + "a)" +(1 row) + +--failures +SELECT E'\\'::ltree; +ERROR: syntax error +LINE 1: SELECT E'\\'::ltree; + ^ +DETAIL: Unexpected end of line. +SELECT E'n\\'::ltree; +ERROR: syntax error +LINE 1: SELECT E'n\\'::ltree; + ^ +DETAIL: Unexpected end of line. +SELECT '"'::ltree; +ERROR: syntax error +LINE 1: SELECT '"'::ltree; + ^ +DETAIL: Unexpected end of line. +SELECT '"a'::ltree; +ERROR: syntax error +LINE 1: SELECT '"a'::ltree; + ^ +DETAIL: Unexpected end of line. +SELECT '""'::ltree; +ERROR: name of level is empty +LINE 1: SELECT '""'::ltree; + ^ +DETAIL: Name length is 0 in position 2. 
+SELECT 'a"b'::ltree; +ERROR: syntax error at position 1 +LINE 1: SELECT 'a"b'::ltree; + ^ +SELECT E'\\"ab"'::ltree; +ERROR: syntax error at position 4 +LINE 1: SELECT E'\\"ab"'::ltree; + ^ +SELECT '"a"."a'::ltree; +ERROR: syntax error +LINE 1: SELECT '"a"."a'::ltree; + ^ +DETAIL: Unexpected end of line. +SELECT '"a."a"'::ltree; +ERROR: syntax error at position 4 +LINE 1: SELECT '"a."a"'::ltree; + ^ +SELECT '"".a'::ltree; +ERROR: name of level is empty +LINE 1: SELECT '"".a'::ltree; + ^ +DETAIL: Name length is 0 in position 2. +SELECT 'a.""'::ltree; +ERROR: name of level is empty +LINE 1: SELECT 'a.""'::ltree; + ^ +DETAIL: Name length is 0 in position 4. +SELECT '"".""'::ltree; +ERROR: name of level is empty +LINE 1: SELECT '"".""'::ltree; + ^ +DETAIL: Name length is 0 in position 2. +SELECT '""'::lquery; +ERROR: name of level is empty +LINE 1: SELECT '""'::lquery; + ^ +DETAIL: Name length is 0 in position 2. +SELECT '"".""'::lquery; +ERROR: name of level is empty +LINE 1: SELECT '"".""'::lquery; + ^ +DETAIL: Name length is 0 in position 2. +SELECT 'a.""'::lquery; +ERROR: name of level is empty +LINE 1: SELECT 'a.""'::lquery; + ^ +DETAIL: Name length is 0 in position 4. +SELECT ' . '::ltree; +ERROR: syntax error at position 1 +LINE 1: SELECT ' . '::ltree; + ^ +SELECT ' . '::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT ' . '::lquery; + ^ +SELECT ' | '::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT ' | '::lquery; + ^ +SELECT( +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'z\z\z\z\z\z')::ltree; +ERROR: name of level is too long +DETAIL: Name length is 256, must be < 256, in position 261. 
+SELECT( +'"01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z\z"')::ltree; +ERROR: name of level is too long +DETAIL: Name length is 256, must be < 256, in position 264. +SELECT '"'::lquery; +ERROR: syntax error +LINE 1: SELECT '"'::lquery; + ^ +DETAIL: Unexpected end of line. +SELECT '"a'::lquery; +ERROR: syntax error +LINE 1: SELECT '"a'::lquery; + ^ +DETAIL: Unexpected end of line. +SELECT '"a"."a'::lquery; +ERROR: syntax error +LINE 1: SELECT '"a"."a'::lquery; + ^ +DETAIL: Unexpected end of line. +SELECT '"a."a"'::lquery; +ERROR: syntax error at position 4 +LINE 1: SELECT '"a."a"'::lquery; + ^ +SELECT E'\\"ab"'::lquery; +ERROR: syntax error at position 4 +LINE 1: SELECT E'\\"ab"'::lquery; + ^ +SELECT 'a"b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT 'a"b'::lquery; + ^ +SELECT 'a!b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT 'a!b'::lquery; + ^ +SELECT 'a%b'::lquery; +ERROR: syntax error at position 2 +LINE 1: SELECT 'a%b'::lquery; + ^ +SELECT 'a*b'::lquery; +ERROR: syntax error at position 2 +LINE 1: SELECT 'a*b'::lquery; + ^ +SELECT 'a@b'::lquery; +ERROR: syntax error at position 2 +LINE 1: SELECT 'a@b'::lquery; + ^ +SELECT 'a{b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT 'a{b'::lquery; + ^ +SELECT 'a}b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT 'a}b'::lquery; + ^ +SELECT 'a!'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT 'a!'::lquery; + ^ +SELECT 'a{'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT 'a{'::lquery; + ^ +SELECT 'a}'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT 'a}'::lquery; + ^ +SELECT '%b'::lquery; +ERROR: syntax error at position 0 +LINE 1: SELECT '%b'::lquery; + ^ +SELECT '*b'::lquery; +ERROR: 
syntax error at position 1 +LINE 1: SELECT '*b'::lquery; + ^ +SELECT '@b'::lquery; +ERROR: syntax error at position 0 +LINE 1: SELECT '@b'::lquery; + ^ +SELECT '{b'::lquery; +ERROR: syntax error at position 0 +LINE 1: SELECT '{b'::lquery; + ^ +SELECT '}b'::lquery; +ERROR: syntax error at position 0 +LINE 1: SELECT '}b'::lquery; + ^ +SELECT '!%b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT '!%b'::lquery; + ^ +SELECT '!*b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT '!*b'::lquery; + ^ +SELECT '!@b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT '!@b'::lquery; + ^ +SELECT '!{b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT '!{b'::lquery; + ^ +SELECT '!}b'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT '!}b'::lquery; + ^ +SELECT '"qwert"y.tu'::lquery; +ERROR: syntax error at position 7 +LINE 1: SELECT '"qwert"y.tu'::lquery; + ^ +SELECT 'q"wert"y"%@*.tu'::lquery; +ERROR: syntax error at position 1 +LINE 1: SELECT 'q"wert"y"%@*.tu'::lquery; + ^ +SELECT( +'"01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z\z"')::lquery; +ERROR: name of level is too long +DETAIL: Name length is 256, must be < 256, in position 264. 
+SELECT 'a | ""'::ltxtquery; +ERROR: empty labels are forbidden +LINE 1: SELECT 'a | ""'::ltxtquery; + ^ +SELECT '"" & ""'::ltxtquery; +ERROR: empty labels are forbidden +LINE 1: SELECT '"" & ""'::ltxtquery; + ^ +SELECT 'a.""'::ltxtquery; +ERROR: escaping syntax error +LINE 1: SELECT 'a.""'::ltxtquery; + ^ +SELECT '"'::ltxtquery; +ERROR: escaping syntax error +LINE 1: SELECT '"'::ltxtquery; + ^ +SELECT '"""'::ltxtquery; +ERROR: escaping syntax error +LINE 1: SELECT '"""'::ltxtquery; + ^ +SELECT '"a'::ltxtquery; +ERROR: escaping syntax error +LINE 1: SELECT '"a'::ltxtquery; + ^ +SELECT '"a" & "a'::ltxtquery; +ERROR: escaping syntax error +LINE 1: SELECT '"a" & "a'::ltxtquery; + ^ +SELECT '"a | "a"'::ltxtquery; +ERROR: escaping syntax error +LINE 1: SELECT '"a | "a"'::ltxtquery; + ^ +SELECT '"!tree" & aWdf@*"'::ltxtquery; +ERROR: modifiers syntax error +LINE 1: SELECT '"!tree" & aWdf@*"'::ltxtquery; + ^ +SELECT 'a"b'::ltxtquery; +ERROR: escaping syntax error +LINE 1: SELECT 'a"b'::ltxtquery; + ^ +SELECT 'a!b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a!b'::ltxtquery; + ^ +SELECT 'a%b'::ltxtquery; +ERROR: modifiers syntax error +LINE 1: SELECT 'a%b'::ltxtquery; + ^ +SELECT 'a*b'::ltxtquery; +ERROR: modifiers syntax error +LINE 1: SELECT 'a*b'::ltxtquery; + ^ +SELECT 'a@b'::ltxtquery; +ERROR: modifiers syntax error +LINE 1: SELECT 'a@b'::ltxtquery; + ^ +SELECT 'a{b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a{b'::ltxtquery; + ^ +SELECT 'a}b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a}b'::ltxtquery; + ^ +SELECT 'a|b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a|b'::ltxtquery; + ^ +SELECT 'a&b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a&b'::ltxtquery; + ^ +SELECT 'a(b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a(b'::ltxtquery; + ^ +SELECT 'a)b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a)b'::ltxtquery; + ^ +SELECT '"b'::ltxtquery; +ERROR: 
escaping syntax error +LINE 1: SELECT '"b'::ltxtquery; + ^ +SELECT '%b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT '%b'::ltxtquery; + ^ +SELECT '*b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT '*b'::ltxtquery; + ^ +SELECT '@b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT '@b'::ltxtquery; + ^ +SELECT '{b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT '{b'::ltxtquery; + ^ +SELECT '}b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT '}b'::ltxtquery; + ^ +SELECT '|b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT '|b'::ltxtquery; + ^ +SELECT '&b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT '&b'::ltxtquery; + ^ +SELECT '(b'::ltxtquery; +ERROR: syntax error +LINE 1: SELECT '(b'::ltxtquery; + ^ +SELECT ')b'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT ')b'::ltxtquery; + ^ +SELECT 'a"'::ltxtquery; +ERROR: escaping syntax error +LINE 1: SELECT 'a"'::ltxtquery; + ^ +SELECT 'a!'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a!'::ltxtquery; + ^ +SELECT 'a{'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a{'::ltxtquery; + ^ +SELECT 'a}'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a}'::ltxtquery; + ^ +SELECT 'a|'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a|'::ltxtquery; + ^ +SELECT 'a&'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a&'::ltxtquery; + ^ +SELECT 'a('::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a('::ltxtquery; + ^ +SELECT 'a)'::ltxtquery; +ERROR: unquoted special symbol +LINE 1: SELECT 'a)'::ltxtquery; + ^ diff --git a/contrib/ltree/ltree.h b/contrib/ltree/ltree.h index e4b8c84fa6..a525eb2e5d 100644 --- a/contrib/ltree/ltree.h +++ b/contrib/ltree/ltree.h @@ -43,6 +43,7 @@ typedef struct #define LVAR_ANYEND 0x01 #define LVAR_INCASE 0x02 #define LVAR_SUBLEXEME 0x04 +#define LVAR_QUOTEDPART 0x08 typedef struct { @@ -80,8 +81,6 @@ typedef struct #define 
LQUERY_HASNOT 0x01 -#define ISALNUM(x) ( t_isalpha(x) || t_isdigit(x) || ( pg_mblen(x) == 1 && t_iseq((x), '_') ) ) - /* full text query */ /* @@ -164,6 +163,8 @@ bool compare_subnode(ltree_level *t, char *q, int len, int (*cmpptr) (const char *, const char *, size_t), bool anyend); ltree *lca_inner(ltree **a, int len); int ltree_strncasecmp(const char *a, const char *b, size_t s); +int bytes_to_escape(const char *start, const int len, const char *to_escape); +void copy_level(char *dst, const char *src, int len, int extra_bytes); /* fmgr macros for ltree objects */ #define DatumGetLtreeP(X) ((ltree *) PG_DETOAST_DATUM(X)) diff --git a/contrib/ltree/ltree_io.c b/contrib/ltree/ltree_io.c index f54f037443..e086c091ab 100644 --- a/contrib/ltree/ltree_io.c +++ b/contrib/ltree/ltree_io.c @@ -33,6 +33,211 @@ typedef struct #define LTPRS_WAITNAME 0 #define LTPRS_WAITDELIM 1 +#define LTPRS_WAITESCAPED 2 +#define LTPRS_WAITDELIMSTRICT 3 + +static void +count_parts_ors(const char *ptr, int *plevels, int *pORs) +{ + int escape_mode = 0; + int charlen; + + while (*ptr) + { + charlen = pg_mblen(ptr); + + if (escape_mode == 1) + escape_mode = 0; + else if (charlen == 1) + { + if (t_iseq(ptr, '\\')) + escape_mode = 1; + else if (t_iseq(ptr, '.')) + (*plevels)++; + else if (t_iseq(ptr, '|') && pORs != NULL) + (*pORs)++; + } + + ptr += charlen; + } + + (*plevels)++; + if (pORs != NULL) + (*pORs)++; +} + +/* + * Char-by-char copying from src to dst representation removing escaping \\ + * Total amount of copied bytes is len + */ +static void +copy_unescaped(char *dst, const char *src, int len) +{ + uint16 copied = 0; + int charlen; + bool escaping = false; + + while (*src && copied < len) + { + charlen = pg_mblen(src); + if ((charlen == 1) && t_iseq(src, '\\') && escaping == 0) + { + escaping = 1; + src++; + continue; + }; + + if (copied + charlen > len) + elog(ERROR, "internal error during splitting levels"); + + memcpy(dst, src, charlen); + src += charlen; + dst += charlen; + copied 
+= charlen; + escaping = 0; + } + + if (copied != len) + elog(ERROR, "internal error during splitting levels"); +} + +/* + * Function calculating bytes to escape + * to_escape is an array of "special" 1-byte symbols + * Behvaiour: + * If there is no "special" symbols, return 0 + * If there are any special symbol, we need initial and final quote, so return 2 + * If there are any quotes, we need to escape all of them and also initial and final quote, so + * return 2 + number of quotes + */ +int +bytes_to_escape(const char *start, const int len, const char *to_escape) +{ + uint16 copied = 0; + int charlen; + int escapes = 0; + int quotes = 0; + const char *buf = start; + + if (len == 0) + return 2; + + while (*start && copied < len) + { + charlen = pg_mblen(buf); + if ((charlen == 1) && strchr(to_escape, *buf)) + { + escapes++; + } + else if ((charlen == 1) && t_iseq(buf, '"')) + { + quotes++; + } + + if (copied + charlen > len) + elog(ERROR, "internal error during merging levels"); + + buf += charlen; + copied += charlen; + } + + return (quotes > 0) ? quotes + 2 : + (escapes > 0) ? 
2 : 0; +} + +static int +copy_escaped(char *dst, const char *src, int len) +{ + uint16 copied = 0; + int charlen; + int escapes = 0; + char *buf = dst; + + while (*src && copied < len) + { + charlen = pg_mblen(src); + if ((charlen == 1) && t_iseq(src, '"')) + { + *buf = '\\'; + buf++; + escapes++; + }; + + if (copied + charlen > len) + elog(ERROR, "internal error during merging levels"); + + memcpy(buf, src, charlen); + src += charlen; + buf += charlen; + copied += charlen; + } + return escapes; +} + +void +copy_level(char *dst, const char *src, int len, int extra_bytes) +{ + if (extra_bytes == 0) + memcpy(dst, src, len); + else if (extra_bytes == 2) + { + *dst = '"'; + memcpy(dst + 1, src, len); + dst[len + 1] = '"'; + } + else + { + *dst = '"'; + copy_escaped(dst + 1, src, len); + dst[len + extra_bytes - 1] = '"'; + } +} + +static void +real_nodeitem_len(nodeitem *lptr, const char *ptr, int escapes, int tail_space_bytes, int tail_space_symbols) +{ + lptr->len = ptr - lptr->start - escapes - + ((lptr->flag & LVAR_SUBLEXEME) ? 1 : 0) - + ((lptr->flag & LVAR_INCASE) ? 1 : 0) - + ((lptr->flag & LVAR_ANYEND) ? 1 : 0) - tail_space_bytes; + lptr->wlen -= tail_space_symbols; +} + +/* + * If a part begins with a quote, + * we must be sure it also ends with a quote. + * After that, we move the start of the part one byte ahead + * and exclude the opening and closing quotes from the part itself. 
+ * */ +static void +adjust_quoted_nodeitem(nodeitem *lptr) +{ + lptr->start++; + lptr->len -= 2; + lptr->wlen -= 2; +} + +static void +check_level_length(const nodeitem *lptr, int pos) +{ + if (lptr->len < 0) + elog(ERROR, "internal error: invalid level length"); + + if (lptr->wlen <= 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("name of level is empty"), + errdetail("Name length is 0 in position %d.", + pos))); + + if (lptr->wlen > 255) + ereport(ERROR, + (errcode(ERRCODE_NAME_TOO_LONG), + errmsg("name of level is too long"), + errdetail("Name length is %d, must " + "be < 256, in position %d.", + lptr->wlen, pos))); +} Datum ltree_in(PG_FUNCTION_ARGS) @@ -41,89 +246,158 @@ ltree_in(PG_FUNCTION_ARGS) char *ptr; nodeitem *list, *lptr; - int num = 0, + int levels = 0, totallen = 0; int state = LTPRS_WAITNAME; ltree *result; ltree_level *curlevel; int charlen; + + /* Position in strings, in symbols. */ int pos = 0; + int escaped_count = 0; + int tail_space_bytes = 0; + int tail_space_symbols = 0; ptr = buf; - while (*ptr) - { - charlen = pg_mblen(ptr); - if (charlen == 1 && t_iseq(ptr, '.')) - num++; - ptr += charlen; - } + count_parts_ors(ptr, &levels, NULL); - if (num + 1 > MaxAllocSize / sizeof(nodeitem)) + if (levels > MaxAllocSize / sizeof(nodeitem)) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("number of levels (%d) exceeds the maximum allowed (%d)", - num + 1, (int) (MaxAllocSize / sizeof(nodeitem))))); - list = lptr = (nodeitem *) palloc(sizeof(nodeitem) * (num + 1)); + levels, (int) (MaxAllocSize / sizeof(nodeitem))))); + list = lptr = (nodeitem *) palloc(sizeof(nodeitem) * (levels)); + + /* + * This block calculates single nodes' settings + */ ptr = buf; while (*ptr) { charlen = pg_mblen(ptr); - if (state == LTPRS_WAITNAME) { - if (ISALNUM(ptr)) + if (t_isspace(ptr)) { - lptr->start = ptr; - lptr->wlen = 0; - state = LTPRS_WAITDELIM; + ptr += charlen; + pos++; + continue; } - else - UNCHAR; + state = 
LTPRS_WAITDELIM; + lptr->start = ptr; + lptr->wlen = 0; + lptr->flag = 0; + escaped_count = 0; + + if (charlen == 1) + { + if (t_iseq(ptr, '.')) + { + UNCHAR; + } + else if (t_iseq(ptr, '\\')) + state = LTPRS_WAITESCAPED; + else if (t_iseq(ptr, '"')) + lptr->flag |= LVAR_QUOTEDPART; + } + } + else if (state == LTPRS_WAITESCAPED) + { + state = LTPRS_WAITDELIM; + escaped_count++; } else if (state == LTPRS_WAITDELIM) { - if (charlen == 1 && t_iseq(ptr, '.')) + if (charlen == 1) { - lptr->len = ptr - lptr->start; - if (lptr->wlen > 255) - ereport(ERROR, - (errcode(ERRCODE_NAME_TOO_LONG), - errmsg("name of level is too long"), - errdetail("Name length is %d, must " - "be < 256, in position %d.", - lptr->wlen, pos))); - - totallen += MAXALIGN(lptr->len + LEVEL_HDRSIZE); - lptr++; - state = LTPRS_WAITNAME; + if (t_iseq(ptr, '.') && !(lptr->flag & LVAR_QUOTEDPART)) + { + real_nodeitem_len(lptr, ptr, escaped_count, tail_space_bytes, tail_space_symbols); + check_level_length(lptr, pos); + + totallen += MAXALIGN(lptr->len + LEVEL_HDRSIZE); + lptr++; + state = LTPRS_WAITNAME; + } + else if (t_iseq(ptr, '\\')) + { + state = LTPRS_WAITESCAPED; + } + else if (t_iseq(ptr, '"')) + { + if (lptr->flag & LVAR_QUOTEDPART) + { + lptr->flag &= ~LVAR_QUOTEDPART; + state = LTPRS_WAITDELIMSTRICT; + } + else /* Unescaped quote is forbidden */ + UNCHAR; + } } - else if (!ISALNUM(ptr)) + + if (t_isspace(ptr)) + { + tail_space_symbols++; + tail_space_bytes += charlen; + } + else + { + tail_space_symbols = 0; + tail_space_bytes = 0; + } + } + else if (state == LTPRS_WAITDELIMSTRICT) + { + if (t_isspace(ptr)) + { + ptr += charlen; + pos++; + tail_space_bytes += charlen; + tail_space_symbols = 1; + continue; + } + + if (!(charlen == 1 && t_iseq(ptr, '.'))) UNCHAR; + + real_nodeitem_len(lptr, ptr, escaped_count, tail_space_bytes, tail_space_symbols); + + adjust_quoted_nodeitem(lptr); + check_level_length(lptr, pos); + + totallen += MAXALIGN(lptr->len + LEVEL_HDRSIZE); + lptr++; + state = 
LTPRS_WAITNAME; } else /* internal error */ elog(ERROR, "internal error in parser"); - ptr += charlen; - lptr->wlen++; + if (state == LTPRS_WAITDELIM || state == LTPRS_WAITDELIMSTRICT) + lptr->wlen++; pos++; } - if (state == LTPRS_WAITDELIM) + if (state == LTPRS_WAITDELIM || state == LTPRS_WAITDELIMSTRICT) { - lptr->len = ptr - lptr->start; - if (lptr->wlen > 255) + if (lptr->flag & LVAR_QUOTEDPART) ereport(ERROR, - (errcode(ERRCODE_NAME_TOO_LONG), - errmsg("name of level is too long"), - errdetail("Name length is %d, must " - "be < 256, in position %d.", - lptr->wlen, pos))); + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("syntax error"), + errdetail("Unexpected end of line."))); + + real_nodeitem_len(lptr, ptr, escaped_count, tail_space_bytes, tail_space_symbols); + + if (state == LTPRS_WAITDELIMSTRICT) + adjust_quoted_nodeitem(lptr); + + check_level_length(lptr, pos); totallen += MAXALIGN(lptr->len + LEVEL_HDRSIZE); lptr++; } - else if (!(state == LTPRS_WAITNAME && lptr == list)) + else if (!(state == LTPRS_WAITNAME && lptr == list)) /* Empty string */ ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("syntax error"), @@ -137,7 +411,10 @@ ltree_in(PG_FUNCTION_ARGS) while (lptr - list < result->numlevel) { curlevel->len = (uint16) lptr->len; - memcpy(curlevel->name, lptr->start, lptr->len); + if (lptr->len > 0) + { + copy_unescaped(curlevel->name, lptr->start, lptr->len); + } curlevel = LEVEL_NEXT(curlevel); lptr++; } @@ -154,8 +431,10 @@ ltree_out(PG_FUNCTION_ARGS) *ptr; int i; ltree_level *curlevel; + Size allocated = VARSIZE(in); + Size filled = 0; - ptr = buf = (char *) palloc(VARSIZE(in)); + ptr = buf = (char *) palloc(allocated); curlevel = LTREE_FIRST(in); for (i = 0; i < in->numlevel; i++) { @@ -163,9 +442,22 @@ ltree_out(PG_FUNCTION_ARGS) { *ptr = '.'; ptr++; + filled++; + } + if (curlevel->len >= 0) + { + int extra_bytes = bytes_to_escape(curlevel->name, curlevel->len, "\\ ."); + + if (filled + extra_bytes + curlevel->len >= allocated) + { + buf = 
repalloc(buf, allocated + (extra_bytes + curlevel->len) * 2); + allocated += (extra_bytes + curlevel->len) * 2; + ptr = buf + filled; + } + + copy_level(ptr, curlevel->name, curlevel->len, extra_bytes); + ptr += curlevel->len + extra_bytes; } - memcpy(ptr, curlevel->name, curlevel->len); - ptr += curlevel->len; curlevel = LEVEL_NEXT(curlevel); } @@ -184,6 +476,8 @@ ltree_out(PG_FUNCTION_ARGS) #define LQPRS_WAITCLOSE 6 #define LQPRS_WAITEND 7 #define LQPRS_WAITVAR 8 +#define LQPRS_WAITESCAPED 9 +#define LQPRS_WAITDELIMSTRICT 10 #define GETVAR(x) ( *((nodeitem**)LQL_FIRST(x)) ) @@ -195,7 +489,7 @@ lquery_in(PG_FUNCTION_ARGS) { char *buf = (char *) PG_GETARG_POINTER(0); char *ptr; - int num = 0, + int levels = 0, totallen = 0, numOR = 0; int state = LQPRS_WAITLEVEL; @@ -209,30 +503,20 @@ lquery_in(PG_FUNCTION_ARGS) bool wasbad = false; int charlen; int pos = 0; + int escaped_count = 0; + int real_levels = 0; + int tail_space_bytes = 0; + int tail_space_symbols = 0; ptr = buf; - while (*ptr) - { - charlen = pg_mblen(ptr); - - if (charlen == 1) - { - if (t_iseq(ptr, '.')) - num++; - else if (t_iseq(ptr, '|')) - numOR++; - } - - ptr += charlen; - } + count_parts_ors(ptr, &levels, &numOR); - num++; - if (num > MaxAllocSize / ITEMSIZE) + if (levels > MaxAllocSize / ITEMSIZE) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("number of levels (%d) exceeds the maximum allowed (%d)", - num, (int) (MaxAllocSize / ITEMSIZE)))); - curqlevel = tmpql = (lquery_level *) palloc0(ITEMSIZE * num); + levels, (int) (MaxAllocSize / ITEMSIZE)))); + curqlevel = tmpql = (lquery_level *) palloc0(ITEMSIZE * levels); ptr = buf; while (*ptr) { @@ -240,102 +524,207 @@ lquery_in(PG_FUNCTION_ARGS) if (state == LQPRS_WAITLEVEL) { - if (ISALNUM(ptr)) + if (t_isspace(ptr)) { - GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * (numOR + 1)); - lptr->start = ptr; - state = LQPRS_WAITDELIM; - curqlevel->numvar = 1; + ptr += charlen; + pos++; + continue; + } + + 
escaped_count = 0; + real_levels++; + if (charlen == 1) + { + if (t_iseq(ptr, '!')) + { + GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR); + lptr->start = ptr + 1; + state = LQPRS_WAITDELIM; + curqlevel->numvar = 1; + curqlevel->flag |= LQL_NOT; + hasnot = true; + } + else if (t_iseq(ptr, '*')) + state = LQPRS_WAITOPEN; + else if (t_iseq(ptr, '\\')) + { + GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR); + lptr->start = ptr; + curqlevel->numvar = 1; + state = LQPRS_WAITESCAPED; + } + else if (strchr(".|@%{}", *ptr)) + { + UNCHAR; + } + else + { + GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR); + lptr->start = ptr; + state = LQPRS_WAITDELIM; + curqlevel->numvar = 1; + if (t_iseq(ptr, '"')) + { + lptr->flag |= LVAR_QUOTEDPART; + } + } } - else if (charlen == 1 && t_iseq(ptr, '!')) + else { - GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * (numOR + 1)); - lptr->start = ptr + 1; + GETVAR(curqlevel) = lptr = (nodeitem *) palloc0(sizeof(nodeitem) * numOR); + lptr->start = ptr; state = LQPRS_WAITDELIM; curqlevel->numvar = 1; - curqlevel->flag |= LQL_NOT; - hasnot = true; } - else if (charlen == 1 && t_iseq(ptr, '*')) - state = LQPRS_WAITOPEN; - else - UNCHAR; } else if (state == LQPRS_WAITVAR) { - if (ISALNUM(ptr)) + if (t_isspace(ptr)) { - lptr++; - lptr->start = ptr; - state = LQPRS_WAITDELIM; - curqlevel->numvar++; + ptr += charlen; + pos++; + continue; } - else + + escaped_count = 0; + lptr++; + lptr->start = ptr; + curqlevel->numvar++; + if (t_iseq(ptr, '.') || t_iseq(ptr, '|')) UNCHAR; + + state = (t_iseq(ptr, '\\')) ? LQPRS_WAITESCAPED : LQPRS_WAITDELIM; + if (t_iseq(ptr, '"')) + lptr->flag |= LVAR_QUOTEDPART; } - else if (state == LQPRS_WAITDELIM) + else if (state == LQPRS_WAITDELIM || state == LQPRS_WAITDELIMSTRICT) { - if (charlen == 1 && t_iseq(ptr, '@')) + if (charlen == 1 && t_iseq(ptr, '"')) { + /* We are here if variant begins with ! 
*/ if (lptr->start == ptr) + lptr->flag |= LVAR_QUOTEDPART; + else if (state == LQPRS_WAITDELIMSTRICT) + { UNCHAR; - lptr->flag |= LVAR_INCASE; - curqlevel->flag |= LVAR_INCASE; - } - else if (charlen == 1 && t_iseq(ptr, '*')) - { - if (lptr->start == ptr) + } + else if (lptr->flag & LVAR_QUOTEDPART) + { + lptr->flag &= ~LVAR_QUOTEDPART; + state = LQPRS_WAITDELIMSTRICT; + } + else UNCHAR; - lptr->flag |= LVAR_ANYEND; - curqlevel->flag |= LVAR_ANYEND; } - else if (charlen == 1 && t_iseq(ptr, '%')) + else if ((lptr->flag & LVAR_QUOTEDPART) == 0) { - if (lptr->start == ptr) - UNCHAR; - lptr->flag |= LVAR_SUBLEXEME; - curqlevel->flag |= LVAR_SUBLEXEME; + if (charlen == 1 && t_iseq(ptr, '@')) + { + if (lptr->start == ptr) + UNCHAR; + lptr->flag |= LVAR_INCASE; + curqlevel->flag |= LVAR_INCASE; + } + else if (charlen == 1 && t_iseq(ptr, '*')) + { + if (lptr->start == ptr) + UNCHAR; + lptr->flag |= LVAR_ANYEND; + curqlevel->flag |= LVAR_ANYEND; + } + else if (charlen == 1 && t_iseq(ptr, '%')) + { + if (lptr->start == ptr) + UNCHAR; + lptr->flag |= LVAR_SUBLEXEME; + curqlevel->flag |= LVAR_SUBLEXEME; + } + else if (charlen == 1 && t_iseq(ptr, '|')) + { + real_nodeitem_len(lptr, ptr, escaped_count, tail_space_bytes, tail_space_symbols); + + if (state == LQPRS_WAITDELIMSTRICT) + adjust_quoted_nodeitem(lptr); + + check_level_length(lptr, pos); + state = LQPRS_WAITVAR; + } + else if (charlen == 1 && t_iseq(ptr, '.')) + { + real_nodeitem_len(lptr, ptr, escaped_count, tail_space_bytes, tail_space_symbols); + + if (state == LQPRS_WAITDELIMSTRICT) + adjust_quoted_nodeitem(lptr); + + check_level_length(lptr, pos); + + state = LQPRS_WAITLEVEL; + curqlevel = NEXTLEV(curqlevel); + } + else if (charlen == 1 && t_iseq(ptr, '\\')) + { + if (state == LQPRS_WAITDELIMSTRICT) + UNCHAR; + state = LQPRS_WAITESCAPED; + } + else + { + if (charlen == 1 && strchr("!{}", *ptr)) + UNCHAR; + if (state == LQPRS_WAITDELIMSTRICT) + { + if (t_isspace(ptr)) + { + ptr += charlen; + pos++; + 
tail_space_bytes += charlen; + tail_space_symbols = 1; + continue; + } + + UNCHAR; + } + if (lptr->flag & ~LVAR_QUOTEDPART) + UNCHAR; + } } - else if (charlen == 1 && t_iseq(ptr, '|')) + else if (charlen == 1 && t_iseq(ptr, '\\')) { - lptr->len = ptr - lptr->start - - ((lptr->flag & LVAR_SUBLEXEME) ? 1 : 0) - - ((lptr->flag & LVAR_INCASE) ? 1 : 0) - - ((lptr->flag & LVAR_ANYEND) ? 1 : 0); - if (lptr->wlen > 255) - ereport(ERROR, - (errcode(ERRCODE_NAME_TOO_LONG), - errmsg("name of level is too long"), - errdetail("Name length is %d, must " - "be < 256, in position %d.", - lptr->wlen, pos))); - - state = LQPRS_WAITVAR; + if (state == LQPRS_WAITDELIMSTRICT) + UNCHAR; + if (lptr->flag & ~LVAR_QUOTEDPART) + UNCHAR; + state = LQPRS_WAITESCAPED; } - else if (charlen == 1 && t_iseq(ptr, '.')) + else { - lptr->len = ptr - lptr->start - - ((lptr->flag & LVAR_SUBLEXEME) ? 1 : 0) - - ((lptr->flag & LVAR_INCASE) ? 1 : 0) - - ((lptr->flag & LVAR_ANYEND) ? 1 : 0); - if (lptr->wlen > 255) - ereport(ERROR, - (errcode(ERRCODE_NAME_TOO_LONG), - errmsg("name of level is too long"), - errdetail("Name length is %d, must " - "be < 256, in position %d.", - lptr->wlen, pos))); + if (state == LQPRS_WAITDELIMSTRICT) + { + if (t_isspace(ptr)) + { + ptr += charlen; + pos++; + tail_space_bytes += charlen; + tail_space_symbols = 1; + continue; + } - state = LQPRS_WAITLEVEL; - curqlevel = NEXTLEV(curqlevel); + UNCHAR; + } + if (lptr->flag & ~LVAR_QUOTEDPART) + UNCHAR; } - else if (ISALNUM(ptr)) + + if (t_isspace(ptr)) { - if (lptr->flag) - UNCHAR; + tail_space_symbols++; + tail_space_bytes += charlen; } else - UNCHAR; + { + tail_space_symbols = 0; + tail_space_bytes = 0; + } } else if (state == LQPRS_WAITOPEN) { @@ -399,7 +788,7 @@ lquery_in(PG_FUNCTION_ARGS) } else if (state == LQPRS_WAITEND) { - if (charlen == 1 && t_iseq(ptr, '.')) + if (charlen == 1 && (t_iseq(ptr, '.') || t_iseq(ptr, '|'))) { state = LQPRS_WAITLEVEL; curqlevel = NEXTLEV(curqlevel); @@ -407,17 +796,29 @@ 
lquery_in(PG_FUNCTION_ARGS) else UNCHAR; } + else if (state == LQPRS_WAITESCAPED) + { + state = LQPRS_WAITDELIM; + escaped_count++; + } else /* internal error */ elog(ERROR, "internal error in parser"); ptr += charlen; - if (state == LQPRS_WAITDELIM) + if (state == LQPRS_WAITDELIM || state == LQPRS_WAITDELIMSTRICT) lptr->wlen++; pos++; } - if (state == LQPRS_WAITDELIM) + if (lptr->flag & LVAR_QUOTEDPART) + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("syntax error"), + errdetail("Unexpected end of line."))); + } + else if (state == LQPRS_WAITDELIM || state == LQPRS_WAITDELIMSTRICT) { if (lptr->start == ptr) ereport(ERROR, @@ -425,23 +826,12 @@ lquery_in(PG_FUNCTION_ARGS) errmsg("syntax error"), errdetail("Unexpected end of line."))); - lptr->len = ptr - lptr->start - - ((lptr->flag & LVAR_SUBLEXEME) ? 1 : 0) - - ((lptr->flag & LVAR_INCASE) ? 1 : 0) - - ((lptr->flag & LVAR_ANYEND) ? 1 : 0); - if (lptr->len == 0) - ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("syntax error"), - errdetail("Unexpected end of line."))); + real_nodeitem_len(lptr, ptr, escaped_count, tail_space_bytes, tail_space_symbols); - if (lptr->wlen > 255) - ereport(ERROR, - (errcode(ERRCODE_NAME_TOO_LONG), - errmsg("name of level is too long"), - errdetail("Name length is %d, must " - "be < 256, in position %d.", - lptr->wlen, pos))); + if (state == LQPRS_WAITDELIMSTRICT) + adjust_quoted_nodeitem(lptr); + + check_level_length(lptr, pos); } else if (state == LQPRS_WAITOPEN) curqlevel->high = 0xffff; @@ -450,10 +840,16 @@ lquery_in(PG_FUNCTION_ARGS) (errcode(ERRCODE_SYNTAX_ERROR), errmsg("syntax error"), errdetail("Unexpected end of line."))); + else if (state == LQPRS_WAITESCAPED) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("syntax error"), + errdetail("Unexpected end of line."))); + curqlevel = tmpql; totallen = LQUERY_HDRSIZE; - while ((char *) curqlevel - (char *) tmpql < num * ITEMSIZE) + while ((char *) curqlevel - (char *) tmpql < levels * 
ITEMSIZE) { totallen += LQL_HDRSIZE; if (curqlevel->numvar) @@ -477,14 +873,14 @@ lquery_in(PG_FUNCTION_ARGS) result = (lquery *) palloc0(totallen); SET_VARSIZE(result, totallen); - result->numlevel = num; + result->numlevel = real_levels; result->firstgood = 0; result->flag = 0; if (hasnot) result->flag |= LQUERY_HASNOT; cur = LQUERY_FIRST(result); curqlevel = tmpql; - while ((char *) curqlevel - (char *) tmpql < num * ITEMSIZE) + while ((char *) curqlevel - (char *) tmpql < levels * ITEMSIZE) { memcpy(cur, curqlevel, LQL_HDRSIZE); cur->totallen = LQL_HDRSIZE; @@ -497,8 +893,8 @@ lquery_in(PG_FUNCTION_ARGS) cur->totallen += MAXALIGN(LVAR_HDRSIZE + lptr->len); lrptr->len = lptr->len; lrptr->flag = lptr->flag; - lrptr->val = ltree_crc32_sz(lptr->start, lptr->len); - memcpy(lrptr->name, lptr->start, lptr->len); + copy_unescaped(lrptr->name, lptr->start, lptr->len); + lrptr->val = ltree_crc32_sz(lrptr->name, lptr->len); lptr++; lrptr = LVAR_NEXT(lrptr); } @@ -526,7 +922,8 @@ lquery_out(PG_FUNCTION_ARGS) *ptr; int i, j, - totallen = 1; + totallen = 1, + filled = 0; lquery_level *curqlevel; lquery_variant *curtlevel; @@ -549,6 +946,7 @@ lquery_out(PG_FUNCTION_ARGS) { *ptr = '.'; ptr++; + filled++; } if (curqlevel->numvar) { @@ -556,31 +954,46 @@ lquery_out(PG_FUNCTION_ARGS) { *ptr = '!'; ptr++; + filled++; } curtlevel = LQL_FIRST(curqlevel); for (j = 0; j < curqlevel->numvar; j++) { + int extra_bytes = bytes_to_escape(curtlevel->name, curtlevel->len, ". 
\\|!*@%{}"); + if (j != 0) { *ptr = '|'; ptr++; + filled++; } - memcpy(ptr, curtlevel->name, curtlevel->len); - ptr += curtlevel->len; + if (filled + extra_bytes + curtlevel->len >= totallen) + { + buf = repalloc(buf, totallen + (extra_bytes + curtlevel->len) * 2); + totallen += (extra_bytes + curtlevel->len) * 2; + ptr = buf + filled; + } + + copy_level(ptr, curtlevel->name, curtlevel->len, extra_bytes); + ptr += curtlevel->len + extra_bytes; + if ((curtlevel->flag & LVAR_SUBLEXEME)) { *ptr = '%'; ptr++; + filled++; } if ((curtlevel->flag & LVAR_INCASE)) { *ptr = '@'; ptr++; + filled++; } if ((curtlevel->flag & LVAR_ANYEND)) { *ptr = '*'; ptr++; + filled++; } curtlevel = LVAR_NEXT(curtlevel); } @@ -608,6 +1021,7 @@ lquery_out(PG_FUNCTION_ARGS) else sprintf(ptr, "*{%d,%d}", curqlevel->low, curqlevel->high); ptr = strchr(ptr, '\0'); + filled = ptr - buf; } curqlevel = LQL_NEXT(curqlevel); diff --git a/contrib/ltree/ltxtquery_io.c b/contrib/ltree/ltxtquery_io.c index 56bf39d145..45261415a0 100644 --- a/contrib/ltree/ltxtquery_io.c +++ b/contrib/ltree/ltxtquery_io.c @@ -19,6 +19,8 @@ PG_FUNCTION_INFO_V1(ltxtq_out); #define WAITOPERAND 1 #define INOPERAND 2 #define WAITOPERATOR 3 +#define WAITESCAPED 4 +#define ENDOPERAND 5 /* * node of query tree, also used @@ -78,38 +80,151 @@ gettoken_query(QPRS_STATE *state, int32 *val, int32 *lenval, char **strval, uint (state->buf)++; return OPEN; } - else if (ISALNUM(state->buf)) + else if (charlen == 1 && t_iseq(state->buf, '\\')) { + state->state = WAITESCAPED; + *strval = state->buf; + *lenval = 1; + *flag = 0; + } + else if (t_isspace(state->buf)) + { + /* do nothing */ + } + else + { + if (charlen == 1 && strchr("{}()|&%*@", *(state->buf))) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("unquoted special symbol"))); + state->state = INOPERAND; *strval = state->buf; *lenval = charlen; *flag = 0; + if (charlen == 1 && t_iseq(state->buf, '"')) + *flag |= LVAR_QUOTEDPART; } - else if (!t_isspace(state->buf)) - 
ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("operand syntax error"))); break; case INOPERAND: - if (ISALNUM(state->buf)) + case ENDOPERAND: + if (charlen == 1 && t_iseq(state->buf, '"')) { - if (*flag) + if (state->state == ENDOPERAND) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("escaping syntax error"))); + else if (*flag & ~LVAR_QUOTEDPART) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("modifiers syntax error"))); + else if (*flag & LVAR_QUOTEDPART) + { + *flag &= ~LVAR_QUOTEDPART; + state->state = ENDOPERAND; + } + else + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("escaping syntax error"))); + } + else if ((*flag & LVAR_QUOTEDPART) == 0) + { + if ((*(state->buf) == '\0') || t_isspace(state->buf)) + { + /* Adjust */ + if (state->state == ENDOPERAND) + { + (*strval)++; + (*lenval)--; + } + state->state = WAITOPERATOR; + return VAL; + } + + if (charlen == 1 && strchr("!{}()|&", *(state->buf))) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("unquoted special symbol"))); + + if (charlen != 1 || (charlen == 1 && !strchr("@%*\\", *(state->buf)))) + { + if (*flag & ~LVAR_QUOTEDPART) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("modifiers syntax error"))); + + *lenval += charlen; + } + else if (charlen == 1 && t_iseq(state->buf, '%')) + *flag |= LVAR_SUBLEXEME; + else if (charlen == 1 && t_iseq(state->buf, '@')) + *flag |= LVAR_INCASE; + else if (charlen == 1 && t_iseq(state->buf, '*')) + *flag |= LVAR_ANYEND; + else if (charlen == 1 && t_iseq(state->buf, '\\')) + { + if (*flag & ~LVAR_QUOTEDPART) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("escaping syntax error"))); + + state->state = WAITESCAPED; + *lenval += charlen; + } + else + { + /* Adjust */ + if (state->state == ENDOPERAND) + { + (*strval)++; + (*lenval)--; + } + state->state = WAITOPERATOR; + return VAL; + } + } + else if (charlen == 1 && t_iseq(state->buf, '\\')) + { + if (state->state == ENDOPERAND) + 
ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("escaping syntax error"))); + if (*flag & ~LVAR_QUOTEDPART) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("escaping syntax error"))); + + state->state = WAITESCAPED; *lenval += charlen; } - else if (charlen == 1 && t_iseq(state->buf, '%')) - *flag |= LVAR_SUBLEXEME; - else if (charlen == 1 && t_iseq(state->buf, '@')) - *flag |= LVAR_INCASE; - else if (charlen == 1 && t_iseq(state->buf, '*')) - *flag |= LVAR_ANYEND; else { - state->state = WAITOPERATOR; - return VAL; + if (*(state->buf) == '\0' && (*flag & LVAR_QUOTEDPART)) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("escaping syntax error"))); + + if (state->state == ENDOPERAND) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("syntax error"))); + if (*flag & ~LVAR_QUOTEDPART) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("syntax error"))); + *lenval += charlen; + } + break; + case WAITESCAPED: + if (*(state->buf) == '\0') + { + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("escaping syntax error"))); } + *lenval += charlen; + state->state = INOPERAND; break; case WAITOPERATOR: if (charlen == 1 && (t_iseq(state->buf, '&') || t_iseq(state->buf, '|'))) @@ -139,6 +254,47 @@ gettoken_query(QPRS_STATE *state, int32 *val, int32 *lenval, char **strval, uint } } +/* + * This function is similar to copy_unescaped. 
+ * It proceeds total_len bytes from src + * Copying all to dst skipping escapes + * Returns amount of skipped symbols + * */ +static int +copy_skip_escapes(char *dst, const char *src, int total_len) +{ + uint16 copied = 0; + int charlen; + bool escaping = false; + int skipped = 0; + + while (*src && (copied + skipped < total_len)) + { + charlen = pg_mblen(src); + if ((charlen == 1) && t_iseq(src, '\\') && escaping == 0) + { + escaping = 1; + src++; + skipped++; + continue; + }; + + if (copied + skipped + charlen > total_len) + elog(ERROR, "internal error during copying"); + + memcpy(dst, src, charlen); + src += charlen; + dst += charlen; + copied += charlen; + escaping = 0; + } + + if (copied + skipped != total_len) + elog(ERROR, "internal error during copying"); + + return skipped; +} + /* * push new one in polish notation reverse view */ @@ -171,14 +327,18 @@ pushquery(QPRS_STATE *state, int32 type, int32 val, int32 distance, int32 lenval static void pushval_asis(QPRS_STATE *state, int type, char *strval, int lenval, uint16 flag) { + int skipped = 0; + + if (lenval == 0) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("empty labels are forbidden"))); + if (lenval > 0xffff) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("word is too long"))); - pushquery(state, type, ltree_crc32_sz(strval, lenval), - state->curop - state->op, lenval, flag); - while (state->curop - state->op + lenval + 1 >= state->lenop) { int32 tmp = state->curop - state->op; @@ -187,11 +347,19 @@ pushval_asis(QPRS_STATE *state, int type, char *strval, int lenval, uint16 flag) state->op = (char *) repalloc((void *) state->op, state->lenop); state->curop = state->op + tmp; } - memcpy((void *) state->curop, (void *) strval, lenval); - state->curop += lenval; + skipped = copy_skip_escapes((void *) state->curop, (void *) strval, lenval); + if (lenval == skipped) /* Empty quoted literal */ + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + 
errmsg("empty labels are forbidden"))); + + pushquery(state, type, ltree_crc32_sz(state->curop, lenval - skipped), + state->curop - state->op, lenval - skipped, flag); + + state->curop += lenval - skipped; *(state->curop) = '\0'; state->curop++; - state->sumlen += lenval + 1; + state->sumlen += lenval - skipped + 1; return; } @@ -422,14 +590,14 @@ infix(INFIX *in, bool first) if (in->curpol->type == VAL) { char *op = in->op + in->curpol->distance; + char *opend = strchr(op, '\0'); + int delta = opend - op; + int extra_bytes = bytes_to_escape(op, delta, ". \\|!%@*{}&()"); RESIZEBUF(in, in->curpol->length * 2 + 5); - while (*op) - { - *(in->cur) = *op; - op++; - in->cur++; - } + copy_level(in->cur, op, delta, extra_bytes); + in->cur += delta + extra_bytes; + if (in->curpol->flag & LVAR_SUBLEXEME) { *(in->cur) = '%'; diff --git a/contrib/ltree/sql/ltree.sql b/contrib/ltree/sql/ltree.sql index 846b04e48e..9268742293 100644 --- a/contrib/ltree/sql/ltree.sql +++ b/contrib/ltree/sql/ltree.sql @@ -1,5 +1,7 @@ CREATE EXTENSION ltree; +SET standard_conforming_strings=on; + -- Check whether any of our opclasses fail amvalidate SELECT amname, opcname FROM pg_opclass opc LEFT JOIN pg_am am ON am.oid = opcmethod @@ -291,3 +293,379 @@ SELECT count(*) FROM _ltreetest WHERE t ~ '23.*{1}.1' ; SELECT count(*) FROM _ltreetest WHERE t ~ '23.*.1' ; SELECT count(*) FROM _ltreetest WHERE t ~ '23.*.2' ; SELECT count(*) FROM _ltreetest WHERE t ? '{23.*.1,23.*.2}' ; + +-- Extended syntax, escaping, quoting etc +-- success +SELECT E'\\.'::ltree; +SELECT E'\\ '::ltree; +SELECT E'\\\\'::ltree; +SELECT E'\\a'::ltree; +SELECT E'\\n'::ltree; +SELECT E'x\\\\'::ltree; +SELECT E'x\\ '::ltree; +SELECT E'x\\.'::ltree; +SELECT E'x\\a'::ltree; +SELECT E'x\\n'::ltree; +SELECT 'a b.с d'::ltree; +SELECT ' e . f '::ltree; +SELECT ' '::ltree; + +SELECT E'\\ g . h\\ '::ltree; +SELECT E'\\ g'::ltree; +SELECT E' h\\ '::ltree; +SELECT '" g "." h "'::ltree; +SELECT '" g " '::ltree; +SELECT '" g " ." 
h " '::ltree; + +SELECT nlevel(E'Bottom\\.Test'::ltree); +SELECT subpath(E'Bottom\\.'::ltree, 0, 1); + +SELECT subpath(E'a\\.b', 0, 1); +SELECT subpath(E'a\\..b', 1, 1); +SELECT subpath(E'a\\..\\b', 1, 1); +SELECT subpath(E'a b.с d'::ltree, 1, 1); + +SELECT( +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z')::ltree; + +SELECT(' ' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\a\b\c\d\e ')::ltree; + +SELECT 'abc\|d'::lquery; +SELECT 'abc\|d'::ltree ~ 'abc\|d'::lquery; +SELECT 'abc|d'::ltree ~ 'abc*'::lquery; --true +SELECT 'abc|d'::ltree ~ 'abc\*'::lquery; --false +SELECT E'abc|\\.'::ltree ~ 'abc\|*'::lquery; --true + +SELECT E'"\\""'::ltree; +SELECT '\"'::ltree; +SELECT E'\\"'::ltree; +SELECT 'a\"b'::ltree; +SELECT '"ab"'::ltree; +SELECT '"."'::ltree; +SELECT E'".\\""'::ltree; +SELECT( +'"01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z"')::ltree; + +SELECT E'"\\""'::lquery; +SELECT '\"'::lquery; +SELECT E'\\"'::lquery; +SELECT 'a\"b'::lquery; +SELECT '"ab"'::lquery; +SELECT '"."'::lquery; +SELECT E'".\\""'::lquery; +SELECT( +'"01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || 
+'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z"')::lquery; + +SELECT ' e . f '::lquery; +SELECT ' e | f '::lquery; + +SELECT E'\\ g . h\\ '::lquery; +SELECT E'\\ g'::lquery; +SELECT E' h\\ '::lquery; +SELECT E'"\\ g"'::lquery; +SELECT E' "h\\ "'::lquery; +SELECT '" g "." h "'::lquery; + +SELECT E'\\ g | h\\ '::lquery; +SELECT '" g "|" h "'::lquery; + +SELECT '" g " '::lquery; +SELECT '" g " ." h " '::lquery; +SELECT '" g " | " h " '::lquery; + +SELECT(' ' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\a\b\c\d\e ')::lquery; + +SELECT E'"a\\"b"'::lquery; +SELECT '"a!b"'::lquery; +SELECT '"a%b"'::lquery; +SELECT '"a*b"'::lquery; +SELECT '"a@b"'::lquery; +SELECT '"a{b"'::lquery; +SELECT '"a}b"'::lquery; +SELECT '"a|b"'::lquery; + +SELECT E'a\\"b'::lquery; +SELECT E'a\\!b'::lquery; +SELECT E'a\\%b'::lquery; +SELECT E'a\\*b'::lquery; +SELECT E'a\\@b'::lquery; +SELECT E'a\\{b'::lquery; +SELECT E'a\\}b'::lquery; +SELECT E'a\\|b'::lquery; + +SELECT '!"!b"'::lquery; +SELECT '!"%b"'::lquery; +SELECT '!"*b"'::lquery; +SELECT '!"@b"'::lquery; +SELECT '!"{b"'::lquery; +SELECT '!"}b"'::lquery; + +SELECT E'!\\!b'::lquery; +SELECT E'!\\%b'::lquery; +SELECT E'!\\*b'::lquery; +SELECT E'!\\@b'::lquery; +SELECT E'!\\{b'::lquery; +SELECT E'!\\}b'::lquery; + +SELECT '"1"'::lquery; +SELECT '"2.*"'::lquery; +SELECT '!"1"'::lquery; +SELECT '!"1|"'::lquery; +SELECT '4|3|"2"'::lquery; +SELECT '"1".2'::lquery; +SELECT '"1.4"|"3"|2'::lquery; +SELECT '"1"."4"|"3"|"2"'::lquery; +SELECT '"1"."0"'::lquery; +SELECT '"1".0'::lquery; +SELECT '"1".*'::lquery; +SELECT '4|"3"|2.*'::lquery; +SELECT '4|"3"|"2.*"'::lquery; +SELECT '2."*"'::lquery; +SELECT '"*".1."*"'::lquery; +SELECT 
'"*.4"|3|2.*'::lquery; +SELECT '"*.4"|3|"2.*"'::lquery; +SELECT '1.*.4|3|2.*{,4}'::lquery; +SELECT '1.*.4|3|2.*{1,}'::lquery; +SELECT '1.*.4|3|2.*{1}'::lquery; +SELECT '"qwerty"%@*.tu'::lquery; + +SELECT '1.*.4|3|"2".*{1,4}'::lquery; +SELECT '1."*".4|3|"2".*{1,4}'::lquery; +SELECT '\% \@'::lquery; +SELECT '"\% \@"'::lquery; + +SELECT E'\\aa.b.c.d.e'::ltree ~ 'A@.b.c.d.e'; +SELECT E'a\\a.b.c.\\d.e'::ltree ~ 'A*.b.c.d.e'; +SELECT E'a\\a.b.c.\\d.e'::ltree ~ E'A*@.b.c.d.\\e'; +SELECT E'a\\a.b.c.\\d.e'::ltree ~ E'A*@|\\g.b.c.d.e'; +--ltxtquery +SELECT '!"tree" & aWdf@*'::ltxtquery; +SELECT '"!tree" & aWdf@*'::ltxtquery; +SELECT E'tr\\ee'::ltree @ E'\\t\\r\\e\\e'::ltxtquery; +SELECT E'tr\\ee.awd\\fg'::ltree @ E'tre\\e & a\\Wdf@*'::ltxtquery; +SELECT 'tree & aw_qw%*'::ltxtquery; +SELECT 'tree."awdfg"'::ltree @ E'tree & a\\Wdf@*'::ltxtquery; +SELECT 'tree."awdfg"'::ltree @ E'tree & "a\\Wdf"@*'::ltxtquery; +SELECT 'tree.awdfg_qwerty'::ltree @ 'tree & aw_qw%*'::ltxtquery; +SELECT 'tree.awdfg_qwerty'::ltree @ 'tree & "aw_rw"%*'::ltxtquery; +SELECT 'tree.awdfg_qwerty'::ltree @ E'tree & "aw\\_qw"%*'::ltxtquery; +SELECT 'tree.awdfg_qwerty'::ltree @ E'tree & aw\\_qw%*'::ltxtquery; + +SELECT E'"a\\"b"'::ltxtquery; +SELECT '"a!b"'::ltxtquery; +SELECT '"a%b"'::ltxtquery; +SELECT '"a*b"'::ltxtquery; +SELECT '"a@b"'::ltxtquery; +SELECT '"a{b"'::ltxtquery; +SELECT '"a}b"'::ltxtquery; +SELECT '"a|b"'::ltxtquery; +SELECT '"a&b"'::ltxtquery; +SELECT '"a(b"'::ltxtquery; +SELECT '"a)b"'::ltxtquery; + +SELECT E'a\\"b'::ltxtquery; +SELECT E'a\\!b'::ltxtquery; +SELECT E'a\\%b'::ltxtquery; +SELECT E'a\\*b'::ltxtquery; +SELECT E'a\\@b'::ltxtquery; +SELECT E'a\\{b'::ltxtquery; +SELECT E'a\\}b'::ltxtquery; +SELECT E'a\\|b'::ltxtquery; +SELECT E'a\\&b'::ltxtquery; +SELECT E'a\\(b'::ltxtquery; +SELECT E'a\\)b'::ltxtquery; + +SELECT E'"\\"b"'::ltxtquery; +SELECT '"!b"'::ltxtquery; +SELECT '"%b"'::ltxtquery; +SELECT '"*b"'::ltxtquery; +SELECT '"@b"'::ltxtquery; +SELECT '"{b"'::ltxtquery; +SELECT 
'"}b"'::ltxtquery; +SELECT '"|b"'::ltxtquery; +SELECT '"&b"'::ltxtquery; +SELECT '"(b"'::ltxtquery; +SELECT '")b"'::ltxtquery; + +SELECT E'\\"b'::ltxtquery; +SELECT E'\\!b'::ltxtquery; +SELECT E'\\%b'::ltxtquery; +SELECT E'\\*b'::ltxtquery; +SELECT E'\\@b'::ltxtquery; +SELECT E'\\{b'::ltxtquery; +SELECT E'\\}b'::ltxtquery; +SELECT E'\\|b'::ltxtquery; +SELECT E'\\&b'::ltxtquery; +SELECT E'\\(b'::ltxtquery; +SELECT E'\\)b'::ltxtquery; + +SELECT E'"a\\""'::ltxtquery; +SELECT '"a!"'::ltxtquery; +SELECT '"a%"'::ltxtquery; +SELECT '"a*"'::ltxtquery; +SELECT '"a@"'::ltxtquery; +SELECT '"a{"'::ltxtquery; +SELECT '"a}"'::ltxtquery; +SELECT '"a|"'::ltxtquery; +SELECT '"a&"'::ltxtquery; +SELECT '"a("'::ltxtquery; +SELECT '"a)"'::ltxtquery; + +SELECT E'a\\"'::ltxtquery; +SELECT E'a\\!'::ltxtquery; +SELECT E'a\\%'::ltxtquery; +SELECT E'a\\*'::ltxtquery; +SELECT E'a\\@'::ltxtquery; +SELECT E'a\\{'::ltxtquery; +SELECT E'a\\}'::ltxtquery; +SELECT E'a\\|'::ltxtquery; +SELECT E'a\\&'::ltxtquery; +SELECT E'a\\('::ltxtquery; +SELECT E'a\\)'::ltxtquery; + +--failures +SELECT E'\\'::ltree; +SELECT E'n\\'::ltree; +SELECT '"'::ltree; +SELECT '"a'::ltree; +SELECT '""'::ltree; +SELECT 'a"b'::ltree; +SELECT E'\\"ab"'::ltree; +SELECT '"a"."a'::ltree; +SELECT '"a."a"'::ltree; +SELECT '"".a'::ltree; +SELECT 'a.""'::ltree; +SELECT '"".""'::ltree; +SELECT '""'::lquery; +SELECT '"".""'::lquery; +SELECT 'a.""'::lquery; +SELECT ' . '::ltree; +SELECT ' . 
'::lquery; +SELECT ' | '::lquery; + +SELECT( +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'z\z\z\z\z\z')::ltree; +SELECT( +'"01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z\z"')::ltree; + +SELECT '"'::lquery; +SELECT '"a'::lquery; +SELECT '"a"."a'::lquery; +SELECT '"a."a"'::lquery; + +SELECT E'\\"ab"'::lquery; +SELECT 'a"b'::lquery; +SELECT 'a!b'::lquery; +SELECT 'a%b'::lquery; +SELECT 'a*b'::lquery; +SELECT 'a@b'::lquery; +SELECT 'a{b'::lquery; +SELECT 'a}b'::lquery; + +SELECT 'a!'::lquery; +SELECT 'a{'::lquery; +SELECT 'a}'::lquery; + +SELECT '%b'::lquery; +SELECT '*b'::lquery; +SELECT '@b'::lquery; +SELECT '{b'::lquery; +SELECT '}b'::lquery; + +SELECT '!%b'::lquery; +SELECT '!*b'::lquery; +SELECT '!@b'::lquery; +SELECT '!{b'::lquery; +SELECT '!}b'::lquery; + +SELECT '"qwert"y.tu'::lquery; +SELECT 'q"wert"y"%@*.tu'::lquery; + +SELECT( +'"01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'01234567890123456789012345678901234567890123456789' || +'\z\z\z\z\z\z"')::lquery; + +SELECT 'a | ""'::ltxtquery; +SELECT '"" & ""'::ltxtquery; +SELECT 'a.""'::ltxtquery; +SELECT '"'::ltxtquery; + +SELECT '"""'::ltxtquery; +SELECT '"a'::ltxtquery; +SELECT '"a" & "a'::ltxtquery; +SELECT '"a | "a"'::ltxtquery; +SELECT '"!tree" & aWdf@*"'::ltxtquery; + +SELECT 'a"b'::ltxtquery; +SELECT 'a!b'::ltxtquery; +SELECT 'a%b'::ltxtquery; 
+SELECT 'a*b'::ltxtquery; +SELECT 'a@b'::ltxtquery; +SELECT 'a{b'::ltxtquery; +SELECT 'a}b'::ltxtquery; +SELECT 'a|b'::ltxtquery; +SELECT 'a&b'::ltxtquery; +SELECT 'a(b'::ltxtquery; +SELECT 'a)b'::ltxtquery; + +SELECT '"b'::ltxtquery; +SELECT '%b'::ltxtquery; +SELECT '*b'::ltxtquery; +SELECT '@b'::ltxtquery; +SELECT '{b'::ltxtquery; +SELECT '}b'::ltxtquery; +SELECT '|b'::ltxtquery; +SELECT '&b'::ltxtquery; +SELECT '(b'::ltxtquery; +SELECT ')b'::ltxtquery; + +SELECT 'a"'::ltxtquery; +SELECT 'a!'::ltxtquery; +SELECT 'a{'::ltxtquery; +SELECT 'a}'::ltxtquery; +SELECT 'a|'::ltxtquery; +SELECT 'a&'::ltxtquery; +SELECT 'a('::ltxtquery; +SELECT 'a)'::ltxtquery; + diff --git a/doc/src/sgml/ltree.sgml b/doc/src/sgml/ltree.sgml index 3ddd335b8c..a115562361 100644 --- a/doc/src/sgml/ltree.sgml +++ b/doc/src/sgml/ltree.sgml @@ -17,14 +17,38 @@ <title>Definitions</title> <para> - A <firstterm>label</firstterm> is a sequence of alphanumeric characters - and underscores (for example, in C locale the characters - <literal>A-Za-z0-9_</literal> are allowed). Labels must be less than 256 bytes - long. + A <firstterm>label</firstterm> is a sequence of characters. Labels must be + fewer than 256 characters in length. A label may contain any character supported + by <productname>PostgreSQL</productname> except <literal>\0</literal>. If a label + contains spaces, dots, or lquery modifiers, they may be <firstterm>escaped</firstterm>. + Escaping can be done either with a preceding backslash (<literal>\\</literal>) + or by wrapping the whole label in double quotes (<literal>"</literal>). + Initial and final unescaped whitespace is stripped. </para> <para> - Examples: <literal>42</literal>, <literal>Personal_Services</literal> + Examples: <literal>42</literal>, <literal>Personal_Services</literal>, + <literal>"This is a literal"</literal>, <literal>Literal\\ with\\ spaces</literal>.
+ </para> + + <para> + During conversion to internal representation, wrapping double quotes + and escaping backslashes are removed. During conversion from internal + representation to text, if the label does not contain any special + symbols, it is printed as is. Otherwise, it is wrapped in quotes and, if + there are internal quotes, they are escaped with backslashes. The list of special + symbols for ltree includes space (<literal> </literal>), backslash and double quote; + lquery and ltxtquery also require escaping <literal>|</literal>, <literal>&amp;</literal>, + <literal>!</literal>, <literal>@</literal>, and <literal>*</literal>. + </para> + + <para> + Examples: <literal>42</literal>, <literal>"\\42"</literal>, + <literal>\\4\\2</literal>, <literal> 42 </literal> and <literal> "42" + </literal> will all have the same internal representation and, being + converted from internal representation, will become <literal>42</literal>. + Literal <literal>abc def</literal> will turn into <literal>"abc + def"</literal>. </para> <para> @@ -681,11 +705,13 @@ ltreetest=> SELECT ins_label(path,2,'Space') FROM test WHERE path <@ 'Top. <title>Authors</title> <para> - All work was done by Teodor Sigaev (<email>teo...@stack.net</email>) and + The initial version was done by Teodor Sigaev (<email>teo...@sigaev.ru</email>) and Oleg Bartunov (<email>o...@sai.msu.su</email>). See <ulink url="http://www.sai.msu.su/~megera/postgres/gist/"></ulink> for additional information. Authors would like to thank Eugeny Rodichev for - helpful discussions. Comments and bug reports are welcome. + helpful discussions. The escaping syntax was implemented by Dmitry Belyavskiy + (<email>beld...@gmail.com</email>) under the direction of Teodor Sigaev. + Comments and bug reports are welcome. </para> </sect2>
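For readers who want to check the conversion rules from the documentation hunk against the examples, here is a minimal Python sketch of how a single label could be parsed and printed. It is only a model of the documented behaviour, not the C code from the patch. The helper names (strip_unescaped_whitespace, parse_label, format_label) are hypothetical, it handles one label rather than a dotted path, and two points are my assumptions rather than something the text states: a dot is treated as a special symbol on output, and backslashes inside a quoted label are escaped along with quotes so that the output can be read back.

```python
# Model of the documented ltree label rules; NOT the patch's C implementation.

# Symbols that force quoting on output. Space, backslash and double quote come
# from the documentation text; the dot is an assumption (it separates labels).
SPECIAL = set(' \\".')


def strip_unescaped_whitespace(text: str) -> str:
    """Drop leading/trailing whitespace unless a backslash protects it."""
    body = text.lstrip()
    stripped = body.rstrip()
    # An odd number of trailing backslashes means the first whitespace
    # character after them was escaped, so it belongs to the label.
    trailing = len(stripped) - len(stripped.rstrip("\\"))
    if trailing % 2 == 1 and len(stripped) < len(body):
        stripped += body[len(stripped)]
    return stripped


def parse_label(text: str) -> str:
    """Text -> internal form: remove wrapping quotes and escaping backslashes."""
    s = strip_unescaped_whitespace(text)
    quoted = s.startswith('"')
    if quoted:
        s = s[1:]
    out = []
    closed = False
    i = 0
    while i < len(s):
        c = s[i]
        if c == "\\":
            if i + 1 == len(s):
                raise ValueError("dangling backslash")
            out.append(s[i + 1])  # the escaped character is taken literally
            i += 2
        elif c == '"':
            # An unescaped quote is only valid as the closing quote.
            if not quoted or i != len(s) - 1:
                raise ValueError("unescaped double quote inside label")
            closed = True
            i += 1
        else:
            out.append(c)
            i += 1
    if quoted and not closed:
        raise ValueError("unterminated quoted label")
    if not out:
        raise ValueError("empty label")
    return "".join(out)


def format_label(label: str) -> str:
    """Internal form -> text: print as is, or quote when a special symbol occurs."""
    if not any(ch in SPECIAL for ch in label):
        return label
    # Escaping the backslash as well as the quote is an assumption.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

Under this model the five spellings from the documentation example (42, "\42", \4\2, " 42 " and ' "42" ') all parse to the internal label 42 and print back as 42, while abc def prints as "abc def". Inputs such as "", a"b or a lone backslash raise an error, in line with the --failures part of the regression script.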