Hello internals,

I'd like to discuss some issues with character escaping in the ini parser (in the lexer, to be precise).


1. Currently, double-quoted strings are processed twice: first in the <ST_DOUBLE_QUOTES>[^] lexer rule (to get the string length), and then in the zend_ini_escape_string function (to build the string by processing all escape sequences). The problem is that the two passes treat escapes differently: the lexer rule uses a look-behind approach to check whether a double quote character is escaped, while zend_ini_escape_string skips escaped characters in the usual way (the skip-next-char approach, as in PHP's string parser). This leads to the following issue:

In some cases there is no way to escape a final backslash in a string, e.g. when the string is followed by anything other than a line break:

KEY1 = "prefix\\" ; Warning: syntax error, unexpected end of file, expecting TC_DOLLAR_CURLY or TC_QUOTED_STRING or '"'
KEY2 = "prefix\\" ACONST

I'd switch to the PHP way and require each special character (", $, \) to be escaped in the usual (skip-next-char) way, without the look-behind approach. This may break backward compatibility for code that uses a sequence like \\" (instead of \\\") to get a backslash followed by a double quote, but I doubt that is widely used in the wild (moreover, this behavior is not described in the PHP docs, so nobody should be relying on it).
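
To illustrate the mismatch, here is a simplified Python model of the two ways of finding the closing quote. This is only a sketch: the function names are made up for this example, and it is not the actual lexer code (it ignores line breaks, ${ handling and everything else the real rule does).

```python
def find_close_lookbehind(body):
    """Look-behind model: a quote counts as escaped if the previous
    character is a backslash, no matter how that backslash got there."""
    for i, ch in enumerate(body):
        if ch == '"' and not (i > 0 and body[i - 1] == '\\'):
            return i
    return -1  # no closing quote found


def find_close_skip_next(body):
    """Skip-next-char model: a backslash consumes the following character,
    so an escaped backslash can no longer escape the quote after it."""
    i = 0
    while i < len(body):
        if body[i] == '\\':
            i += 2  # skip the backslash and whatever it escapes
            continue
        if body[i] == '"':
            return i
        i += 1
    return -1  # no closing quote found


# Text after the opening quote of:  KEY1 = "prefix\\" ; comment
body = r'prefix\\" ; comment'

lookbehind_pos = find_close_lookbehind(body)
skip_next_pos = find_close_skip_next(body)
```

In this model the look-behind scan never finds the end of the string, while the skip-next-char scan stops at the quote right after the escaped backslash.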


2. In the <ST_DOUBLE_QUOTES>[^] lexer rule, the token is processed starting from the YYCURSOR position instead of yytext, so the first character is not taken into account. As a result, there is no way to escape a leading dollar sign followed by an opening curly brace:

KEY = "\${" ; Warning: syntax error, unexpected end of file, expecting TC_VARNAME
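
The effect of the starting position can be modelled roughly as follows. Again, this is a hypothetical Python sketch with made-up names, assuming that the [^] rule has already consumed one character when the scan begins; it is not the real lexer code.

```python
def literal_end(body, scan_from):
    """Return the index where the literal part of a double-quoted string
    ends: at an unescaped closing quote or at an unescaped '${'."""
    i = scan_from
    while i < len(body):
        ch = body[i]
        if ch == '\\':
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == '"':
            return i
        if ch == '$' and body[i + 1:i + 2] == '{':
            return i
        i += 1
    return len(body)


# Text after the opening quote of:  KEY = "\${" ; comment
body = r'\${" ; comment'

from_token_start = literal_end(body, 0)  # scan starts at the token start (yytext)
from_cursor = literal_end(body, 1)       # scan starts one character later (YYCURSOR)
```

Scanning from the token start sees the backslash and skips the dollar sign, so the literal runs up to the closing quote. Scanning from one character later misses the backslash and stops at ${ as if it opened a variable.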


3. I'd also like to note that the ini parser currently doesn't support the standard escape sequences (\n, \t, etc.), although, based on the official PHP docs (https://www.php.net/manual/en/function.parse-ini-file.php), one might expect them to be supported:

; \ is used to escape a value.
newline_is = "\\n" ; results in the string "\n", not a newline character.


It seems easy to fix/implement the things mentioned above (I'll send a PR if there are no objections).

So, how would you rate this idea on the following scale (1-5)?

1) It's not necessary at all, let's keep the current ini lexer as is.
2) Let's require special characters (", \, $) to be escaped only in a uniform (skip-next-char) way.
3) The above, plus support for the \t, \n, \v, \f, \r, \e sequences.
4) The above, plus support for \123 (octal) and \xAB (hex) character codes.
5) The above, plus support for \u{12AB} (Unicode hex codepoints); actually, I'd rather not implement this because I don't know how to handle partial sequences like
KEY = "\u{"
(PHP stops with "Parse error: Invalid UTF-8 codepoint escape sequence", but I'm not sure the ini parser should follow this rule).
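
To make levels 2-4 more concrete, here is a rough Python model of what the unescaping step could do at each level. It is a sketch under my own assumptions, not a proposed patch: the unescape name and the level parameter are invented for this example, unknown sequences are left untouched, and octal values are masked to one byte the way PHP strings do it.

```python
SPECIAL = '"\\$'
CONTROL = {'t': '\t', 'n': '\n', 'v': '\v', 'f': '\f', 'r': '\r', 'e': '\x1b'}
OCT = '01234567'
HEX = '0123456789abcdefABCDEF'


def unescape(raw, level=2):
    """Process escape sequences in the body of a double-quoted ini string.

    level 2: only \\", \\\\ and \\$
    level 3: also \\t, \\n, \\v, \\f, \\r, \\e
    level 4: also \\123 (octal) and \\xAB (hex)
    """
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch != '\\' or i + 1 == len(raw):
            out.append(ch)  # ordinary character or a lone trailing backslash
            i += 1
            continue
        nxt = raw[i + 1]
        if nxt in SPECIAL:
            out.append(nxt)
            i += 2
        elif level >= 3 and nxt in CONTROL:
            out.append(CONTROL[nxt])
            i += 2
        elif level >= 4 and nxt in OCT:
            j = i + 1
            while j < len(raw) and j < i + 4 and raw[j] in OCT:
                j += 1  # take up to three octal digits
            out.append(chr(int(raw[i + 1:j], 8) & 0xFF))
            i = j
        elif level >= 4 and nxt == 'x' and raw[i + 2:i + 3] and raw[i + 2] in HEX:
            j = i + 2
            while j < len(raw) and j < i + 4 and raw[j] in HEX:
                j += 1  # take up to two hex digits
            out.append(chr(int(raw[i + 2:j], 16)))
            i = j
        else:
            out.append('\\' + nxt)  # unknown sequence: keep it as written
            i += 2
    return ''.join(out)
```

With this model, unescape(r'\n', 2) leaves the backslash and the n as they are, while unescape(r'\n', 3) yields a newline character. A partial sequence such as \u{ is simply left as written at every level, which is one possible answer to the question in item 5, though not necessarily the right one.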

Any comments are welcome.


Best regards,
Denis Ryabov
