Hello.
I have created an RFC for a context-dependent escaping operator:
https://wiki.php.net/rfc/escaping_operator

The initial discussion was here: http://marc.info/?t=146619199100001


At first, I wanted to add a call to a special function such as
escaper_call($str, $context), which would perform HTML escaping by default
and could be replaced by a separate extension for extended work with
contexts. But then I came up with a better variant.


Main idea.

The operator has the following form:

<?* $str ?>
<?* $str, 'html' ?>
<?* $str, 'js | html' ?>

Both expressions can be of any type that can be converted to a string. The
second expression is optional.

I changed the '~' sign to '*' because '~' is not present on keyboard
layouts for some European languages. Also, '<?~' does not give any error on
previous versions of PHP with short tags enabled, because it is recognized
as a bitwise operation (bitwise NOT), so a missing escape would go
unnoticed there.


The operator is compiled into an AST equivalent to the following code:

echo PHPEscaper::escape(first_argument, second_argument);

Don't forget that we already have a special operator for a single function:
backticks and shell_exec(). The new operator is compiled in a very similar
way.
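
For readers who have not used it, the existing precedent looks like this in
current PHP. This is only a minimal illustration; the command is arbitrary,
the variable names are mine, and it assumes shell_exec() is not disabled.

```php
<?php
// The backtick operator is syntax sugar for a call to shell_exec().
$viaOperator = `echo hello`;
$viaFunction = shell_exec('echo hello');

// Both variables hold the command's output.
var_dump($viaOperator === $viaFunction);
```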


There is a default implementation of the class 'PHPEscaper'. It has 4
static methods:

PHPEscaper::escape($string, $context = 'html');
PHPEscaper::registerHandler($context, $escaper_function);
PHPEscaper::unregisterHandler($context);
PHPEscaper::getHandlers();

The method PHPEscaper::escape($string, $context) splits $context on the '|'
delimiter, trims each part, and then calls the registered handler for every
context in the chain, one after another.
'html' is the default value for the context, and it has special handling:
if no handler is registered for the 'html' context, the method calls
htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE);
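
To make the chain behaviour concrete, here is a rough userland sketch of
the logic described above. It is not the RFC implementation: the class name
SketchEscaper is hypothetical, and throwing an exception for an unknown
context is my assumption, since the description above does not cover that
case.

```php
<?php
// Userland sketch only. SketchEscaper is a hypothetical name standing in
// for the proposed built-in class; the method bodies are assumptions.
final class SketchEscaper
{
    // Maps a context name to its handler callable.
    private static $handlers = [];

    public static function registerHandler($context, callable $handler)
    {
        self::$handlers[$context] = $handler;
    }

    public static function unregisterHandler($context)
    {
        unset(self::$handlers[$context]);
    }

    public static function getHandlers()
    {
        return self::$handlers;
    }

    public static function escape($string, $context = 'html')
    {
        $string = (string) $string;
        // Split on '|', trim each part, apply the handlers left to right.
        foreach (explode('|', (string) $context) as $name) {
            $name = trim($name);
            if (isset(self::$handlers[$name])) {
                $string = (string) call_user_func(self::$handlers[$name], $string);
            } elseif ($name === 'html') {
                // Default used when no 'html' handler is registered.
                $string = htmlspecialchars($string, ENT_QUOTES | ENT_SUBSTITUTE);
            } else {
                // Assumption: unknown contexts are treated as an error.
                throw new InvalidArgumentException("No handler for context '$name'");
            }
        }
        return $string;
    }
}

SketchEscaper::registerHandler('js', function ($str) {
    return json_encode($str);
});

// 'js' runs first, then the built-in 'html' fallback.
$result = SketchEscaper::escape('a"b', 'js | html');
```

With 'js | html' the string is JSON-encoded first and the result is then
HTML-escaped, so the order of contexts in the second argument matters.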


We can use it like this:

<?php
    // anywhere in the application
    // the class name is passed as a string so the array is a valid callable
    PHPEscaper::registerHandler('html', ['MyEscaper', 'escapeHtml']);
    PHPEscaper::registerHandler('js', function($str) { return json_encode($str); });
?>
<?* $str, 'js | html' ?>


There is more.
In the AST, 'PHPEscaper' is registered as a not fully qualified name
(ZEND_NAME_NOT_FQ).
This allows us to use namespaces and autoloading:

<?php use MyEscaper as PHPEscaper; ?>
<?* $str, 'js | html' ?>

MyEscaper::escape($str, 'js | html') will be called.
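
The name resolution itself can be tried in current PHP by writing the call
that the operator would compile to by hand. This is a sketch under stated
assumptions: the body of MyEscaper::escape() is invented purely so the
effect of the alias is visible in the output.

```php
<?php
// The alias makes the unqualified name PHPEscaper resolve to MyEscaper
// everywhere in this file.
use MyEscaper as PHPEscaper;

// Hypothetical application class; only the escape() signature comes from
// the proposal, the body is made up for this demonstration.
class MyEscaper
{
    public static function escape($string, $context = 'html')
    {
        return '[' . $context . '] '
            . htmlspecialchars((string) $string, ENT_QUOTES | ENT_SUBSTITUTE);
    }
}

// Hand-written equivalent of what the proposed operator would emit.
$str = '<b>';
$out = PHPEscaper::escape($str, 'js | html');
// $out is "[js | html] &lt;b&gt;"
```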


In this way we get autoloading, multiple contexts, HTML escaping by
default, and full control and customization.
This is not an operator for one function; there is just one default
implementation.

My first goal is to draw attention to the problem of security and HTML
escaping. The exact implementation is secondary.

This small change can really improve security and make development easier
in many applications.


What do you think? Maybe it would also be good to create an official poll
about this feature to find out the community's opinion.
