**This is a concept for an RFC: improve the filter extension further**

The aim of this RFC is to give the filter extension a small nudge.

# Introduction:

## Status quo

### Status quo: input "variables" source control

The author of this RFC is one of those "three weird people on the internet"
who (for his own PHP projects) almost completely disables "automagic"
request variable registration in php.ini, by setting:
<code>
  variables_order = "CS"
  request_order = ""
</code>
or even:
<code>
  variables_order = "S"
  request_order = ""
</code>
depending on deployment type. That way, most "magic" variables are not
even created. This roughly equates to the "default deny" principle.

For example, because the REQUEST source is not even implemented in the
filter extension yet, it is always crystal clear which "input source"
each variable is fetched from. No programmer on the team can mix up or
swap two variables' sources, unless consciously trying.

Some processing time is also "shaved off", because the interpreter does not
do any (useless) string processing for variables that don't even make sense
for a given request handler (the PHP script behind a given URL). Each request
handler is expected to be fully aware of the variables it requires for
further processing.

### Status quo: input variable filtering

All variables entering the PHP application are filtered through
filter_*() calls, making great use of this extension.

For the majority of input values, the default filters are sufficient and
are heavily used, but some variables need specialty filters.
For those, the currently provided FILTER_CALLBACK filter is suboptimal:
the callable occupies the whole 'options' field, which "breaks the
generalisation" of the filter API and leaves no way to pass custom
options to the callback.
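
To illustrate the limitation with today's API: because the callable takes
the place of the 'options' array, per-variable settings have to be captured
in a closure instead. This is only a minimal sketch; the helper name
make_range_filter() is hypothetical and not part of the extension.

```php
<?php
// Hypothetical helper: builds a FILTER_CALLBACK callable with the range
// settings captured in a closure, because 'options' cannot carry them.
function make_range_filter(int $min, int $max): callable
{
    return function ($value) use ($min, $max) {
        // FILTER_CALLBACK hands scalar input to the callback as a string.
        if (!is_numeric($value)) {
            return null;
        }
        $int = (int) $value;
        return ($int >= $min && $int <= $max) ? $int : null;
    };
}

// The callable occupies 'options', so there is no room left for
// settings such as 'default', 'min' or 'max'.
$accepted = filter_var('17', FILTER_CALLBACK, ['options' => make_range_filter(1, 31)]);
$rejected = filter_var('99', FILTER_CALLBACK, ['options' => make_range_filter(1, 31)]);
```

Every differently configured variable needs its own closure here, which is
the kind of boilerplate a dedicated 'callback' field would avoid.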

### Status quo: 'options' confusion

Because the function parameter is named '$options' and the $options
array/object also has an 'options' key/property, there is some confusion
among users on how to "construct" filters.
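
A short sketch of the pitfall with the current filter_var() API: filter
settings must sit under the nested 'options' key, and settings placed one
level too high are silently ignored.

```php
<?php
// Correct: settings live under the nested 'options' key
// of the $options parameter.
$nested = filter_var('5', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 1, 'max_range' => 3],
]);

// Common mistake: settings at the top level are ignored,
// so the range check never runs.
$flat = filter_var('5', FILTER_VALIDATE_INT, [
    'min_range' => 1,
    'max_range' => 3,
]);
```

Here $nested is false (5 is outside 1..3), while $flat is int 5 because
the misplaced range was never applied.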

### Status quo: array/object duality

In PHP, both arrays and objects (their properties) are essentially built
around the core HashTable structure. From a simplistic "viewpoint",
an object can be seen as a "glorified" array. This is an advantage:
it allows one to pass an object as a parameter to functions that
accept arrays.

Although PHP is equipped with "interface" and "trait", both are
orthogonal and quite useless when an object is used "as an array":
"interface" is missing the machinery to express public properties,
and "trait" is not a standalone entity on its own.

However, none of this is a problem if we consider an object as a special
case of an array. When an object is used as an array in PHP, one can
"intuitively" assume a structural-like (property-based) type
system. This is a feature.

One just needs to pass an object, without having to muck around
with interfaces and whatnot. If the object has the required public
properties set, it is processed as such; if not, it is the same as
if an array with keys missing, or keys having null values, was provided.

Magic properties and other 'specials' are usually not processed.
By defining a class, one can easily enforce that required fields
exist, but are null. Many array-consuming APIs can consume an object
of any class. Thanks to this, these APIs end up pretty general.

Unfortunately, the current filter extension doesn't allow the use of
objects (instead of arrays) in all contexts.
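
The structural reading described above can be sketched in userland PHP.
This is only an illustration: definition_field() and BoolDefinition are
hypothetical names, and the extension itself would do this in C.

```php
<?php
// Hypothetical helper: reads a field from an array or from the public
// properties of any object, treating a missing field as null.
function definition_field($definition, string $name)
{
    if (is_object($definition)) {
        // Called from outside the class, get_object_vars() returns
        // only the accessible (public) properties.
        $definition = get_object_vars($definition);
    }
    return is_array($definition) ? ($definition[$name] ?? null) : null;
}

class BoolDefinition
{
    public $filter = FILTER_VALIDATE_BOOLEAN;
    public $flags;            // required field: exists, but is null
    private $internal = 'x';  // not visible to consumers
}

$from_array  = definition_field(['filter' => FILTER_VALIDATE_BOOLEAN], 'filter');
$from_object = definition_field(new BoolDefinition(), 'filter');
$hidden      = definition_field(new BoolDefinition(), 'internal');
```

Both calls yield the same filter id, and the private property behaves
exactly like a missing array key.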

# Proposed improvements:

1. Introduction of filter 'definition' concept and parameters cleanup.
2. Introduction of new 'callback_extended' filter 
   (while keeping compatibility with old code)
3. Ability to consume both arrays and objects in 'definition' 
   parameters.

## 1. Introduction of filter 'definition' concept and cleanup.

The ambiguous parameter $type is renamed to $input_source.
Parameter $filter is renamed to $filter_or_definition.
Parameter $options is renamed to $definition.
For each function call where an 'int $filter' value is currently
passed, new logic for processing $filter_or_definition
is employed:

Parameter $filter_or_definition can be of type (int), (array)
or (object).

The filter "usability" validation algorithm is as follows:
  1. check if $filter_or_definition is an (int)
     - if yes, it is expected to be a filter_id
       - if $definition is passed, it is processed
         the same way as $options was before
  2. check if $filter_or_definition is an (array|object)
     - if yes, check it for an (int) 'filter' key/property
       - if present, filter_id is extracted from the
         $filter_or_definition->filter property (or key)
         and the filter "definition" is considered "usable"
         - internally, $definition is made to point to
           $filter_or_definition, and the $definition
           argument from the function call is ignored
  3. for everything else, the function call fails

Further processing continues as it does currently, extracting flags
and so on. The modification of the C function code is minor.
Because none of these parameters is named $options anymore,
confusion is lessened.
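
The three rules can be sketched in userland PHP on top of today's
filter_var(). This is not the C implementation; the names
resolve_filter_definition() and filter_var_sketch() are hypothetical, and
converting a nested 'options' object to an array is an assumption made
only because the current API ignores object 'options'.

```php
<?php
// Userland sketch of the proposed resolution rules.
// Returns [filter_id, definition].
function resolve_filter_definition($filter_or_definition, $definition = null): array
{
    // Rule 1: a plain int is a filter id; $definition plays the role
    // that $options plays today.
    if (is_int($filter_or_definition)) {
        return [$filter_or_definition, $definition];
    }

    // Rule 2: an array or object must carry an int 'filter' key/property;
    // it becomes the definition and the $definition argument is ignored.
    if (is_array($filter_or_definition) || is_object($filter_or_definition)) {
        $fields = is_object($filter_or_definition)
            ? get_object_vars($filter_or_definition)
            : $filter_or_definition;
        if (isset($fields['filter']) && is_int($fields['filter'])) {
            // Assumption: flatten object 'options' for the current API.
            if (isset($fields['options']) && is_object($fields['options'])) {
                $fields['options'] = get_object_vars($fields['options']);
            }
            return [$fields['filter'], $fields];
        }
    }

    // Rule 3: everything else fails.
    throw new InvalidArgumentException('Unusable filter definition');
}

// Hypothetical wrapper emulating the proposed filter_var() signature.
function filter_var_sketch($variable, $filter_or_definition, $definition = null)
{
    [$filter_id, $resolved] = resolve_filter_definition($filter_or_definition, $definition);
    return filter_var($variable, $filter_id, $resolved ?? 0);
}
```

The sketch throws an exception for rule 3; how the real functions would
signal failure is not specified here.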

The function signatures thus become:
<code>

filter_has_var(int $input_source, $variable_name)

filter_input(int $input_source, 
             string $variable_name, 
             $filter_or_definition,
             $definition = null)


filter_var($variable,
           $filter_or_definition,
           $definition = null)

filter_input_array(int $input_source,
                   $definition = null,
                   $add_empty = true)

filter_var_array(array $data,
                $definition = null,
                $add_empty = true)

filter_list()

</code>


The filter_input call now has the following possible invocations:
<code>

filter_input(INPUT_GET, 'MY_VAR', 
             FILTER_VALIDATE_BOOLEAN);

filter_input(INPUT_GET, 'MY_VAR',
             FILTER_VALIDATE_BOOLEAN,
             FILTER_NULL_ON_FAILURE);


filter_input(INPUT_GET, 'MY_VAR',
             ['filter'=>FILTER_VALIDATE_BOOLEAN,
              'flags'=>FILTER_NULL_ON_FAILURE,
              'options'=>['default'=>false]
             ]);

$dfn = new stdClass();
$dfn->filter = FILTER_VALIDATE_BOOLEAN;
$dfn->flags = FILTER_NULL_ON_FAILURE;
$dfn->options = new stdClass();
$dfn->options->default = false;

filter_input(INPUT_GET, 'MY_VAR', $dfn);
</code>

## 2. Introduction of new 'callback_extended' filter

A new filter, FILTER_CALLBACK_EXTENDED ("callback_extended"),
is introduced. It expects a 'definition' defined as such:
<code>

$defn = [
  'filter' => (int) FILTER_CALLBACK_EXTENDED,
  'flags' => (int) FILTER_NULL_ON_FAILURE,
  'callback' => (callable) $callable_ex,
  'options' => [
    'default' => 42,
    'min'=> -1,
    'max'=> 64,
  ],
];
</code>

This filter has a new id (FILTER_CALLBACK_EXTENDED = FILTER_CALLBACK + 1).

Instead of "abusing" the 'options' field for storing the callable, it
inspects the 'definition' itself, searching for a new field 'callback'
that is "outside" of the 'options' subcomponent. The 'options' field is
passed as is, as the second parameter to the $callable_ex callable.

Thus the callable invocation looks like this:
$filtered_value = $callable_ex($value, $options)
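
Until such a filter exists, the calling convention can be emulated on top
of today's FILTER_CALLBACK. The following is a sketch under that
assumption only; filter_var_callback_extended() and $clamp_int are
hypothetical names, and 'flags' handling is left out for brevity.

```php
<?php
// Hypothetical userland emulation: FILTER_CALLBACK_EXTENDED itself
// does not exist in current PHP.
function filter_var_callback_extended($value, array $definition)
{
    $callback = $definition['callback'];
    $options  = $definition['options'] ?? [];

    // Bind 'options' as the second argument of the user callable.
    $bound = function ($v) use ($callback, $options) {
        return $callback($v, $options);
    };
    return filter_var($value, FILTER_CALLBACK, ['options' => $bound]);
}

// Example $callable_ex: clamps an integer into [min, max] and falls
// back to 'default' for non-numeric input.
$clamp_int = function ($value, array $options) {
    if (!is_numeric($value)) {
        return $options['default'];
    }
    return max($options['min'], min($options['max'], (int) $value));
};

$defn = [
    'callback' => $clamp_int,
    'options'  => ['default' => 42, 'min' => -1, 'max' => 64],
];

$clamped  = filter_var_callback_extended('100', $defn);
$fallback = filter_var_callback_extended('abc', $defn);
```

The same $clamp_int callable can be reused with different 'options' per
variable, which is the point of passing them as a second argument.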

This design **immensely(!)** simplifies the development of
configurable callback filters 'per input variable type'.

It also allows the user to tie everything related to variable
filtering, validation and sanitization to the single unified
API provided by the filter extension.

In essence, in the case of FILTER_CALLBACK_EXTENDED, the filter identity
is a unique value composed of two subvalues:
filter_id (FILTER_CALLBACK_EXTENDED) and
the $callback callable signature.

Besides allowing "huge form" processors using a nested
$definition array, as in the case of the filter_input_array() API,
it allows other, much more flexible uses.

For example, if objects are used to store filter 'definitions',
highly expressive, "composable" and reusable "filter libraries"
can be constructed:
<code>

$def_v1 = (object) [
  'filter' => FILTER_CALLBACK_EXTENDED,
  'callback' => $filter_v1_handler,
  'options' => (object) ['x'=>1,'y'=>2],
];
$def_v2 = (object) [
  'filter' => FILTER_CALLBACK_EXTENDED,
  'callback' => $filter_v2_handler,
];
$def_s1 = (object) [...];
$def_s2 = (object) [...];

$usr_validating_filters = [UVFLT_1=>$def_v1, UVFLT_2=>$def_v2];
$usr_sanitizing_filters = [USFLT_1=>$def_s1, USFLT_2=>$def_s2];

filter_input(INPUT_GET, 'MY_VAR', $usr_validating_filters[UVFLT_1]);
</code>

By moving the callback's callable storage outside of the 'options' component,
proper semantic separation is achieved and a sensible hierarchy of the filter
'definition' is maintained, while at the same time the callback gains the
much needed invocation customisations.

The actual implementation is relatively straightforward, requiring only one
new internal function, while reusing much of the filter extension
machinery already present (with slight modification).

## 3. Ability to consume both arrays and objects in 'definition' parameters.

The extension code was reread, and what could be called 'definition'
processing was modified to allow both array and object consumption, by
means of the HASH_OF() macro.

# Conclusion

The experimental implementation seems pretty usable, passing all current
ext/filter/tests (with small modifications due to the modified semantics).

More experiments are to be done, especially stress testing memory access
for usage and corruption. So far, debug+maintainer-zts builds
have not revealed problems.

Logic, usability and compatibility were prioritised over performance.

Still, some small performance gains might actually be observed,
as parameter parsing was converted to FAST_ZPP, especially
for a high cadence of successive filter_has_var() calls.
However, no effort was made on this front.

Compared to the advantages gained, the code changes are relatively minor.

An attempt was made to maintain backwards compatibility when using
FILTER_CALLBACK, although users will be encouraged to "upgrade" to
FILTER_CALLBACK_EXTENDED.

Hidden errors in legacy scripts, due to the change in $filter
(now $filter_or_definition) parameter processing, are not evaluated,
and are considered severe bugs anyway: $filter should have
been an (int).

Sniffers using the Reflection API will break (if they expect a certain
filter API layout), but that is expected (or should be expected)
by reflection consumers and thus is not considered a problem.

Probably nobody should be using the Reflection API to drive a 'decision
tree' in production code invoked for several hundred (or thousand)
requests per second.

Should this RFC pass, the filter documentation is going to be updated
to match the new semantics.

---------------------------------------------------------------------------

By posting this draft, I am asking for comments.

Should this draft be considered worthy of inclusion among the RFCs, I am
asking for karma to be able to add it to the wiki.

After that, a git fork will be provided for reviewers to evaluate the code.

After a successful review, I will ask for a final vote.

My intended upstream inclusion target window is "before" PHP_7.2.

However, I am not as interested in the speed of inclusion
as I am in sensibly improving the quality of the (awesome) filter extension.

It would be great if it went through, given the advantages it has for
userland consumers.

Thank you in advance for reading and consideration.

  eto
