Hi Rowan,

On Fri, Sep 2, 2016 at 7:37 PM, Rowan Collins <rowan.coll...@gmail.com> wrote:
> On 02/09/2016 11:11, Yasuo Ohgaki wrote:
>>
>> Taking care of tampered data by business logic will reduce both
>> readability and maintainability. And more importantly, make code
>> less secure because programmers tend to focus on logic
>> in model, not input data validations.
>
>
> This certainly makes sense. I guess the challenge is that in order to know
> if data has been tampered, you need to have some knowledge of the expected
> format. That expectation depends on what data you're expecting, which
> depends - ultimately - on the domain objects being modelled.
>
> More specifically, though, it depends on the interaction design - in an HTML
> context, the forms being presented. So the validation needs knowledge of the
> form controls - e.g. if a select box was shown, and the value is not from
> the known list of options, the input has been tampered with.
>
> If that's the case, the logical place to build the validation is into a form
> builder. At which point you've probably got a complex architecture in
> userland, and filter_* functions are unlikely to be a natural fit.
>
> If somebody's *not* using a library to build the form (e.g. they're laying
> out the HTML by hand), are they likely to set up the complex validation
> settings needed by the filter_* functions?

I agree HTML form is the most important input. However, apps
has to deal with other inputs like GET/COOKIE/JSON/HTTP Request
headers/Inputs from other subsystems. Form builder does not work
well for input data other than forms...

Even with great form builder, letting validation to input code has
advantages.

One issue is "data handling is better to break down into 2 parts",
logic and format. Since model's responsibility is to handle logic,
programmers try to handle logic in model. It's not bad at all with
this. However, there are many apps have poor format checks
because of this.

2nd issue is coverage. "model deals with data that is handled by
the model." Data validation covered by a model does not match with
application inputs. There are many vulnerabilities "Oops, this value
is not validated" even for data like ID. This is not limited to PHP apps.
e.g. https://groups.google.com/forum/#!topic/rubyonrails-security/ly-IH-fxr_Q

3rd issue is location. Input data validation is better to be done as
soon as possible. When application accepts input, programmers
know what the possible inputs, and could cover all inputs. i.e.
Controller is the best place for input format validation.
"Data" domain can be defined as a single definition, but making
sure it has "Valid Format" by models/libraries is harder than it seems.
If it's easy, we don't have this many vulnerabilities in PHP apps.

The best practice of input validation is
"Establish and maintain control over all of your inputs."
http://cwe.mitre.org/top25/#Mitigations

There are many possible implementations for this.
If we apply DbC to application, we have to validate external inputs
somewhere to keep contract and make code works w/o any
problems. Natural location for input validation is controller.
Internal redirects without validation could be serious bug
such as authentication bypass. Input validation can mitigate
such bug also if validation is done at controller.


BTW, I don't think everyone has to validate input very strict
manner. It is ok to validate like

<?php
// Define loose input validation

$server_def = array(
   // REQUEST HEADER
   'HTTP_ACCEPT' => $text64b_spec, // Text up to 64 bytes
   'HTTP_ACCEPT_ENCODING' => $text64b_spec,
   'HTTP_ACCEPT_LANGUAGE' => $text64b_spec,
   'HTTP_USER_AGENT' => $text512b_spec, // Text up to 512 bytes
   'HTTP_COOKIE' => $text1k_spec, // Text up to 1024 bytes
   'REQUEST_URI' => $text4k_spec, // Text up to 4096 bytes
);

$cookie_def = array(
    // COOKIE
   'PHPSESSID' => $phpsessid_spec,
  'uid' => $uid_spec,
  'last_access' => $int_spec,
 );

$get_def = array(
    // GET
   'id'  => $id_spec,
   'other_id' => $id_spec,
   'type1' => $alnum32b_spec, // Alpha numeric up to 32 bytes
   'type2'=> $alnum32b_spec,
);

$post_def = array(
    // POST
   'text1'    => $input_tag_spec, // <input type=text> default up to
                                              // 512 bytes text
   'text2'    => $input_tag_spec,
   'select1'    => $select_tag_spec, // <select> values default up to
                                                   // 64 bytes text
   'select2'    => $select_tag_spec,
   'radio1'      => $radio_tag_spec, // <radio> values default up to
                                                  // 64 bytes text
   'radio2'      => $radio_tag_spec,
   'submit'    => $submit_tag_spec,
   'textarea1k'   => $textarea1k_spec, // <textarea> values default up
                                                       // to 1K text,
allow newline
   'textarea100k' => $textare100k_spec, // <textarea> values default
                                                          // up to
100K text, allow newline
   'CSRF_TOKEN' => $alnum32b_spec, // Alpha numeric up to 32 bytes
   // and so on
);

$_SERVER = filter_require_var_array($_SERVER);
$_COOKIE = filter_require_var_array($_COOKIE);
$_GET = filter_require_var_array($_GET);
$_POST =filter_require_var_array($_POST);
?>

Since new string validator validates encoding and control chars
including newline, this validation rejects tampered inputs include
broken encoding, null char injection, newline injection and other
CNTRL char injection.

This definition is loose and weak, but we are sure "broken encoding,
null char injection, newline injection and other CNTRL char injection"
_FREE_ at least, except newline injections by textarea inputs.  It's
very important we are sure free from certain vulnerabilities even if
output code/library/subsystem is supposed to sanitize data for 100%
safety.

We still have these kind of vulnerabilities.
e.g.  I've fixed mail header injections via extra headers parameter recently.
https://bugs.php.net/bug.php?id=68776
We cannot be sure if 3rd party modules are free from encoding, CNTRL
injection attacks.

How far developers would like to validate by the input validation code
is up to developers. The more validate strictly, more secure and
increases chance to avoid hidden vulnerabilities in your code or other
people's code. i.e. Framework, library, or even PHP/subsystems like
database.

Regards,

P.S.
To people against this RFC,

How many of you are against the idea of this RFC?
(I don't think Rowan against basic idea, BTW)

--
Yasuo Ohgaki
yohg...@ohgaki.net

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to