Re: [PHP-DEV] Re: [RFC][VOTE] Add validation functions to filter module

Yasuo Ohgaki Sat, 20 Aug 2016 00:17:41 -0700

Hi Stas,

On Thu, Aug 18, 2016 at 3:54 PM, Stanislav Malyshev <[email protected]> wrote:
>> Even when there is no JavaScript nor HTML5 forms, input validations
>> can be done. It's matter of definition of "valid inputs" for <input
>> type="text" name="var" />. If page encoding is UTF-8, web browsers
>> must return response by UTF-8 encoding. (Unless other encoding is
>
> I think you're still missing my point. The point is that it is
> absolutely irrelevant what browser might or might not do, since PHP does
> not have any means to know if browsers even exist. PHP doesn't talk to
> browser, it talks to HTTP channel (provided we're in webserver
> scenario), what's on the other end is unknown and irrelevant. So there's
> no point discussing browsers.


It's possible to design web pages/services for "unknown clients", but
those are exceptional cases.

Exceptions do not negate best practices. If a case needs exceptional
handling, that handling should be applied to that case only, not in
general.

Almost all systems have intended clients. If the protocol is
HTTP/HTTPS, developers may reject strange data that cannot be right
for HTTP/HTTPS. Even layers above PHP do this, i.e. HTTP servers
reject malformed and/or prohibited requests and terminate execution. A
Web Application Firewall does fancier things and terminates the
connection. (It does not even allow the request to reach the web
server.) If web apps check their requirements and terminate a request
that does not fulfill them, it wouldn't matter at all.

Those who like WAFs (Web Application Firewalls) may use one to check
more of the web app's inputs. In general, WAF filters are designed to
check inputs for attack signatures, which web apps do not
validate/check. IMHO, using a WAF is more burdensome and costly than
the input validation that I'm proposing.

>
>> We recently added number of
>> php_error_docref(E_ERROR, "Cannot process too large data");
>> in PHP core to avoid possible memory destruction attacks.
>
> We added it because we didn't have choice. PHP does not have generic
> error mechanism that allows to fail an arbitrary function and still
> continue execution. It's because PHP is highly complex C code and C is
> not the most friendly language out there. Your app is not in C, so it
> can do it differently.
>
> If you talk about such situations, fine, but it's not input validation -
> it's limitation of the environment (since PHP can't support arbitrary
> length string). If your application has such limitations - fine, but it
> would be application-defined and will not apply for most cases of input
> validation.

Whether it is input or output validation is irrelevant. Programs
terminate on insane input/output: no available memory (PHP),
broken/insane HTTP/HTTPS requests (HTTP server), impossible/invalid
inputs to web apps (WAF).

My point is that programs (or even connections) terminate everywhere
when there is invalid data.

Web application developers have the right to define "valid" inputs.
("Have the right" does not mean "can do anything".) PHP script
termination for invalid input is just one kind of termination. It's
nothing special.

>
>> Broken char encoding shouldn't came from legitimate users. Text
>> contains CNTRL chars from <input type="text" name="var" /> shouldn't
>> come from legitimate users. 1MB data from <input type="text"
>> name="var" /> shouldn't come from legitimate users. Numeric database
>> record ID that is set by app shouldn't contain anything other than
>> digits. And so on.
>
> I think you are mixing abnormal situations due to physical limitations
> of software (like memory limits, etc.) with business logic. Numeric
> format validation and size limits are clearly business logic. Encoding
> may be not, depending on what the input is and used for.

I would impose certain limits in "the input validation", but if the
program must return a nice response for any request, then those checks
must be in the business logic. I agree with that. It's your rule after
all.

>
>> Broken char encoding (Accept only valid encoding)
>> NUL, etc control chars in string. (Accept only chars allowed)
>> Too long or too short string. e.g. JS validated values and values set
>> by server programs like <select>/<input type=radio>/etc, 100 chars for
>> username, 1000 chars for password, empty ID for a database record,
>> etc. (Accept only strings within range)
>
> These all fine filters/validators, and may be very useful in many
> situations. What I still don't understand is insistence of application
> dropping everything and exiting when one of them fails. We already have
> sanitization/filtering infrastructure, we can add new filters and flags
> - what I don't understand, why we need parallel infrastructure which
> seems to be only different by an unhelpful feature of crashing each time
> it sees something unexpected. Am I missing something?

I think your premise is "Show a nice error message for any error,
proceed as in the normal case". (Handle invalid/insane data just like
mistakes.)

My premise is "Don't show nice messages to an attacker, terminate as
an abnormal case". (Treat it as an attack or a serious system bug.)

It's a design choice. Either way is possible.
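
To make the "terminate" choice concrete, here is a rough sketch of the
kind of rule quoted above (valid encoding, no control chars, length
within range) for a single-line text field. The function names, the
100-character limit, and the 400 response are assumptions for
illustration only, not what the RFC implements.

```php
<?php
// Hypothetical rule for a single-line text field: well-formed UTF-8,
// no control characters, 1 to 100 characters long.
function is_valid_text_field($value)
{
    if (!is_string($value)) {
        return false; // e.g. var[]=x arrives as an array
    }
    // With the /u modifier, preg_match() returns false when the subject
    // is not valid UTF-8, so this one check covers all three conditions.
    return preg_match('/\A[^\p{Cc}]{1,100}\z/u', $value) === 1;
}

// Hypothetical policy: a rule violation is an abnormal request, so the
// script stops without a friendly message.
function reject_request()
{
    http_response_code(400);
    exit;
}

// Usage (commented out so the sketch can be loaded without exiting):
// if (!is_valid_text_field($_POST['var'] ?? null)) {
//     reject_request();
// }
```

The other design choice would reuse the same check and hand its
boolean result to the business logic instead of calling
reject_request().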

>
>> How to deal with bad inputs.
>>   - You seem you would like to treat as normal input.
>
> No, you didn't understand. I would like to treat is as erroneous input,
> but not stop the application immediately, but return error status to the
> business logic and let it sort things out.

Now we are getting close!
Our premises differ, so our opinions/views differ.

My premise is "Client and server have certain rules. Client inputs
that do not follow the rules (requirements) should be treated as
abnormal cases and shouldn't be handled by business logic". Please
note that

  - Valid input != logically correct or no mistakes

A rule could be "an integer may be any valid integer", but the
developer may impose that an int value must be between 0 and 120, for
instance. Age 300 can't be true for a human age, but if any integer is
allowed, it is valid input.
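
The age rule above can be expressed with the existing filter
extension. This is only a sketch: validate_age() is a hypothetical
helper name, and the 0 to 120 range is the example rule from this
mail, not a recommendation.

```php
<?php
// Hypothetical helper: checks the input rule only (an integer in 0..120).
// It says nothing about whether the age is logically correct.
function validate_age($raw)
{
    // FILTER_VALIDATE_INT returns the integer on success, false on failure.
    return filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 0, 'max_range' => 120],
    ]);
}

var_dump(validate_age('35'));  // follows the rule
var_dump(validate_age('300')); // integer, but outside the rule
var_dump(validate_age('abc')); // not an integer at all
```

Because 0 is a valid age, callers need to compare the result with
=== false rather than rely on truthiness.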

>
>> When plain <input> is used, users may type in any valid UTF-8 char
>> by mistake.
>> For example, this wouldn't happen for date field, but autocomplete may
>> fill my name "大垣靖男" to name field that supposed to contain alphabets
>> only.
>
> If the software is properly internationalized (like my email client)
> there's absolutely nothing wrong with this string. If it is not, it
> should check that the text matches its expectations - that's part of
> business logic.

Which error checks should be handled by business logic depends on the
rules/requirements that developers can impose on the client. Since it
depends on developer-defined rules/requirements, let's talk about what
kind of rules/requirements can be defined.

>
>> If developers try to validate "all inputs", validation in MVC model is
>> not efficient nor reasonable. It does not make sense to validate
>> browser request headers in db model, for example. Ideally, input
>> validation is better to be done as fast as possible to maximize the
>> mitigation effect.
>
> If you use browser headers, you validate them. If you don't use them, no
> point validating them, of course, since they are not your inputs.

It's OK to design it that way.

To maximize the mitigation effect of input validation, developers are
advised to validate "all inputs" regardless of whether they are used
in business logic or output code. An input may be used in the future,
or may already be used by some code you are not aware of.
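
As an illustration of validating "all inputs", request headers could
be checked in one place before any other code runs. This is a sketch
under assumptions: find_invalid_headers() is a hypothetical name, and
the rule (printable ASCII, at most 1024 bytes) is only an example that
real applications may need to loosen.

```php
<?php
// Hypothetical rule: every request header value must be at most 1024
// bytes of printable ASCII (space through tilde).
function find_invalid_headers(array $server)
{
    $invalid = [];
    foreach ($server as $name => $value) {
        if (strpos((string) $name, 'HTTP_') !== 0) {
            continue; // not a request header
        }
        if (!is_string($value)
            || preg_match('/\A[\x20-\x7E]{0,1024}\z/', $value) !== 1) {
            $invalid[] = $name;
        }
    }
    return $invalid;
}

// Usage: check every header once, whether or not the app reads it.
// $bad = find_invalid_headers($_SERVER);
```

The function only reports the offending header names. Whether to
terminate or to pass the list on to business logic is the design
choice discussed earlier in this mail.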

Let's talk about what could be validated, because things that cannot
be validated in input code do not belong to "the input validation"
anyway.

We know there are many inputs that could be validated by input code, don't we?

Regards,

--
Yasuo Ohgaki
[email protected]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
