Hi Stas,

On Thu, Aug 18, 2016 at 10:34 AM, Stanislav Malyshev
<smalys...@gmail.com> wrote:
>> I think you wrote your JavaScript code to impose certain format for "date",
>> "phone", "zip", etc. It's not my JavaScript code, but your JavaScript code
>> that defines output of browser to your PHP web apps.
>
> There are a lot of use cases that don't have Javascript (or browser, for
> that matter) frontends. Not that it should matter, as browser is not
> under your code's control, it's under user's control, and you never know
> on the PHP side if such thing as browser exists at all.

Even when there is no JavaScript nor HTML5 forms, input validations
can be done. It's matter of definition of "valid inputs" for <input
type="text" name="var" />. If page encoding is UTF-8, web browsers
must return response by UTF-8 encoding. (Unless other encoding is
explicitly specified. Or some very very old browsers that can return
only SJIS, etc) 1MB input for <input type="text" name="var" /> is
possible. However, apps should accept 1MB data from <input type="text"
name="var" /> should be rare if not none. 1KB would be far too large
for almost all apps. 100 bytes may be too large and acceptable to stop
normal program execution.

>
>> Accept means that allow program to process input data. (continue execution)
>
> Then we disagree here. I strongly object to the notion that the program
> should stop execution at any unexpected data. This is only marginally
> better than crashing, and is not very helpful behavior. Modern
> application is expected to handle data in a more intelligent manner.

We recently added number of
php_error_docref(E_ERROR, "Cannot process too large data");
in PHP core to avoid possible memory destruction attacks.

Why not for apps written by PHP?

Broken char encoding shouldn't came from legitimate users. Text
contains CNTRL chars from <input type="text" name="var" /> shouldn't
come from legitimate users. 1MB data from <input type="text"
name="var" /> shouldn't come from legitimate users. Numeric database
record ID that is set by app shouldn't contain anything other than
digits. And so on.

If this kind of data is sent to apps, something like this message may
be displayed "Invalid inputs are detected. This incident is reported
to administrator to investigate the cause." and finish the page.

>
>> If your JavaScript date picker uses "YYYYMMDD" format (date like
>> 20160817) for a date, anything other than "YYYYMMDD" format is
>> attacker tampered inputs.
>
> You keep returning to Javascript. What I am asking you to consider is
> that we're not talking about Javascript, we are talking about PHP, and
> PHP has no relation to Javascript date picker. Some apps use Javascript
> date pickers, true, but there is a whole world of applications that do
> nothing of the sort. Javascript does not add much to the picture here.

I agree that there is data that cannot be validated by input
validator. File input is the one. However, there are many inputs that
can be validated and can terminate normal execution like previously
mentioned.

>
>> It may be considered "valid input" means expected inputs from _legitimate_
>> users. Anything other than "valid input" should not be accepted because
>> they come from non legitimate users. i.e. attackers.
>>
>>  - Broken encoding
>>  - CNTRL chars
>>  - Bad format ( YYYYMMDD is the format for this case )
>>  - Too long or short ( Exactly 8 chars is the length for this case )
>>  - and so on
>>
>> are examples of invalid inputs.
>
> I think this has a smell of blacklisting, which is always almost wrong
> approach to dealing with data filtering/validation. Blacklists almost
> always lose to whitelists. If you have a whitelist filter that validates

I completely agree!!
Programmer must think of whitelist, not blacklist.

That's the reason why I used bold for whitelist way in
https://wiki.php.net/rfc/add_validate_functions_to_filter#secure_coding_basics

Broken char encoding (Accept only valid encoding)
NUL, etc control chars in string. (Accept only chars allowed)
Too long or too short string. e.g. JS validated values and values set
by server programs like <select>/<input type=radio>/etc, 100 chars for
username, 1000 chars for password, empty ID for a database record,
etc. (Accept only strings within range)

> the data, you don't need to worry about chars, lengths and such
> separately. However, there's nothing here that requires the whitelist
> filter would bring down the app on failure. It should tell the business
> logic "this string you gave me is not a valid date" and business logic
> should deal with it. There's nothing special here in encodings, control
> chars, etc. that I can see that needs any special handling.
>
>> It may increase your work, but you'll get less risks in return.
>
> I don't see how. I can write intelligent code that produces helpful
> messages and be the same - in fact, more, see above about whitelisting -
> robust than blackbox code that explodes on bad inputs.

I agree that blacklist is erroneous. It can be broken easily. My RFC
description may not be good enough to emphasis on this, but I totally
agree that blacklist should be avoided whenever it is possible.

The difference between us is:

How to deal with bad inputs.
  - You seem you would like to treat as normal input.
  - I would reject bad inputs that shouldn't came from legitimate users.

You can do your way, since I made exception could be optional. Missing
part is getting useful error info when something wrong in inputs. I
can add function that retrieves error info. (I've removed it in favor
of exception object. So it can be added again easily)

>
>> The input validation only reject invalid input.
>>
>> If you use plain <input> for "date", then you should consider any valid
>> UTF-8 without CNTRL chars up to 100 char or so, not "YYYYMMDD".
>> (Assuming UTF-8 is the encoding)
>
> But why? If I just check for YYYYMMDD I automatically get all invalid
> UTF-8 etc. rejected, without even thinking about it.

When plain <input> is used, users may type in any valid UTF-8 char by mistake.
For example, this wouldn't happen for date field, but autocomplete may
fill my name "大垣靖男" to name field that supposed to contain alphabets
only.

>
>> Software design is upto developers. There are many softwares that do
>> not follow best practices. Nobody enforce to use the validator as I
>> explains. It's okay to me this is used by only users who need. As I
>> mentioned, ISO 27000/ISMS requires this kind of validations, not few
>> users may need this.
>
> I'm not sure what ISO says there, but I'm pretty sure ISO does not
> specify which exactly code you should use to validate your inputs. The
> objections are not about validating inputs as a concept, nobody would
> object to that, it's to specific model of doing this that you propose -
> namely, doing two separate validation passes, doing blacklisting and
> bailing out on validation failure. At least I would not do input
> validation in this manner.

No it does not. It explains how/what to check input.

If developers try to validate "all inputs", validation in MVC model is
not efficient nor reasonable. It does not make sense to validate
browser request headers in db model, for example. Ideally, input
validation is better to be done as fast as possible to maximize the
mitigation effect.

Defense in depth (multiple layers of defense) is common technique in
security. Implementation is developers choice. Ideal way does not have
to be implemented.

I'll revise patch so that it would be more useful for user input error
checks. Then, new features will be able to use for wider purposes.

I need to return useful error messages. Return value is already used.
Options are
  - Add reference parameter that will hold error info.
  - Add function that retrieves the last error info.

If anyone have better idea, it will be appreciated.

Regards,

--
Yasuo Ohgaki
yohg...@ohgaki.net

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to