While skim reading emails (just got back from holiday), I wanted to add...

On 15 Sep 2015, at 17:23, Anthony Ferrara <ircmax...@gmail.com> wrote:

> All,
> 
> On Tue, Sep 15, 2015 at 11:15 AM, Arvids Godjuks
> <arvids.godj...@gmail.com> wrote:
>> I fully support your effort to get this into the PHP to be part of core
>> extensions, or at least one of those that keep up with the language
>> releases.
>> This is a very good tool to have, and you can actually run it in production
>> to catch things that may slipped the stating (things happen). And it's
>> invaluable during during testing.
> 
> I'm against any form of tainting for a few reasons.
> 
> First, it's only really useful as a first-order heuristic. A failure
> does not indicate a security vulnerability, and a non-failure does not
> indicate the absence of one. An example of a false-positive would be
> if (preg_match('(^[a-z0-9]+$)', $input)) echo $input;. Taint-based
> systems would flag that as a possible XSS, where it's clearly not in
> any context.


Personally I would argue that if the `echo $input` was for printing a variable 
in a HTML construct, so it MUST go though one of the HTML encoding functions, 
because that regexp could be changed later.

e.g. an inexperienced developer being told that usernames on a website can now 
support angle brackets, so "just edit the regexp for the username validation".

This is keeping in mind that the regexp is now usually kept separate from the 
view code (e.g. MVC style)



> An example of a false-negative would be the following:
> 
>    <a href=<?php echo htmlspecialchars($input); ?>>Bar</a>. If the
> input is "foo onclick=dosomethingbad();", an XSS is still executed.


True, but without htmlspecialchars, it will ALWAYS be bad... so while a 
tainting system won't be able to pick up all problems, it can at least identify 
more mistakes than it not existing (percentage wise, well that will depend on 
the programmer).

Maybe you could think of it like having a second pair of eyes looking over your 
shoulder, politely asking you if you forgot something (a bit like how pair 
programming helps improve code quality).



> 
> Another example of a false negative would be:
> 
>    $query = "SELECT * FROM foo WHERE bar = " .
> mysqli_real_escape_string($c, $input) . ";";
> 
> A false positive is potentially OK. A false negative is not.
> 


All of the examples have been about using pg_escape_identifier(), because that 
adds the quote marks... mysqli_real_escape_string by itself would not be enough 
to say that it safe for an SQL string :-)



> Second, it will encourage the improper pattern of "sanitize" functions
> to de-taint input. These are functions which call
> `mysqli_real_escape_string(htmlspecialchars(whatever(etc($input))))`.
> No matter what the context the variables are used in. This doesn't
> improve security (and in many cases can demonstrably harm it). It just
> encourages people to work around it. And that ignores that there's an
> `untaint()` function that they can call on any variable to "silence
> the errors".


I don't think a sanitize() function would be implemented though, as it would 
make a mess of variables being passed though more than once... resulting in 
things like double html encoding (e.g. &amp;amp;), and these stand out much 
better than the current sanitize() functions which do things like silently 
stripping "bad" characters like double quotes and less than signs.

If anything, a proper tainting system would help us move away from these 
horrible functions (says he having had to remove a couple before now).



> Third, it ignores context. This is related to the first two, but I
> think is a separate concern. An example from the taint RFC
> (https://wiki.php.net/rfc/taint) is the shell-execution. If the
> variable is used in the context of command, one escape function is
> needed. If it's used as an argument, another is needed. Detecting
> which is not something that's trivial for a language-level taint
> function. And calling the wrong function defeats the purpose. So if we
> can't detect the context, we can't reliably say whether it's safe or
> not.


Not sure I understand, if anything this gives the strings a context?

Rather than every variable being a simple string (number, etc), they start off 
being plain text (or a string defined as a constant in PHP), and their escaping 
is checked in the context of the function that was called (e.g. exec needing 
escapeshellarg).



> In the hands of an experienced engineer/security researcher, taint
> systems can be a really handy tool to help find possible issues. In
> the hands of a junior engineer, it can actually make things worse
> because it encourages them to just "silence the errors".


While I originally hoped that it would be enabled by default, I think for now 
it would have to be off by default, and probably only used by "experienced 
engineer/security researchers".

At the moment, static analysis seems to be the closest alternative I can find 
(and they are pretty awful).



> Considering that it's not a reliable tool, and is only a heuristic, I
> would suggest that this be implemented as an extension, and not in the
> core of a programming language.


Maybe, but I think with some discussions, having it part of the core language 
would help get all the other extensions to use it (if needed), and help improve 
viability.

See it like having Notices present, but off by default (which they are).

The other positive is that distros such as Debian / RedHat will then ship with 
it by default (disabled), rather than having to ask/hope that they will provide 
a version with it (keeping in mind that they provide security updates without 
needing to re-compile).


> With that said, if there are engine hooks necessary to implement this
> feature better as an extension, I'm all for adding those hooks. I'm
> just against adding something to the language that won't actually be
> able to accomplish what it promises.
> 
> Escaping is always context-sensitive, in every case. Tainting as has
> been proposed in the past doesn't (and literally can't in the general
> case) know the context. Therefore it can't reliably know the correct
> way to escape.


I think it can, but I agree that there are some edge cases that it won't be 
able to catch, such as:

   $url = $_GET['u']; // ?u=javascript:alert(document.cookies);

   echo '<a href="' . htmlentities($url) .'">Cookies</a>';


> Consider the case of addslashes() and magic_quotes_gpc. The reason
> that they are insecure is that they don't know the context that the
> variable is used in. This leads to character set issues, as well as
> cases where backslashes enter contexts they don't escape (like HTML
> output). We learned years ago that context-insensitive escaping
> doesn't work and is literally bad. Let's not make the same mistakes
> again (even if it's for the correct reasons this time).


I am repeating myself now, but yes, magic_quotes_gpc did not know about the 
context those variables were going into... tainting on the other hand is 
checking the variables, which are going into a known context, have been escaped 
(with a couple of possible issues, but they can be addressed though other 
methods).

Craig

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to