On 15-Dec-06, at 3:14 PM, Wietse Venema wrote:

This is a proposal to add basic Perl/Ruby like tainting support to
PHP: an option that is turned off by default, and that programmers
may turn on at runtime to alert them when they make the common
mistake of using uncleansed input with include, echo, system, open,
etc.  This would work with unmodified third-party extensions.

I doubt it is plausible to make this work without touching external extensions at all, since those extensions may change data from tainted to untainted and vice versa.

Taint support is not a sandbox; a malicious PHP script can still
open a pipe to a shell process and feed uncleansed commands to it.
Taint support can be an ingredient to build a sandbox, but that
involves lots more. See for example the Ruby reference at the end.

Sounds awfully like yet another safe_mode: something that proclaims security, yet is unable to provide it.

Of course when overhead is low enough, people might want to turn
on taint checks in production, to implement a multi-layer defense.
Wise people know that no single layer provides perfect protection.
People already do this with other scripting languages.

That is unlikely ever to be the case; the overhead of taint modes is generally quite significant.

- Education: automatic cleansing systems don't make programmers
  aware that network data is inherently untrustworthy. Instead,
  they teach the exact opposite: don't worry about data hygiene.
  This of course means they will get bitten elsewhere anyway.

Most people program not to learn how, but to solve problems. That is why automatic filtering has been the holy grail of security: it allows developers to avoid thinking about input validation beyond the initial setup and move on with their lives.

- Expectation: automatic cleansing systems have to be perfect. If
  the safety net catches some but not all cross-site scripting or
  SQL injection attacks, then the system has a security hole and
  people lose confidence. This gives security a bad reputation.

The same argument can be made about taint mode: judging by Perl and Ruby, there are tricks to bypass it.

- Overhead: as strings are sliced, diced, and tossed around, the
  automatic cleansing safety net has to keep track of exactly which
  characters in a substring are derived from untrusted input, and
  which characters are not, so that the safety net can later recognize
  malicious content in the middle of html/shell/sql/etc.  commands.

If you look at filter, there is no tracking of malicious characters; the data is simply cleansed of them or rejected altogether. This is a one-time event.
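
To make the "one time event" concrete, here is a minimal sketch of filtering at the input boundary. It is written in Python rather than PHP, and filter_int, filter_word and the request dictionary are hypothetical stand-ins, not the filter extension's actual API. Each value is either rejected or cleansed once, and nothing is tracked afterwards.

```python
import re

def filter_int(raw):
    """Validate: return an int, or None when the input is rejected."""
    raw = raw.strip()
    return int(raw) if re.fullmatch(r"[+-]?[0-9]+", raw) else None

def filter_word(raw):
    """Cleanse: drop every character outside a fixed allow-list."""
    return re.sub(r"[^A-Za-z0-9_]", "", raw)

# Hypothetical request data, filtered exactly once on the way in.
request = {"id": "42", "page": "7; DROP TABLE users", "name": "bob<script>"}
clean = {
    "id": filter_int(request["id"]),
    "page": filter_int(request["page"]),
    "name": filter_word(request["name"]),
}
```

After this step the rest of the script only sees the filtered values; there is no per-character or per-value bookkeeping to carry along.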

- More overhead: special-purpose code is needed in all functions
  and all primitives that execute html/shell/sql/etc.  commands.
  This code is needed because each context has a different definition
  of what is "malicious" content in the middle of a request.

That's why you can use RAW mode and filter the data when necessary.

Compared to this, the run-time overhead of maintaining and testing
taint bits in PHP is minuscule, if my experiences with the prototype
are meaningful.

I am highly skeptical regarding this claim.


- Each ZVAL is marked tainted or not tainted (i.e.  we don't taint
  individual characters within substrings). Black and white is all.
  In some future, someone may want to explore the possibility of
  more than two shades. But not now.

That means an additional element in a struct that has thousands of instances in most scripts; the increased memory footprint will be the first source of overhead.

- Primitives and functions such as echo, eval, or mysql_query are
  not allowed to receive tainted input. When this happens the script
  terminates with a run-time error.  It is a bad idea for software
  to continue after a security violation.

You would need to go through the 5,000+ functions that PHP offers and determine which ones can and cannot receive tainted data, something that virtually guarantees things will be missed, bringing us back to the safe_mode/open_basedir problem.
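
A rough sketch of the concern, in Python rather than PHP and under the assumption that sinks are protected by an explicit per-function check: Value, guarded, run_query and write_log_header are all hypothetical names. Any sink that nobody remembered to mark simply accepts tainted data without complaint.

```python
import functools

class TaintError(Exception):
    """Raised when tainted data reaches a guarded sink."""

class Value:
    """Hypothetical stand-in for a value carrying a single taint flag."""
    def __init__(self, data, tainted):
        self.data = data
        self.tainted = tainted

def guarded(func):
    """Mark a function as a sink that must not receive tainted input."""
    @functools.wraps(func)
    def wrapper(arg):
        if arg.tainted:
            raise TaintError(func.__name__ + " received tainted data")
        return func(arg)
    return wrapper

@guarded
def run_query(sql):
    return "query: " + sql.data

# Assumed oversight: this sink was never marked, so nothing checks it.
def write_log_header(line):
    return "header: " + line.data

user_input = Value("x\r\nSet-Cookie: evil=1", tainted=True)

try:
    run_query(user_input)
    query_blocked = False
except TaintError:
    query_blocked = True

# The unmarked sink happily accepts the same tainted value.
leaked = write_log_header(user_input)
```

The guarded function stops the script, while the forgotten one passes the data through, which is the same class of gap seen with safe_mode and open_basedir.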

- PHP propagates taintedness across expressions.  If an input to
  an expression is tainted, then the result of that expression is
  tainted too. There are exceptions to this rule: these are called
  sanitisers, as discussed next.

That runs counter to your original point that extensions do not need to be taint-aware; what you propose would require adjusting nearly every single extension. The additional tainted/not-tainted checks will add further overhead.
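
As a minimal sketch of what I mean (Python rather than PHP, and assuming the flag lives on each value and that freshly created values start out untainted): Value, ext_reverse and ext_reverse_aware are hypothetical. Core operations can propagate the flag, but code that builds its own result has to copy it explicitly.

```python
class Value:
    """Hypothetical stand-in for a value carrying a single taint flag."""
    def __init__(self, data, tainted=False):
        self.data = data
        self.tainted = tainted

    def concat(self, other):
        # Core propagation rule: the result is tainted if any input is.
        return Value(self.data + other.data, self.tainted or other.tainted)

def ext_reverse(v):
    # Stand-in for extension code written without taint in mind:
    # it builds a fresh result and never copies the flag.
    return Value(v.data[::-1])

def ext_reverse_aware(v):
    # The adjusted version has to carry the flag over itself.
    return Value(v.data[::-1], v.tainted)

user = Value("<script>", tainted=True)
prefix = Value("Hello ")

combined = prefix.concat(user)      # tainted, as the proposal intends
lost = ext_reverse(user)            # flag silently dropped
kept = ext_reverse_aware(user)      # flag preserved after adjustment
```

Whether real extensions behave like ext_reverse depends on how the engine initializes new values, which is exactly the detail that would need checking extension by extension.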

- The PHP application programmer untaints data by explicit assignment
  with an untainted value. For example, the result from htmlentities()
  or mysql_real_escape_string() is not tainted. People could apply
  the wrong sanitizer if they really want to. Remember, the purpose
  is to help programmers by telling what data needs cleansing.  It
  is up to them to make the right decision.  If we wanted to force
  the use of the "right" sanitizer then we would need multiple
  shades of untaintedness. This would not be practical.

Again, many functions behave differently depending on context. Take an example: htmlspecialchars() is great against XSS but does nothing for exec(), so if you run a string through htmlspecialchars() and then pass it to exec(), the data is considered untainted and gets executed, resulting in command injection.
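
The problem can be sketched as follows, in Python rather than PHP and with a single taint flag per value as proposed. Value, html_sanitize, shell_sanitize and build_shell_command are hypothetical; html.escape and shlex.quote stand in for htmlspecialchars() and escapeshellarg(). The sink only builds the command line instead of running it.

```python
import html
import shlex

class TaintError(Exception):
    """Raised when tainted data reaches a sink."""

class Value:
    """Hypothetical stand-in for a value carrying a single taint flag."""
    def __init__(self, data, tainted):
        self.data = data
        self.tainted = tainted

def html_sanitize(v):
    # Any sanitizer clears the one and only flag, whatever the context.
    return Value(html.escape(v.data), tainted=False)

def shell_sanitize(v):
    return Value(shlex.quote(v.data), tainted=False)

def build_shell_command(arg):
    """Sink: refuse tainted input, then build the command it would run."""
    if arg.tainted:
        raise TaintError("tainted data reached the shell")
    return "cat -- " + arg.data

user = Value("notes.txt; id", tainted=True)

# Wrong sanitizer: the check passes and the ';' survives untouched.
wrong = build_shell_command(html_sanitize(user))
# Right sanitizer: the whole argument is quoted for the shell.
right = build_shell_command(shell_sanitize(user))
```

With the wrong sanitizer the sink produces `cat -- notes.txt; id`, so a shell would run the injected command, and the taint check never fires.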

Overall, as it stands I do not believe that this is a good idea, and my vote would be -0.5 on its inclusion into PHP.

Ilia Alshanetsky
