On 15-Dec-06, at 3:14 PM, Wietse Venema wrote:
This is a proposal to add basic Perl/Ruby like tainting support to
PHP: an option that is turned off by default, and that programmers
may turn on at runtime to alert them when they make the common
mistake of using uncleansed input with include, echo, system, open,
etc. This would work with unmodified third-party extensions.
I doubt it is plausible to make this work without touching external
extensions at all, since those extensions may change data from
tainted to untainted and vice versa.
Taint support is not a sandbox; a malicious PHP script can still
open a pipe to a shell process and feed uncleansed commands to it.
Taint support can be an ingredient to build a sandbox, but that
involves lots more. See for example the Ruby reference at the end.
Sounds awfully like yet another safe_mode: something that proclaims
security, yet is unable to provide it.
Of course when overhead is low enough, people might want to turn
on taint checks in production, to implement a multi-layer defense.
Wise people know that no single layer provides perfect protection.
People already do this with other scripting languages.
That is unlikely to ever be the case; the overhead of taint modes is
generally quite significant.
- Education: automatic cleansing systems don't make programmers
aware that network data is inherently untrustworthy. Instead,
they teach the exact opposite: don't worry about data hygiene.
This of course means they will get bitten elsewhere anyway.
Most people program not to learn how, but to solve problems. That is
why automatic filtering has been the holy grail of security: it
allows developers to avoid thinking about input validation beyond the
initial setup and move on with their lives.
- Expectation: automatic cleansing systems have to be perfect. If
the safety net catches some but not all cross-site scripting or
SQL injection attacks, then the system has a security hole and
people lose confidence. This gives security a bad reputation.
The same argument can be made about taint mode. Judging by Perl and
Ruby, where there are tricks to bypass it, it applies there as well.
- Overhead: as strings are sliced, diced, and tossed around, the
automatic cleansing safety net has to keep track of exactly which
characters in a substring are derived from untrusted input, and
which characters are not, so that the safety net can later recognize
malicious content in the middle of html/shell/sql/etc. commands.
If you look at the filter extension, there is no tracking of
malicious characters: the data is simply cleansed of them or rejected
altogether. This is a one-time event.
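
To make the "one-time event" point concrete, here is a minimal sketch
of cleansing or rejecting a value once, at the input boundary, with no
per-character tracking afterwards. Python is used only as a neutral
modelling language; validate_or_reject() and strip_disallowed() are
hypothetical names and are not the API of PHP's filter extension.

```python
import re

def validate_or_reject(raw, pattern):
    """Return raw unchanged if it fully matches the allow-list pattern.

    Returns None to signal rejection. Hypothetical helper, not a PHP API.
    """
    return raw if re.fullmatch(pattern, raw) else None

def strip_disallowed(raw):
    """Cleanse once by dropping every character outside a small allow-list."""
    return re.sub(r"[^A-Za-z0-9_ ]", "", raw)

# Both operations happen once, when the input arrives. Nothing about the
# value's origin is remembered or carried along after this point.
page_id = validate_or_reject("42", r"\d+")             # accepted
bad_id = validate_or_reject("42; rm -rf /", r"\d+")    # rejected (None)
name = strip_disallowed("a<b>c")                       # cleansed
```

After this step the rest of the script works with ordinary values, so
there is nothing left to track at run time.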
- More overhead: special-purpose code is needed in all functions
and all primitives that execute html/shell/sql/etc. commands.
This code is needed because each context has a different definition
of what is "malicious" content in the middle of a request.
That's why you can use RAW mode and filter the data when necessary.
Compared to this, the run-time overhead of maintaining and testing
taint bits in PHP is minuscule, if my experiences with the prototype
are meaningful.
I am highly skeptical regarding this claim.
- Each ZVAL is marked tainted or not tainted (i.e. we don't taint
individual characters within substrings). Black and white is all.
In some future, someone may want to explore the possibility of
more than two shades. But not now.
That means an additional element in a struct that has thousands of
instances in most scripts; the increased memory footprint will be the
first source of overhead.
- Primitives and functions such as echo, eval, or mysql_query are
not allowed to receive tainted input. When this happens the script
terminates with a run-time error. It is a bad idea for software
to continue after a security violation.
You would need to go through some 5,000+ functions that PHP offers
and determine which ones can and cannot receive tainted data. That
virtually guarantees things will be missed, which brings us back to
the safe_mode/open_basedir problem.
- PHP propagates taintedness across expressions. If an input to
an expression is tainted, then the result of that expression is
tainted too. There are exceptions to this rule: these are called
sanitisers, as discussed next.
That goes counter to your original point that extensions do not need
to be taint-aware: what you propose would require adjusting nearly
every single extension. The additional tainted/not-tainted checks
will add further overhead.
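
A toy model of the scheme under discussion may help show where the
extension problem comes from. This is only a sketch in Python, not
the prototype's code: Value stands in for a ZVAL with one taint bit,
and sink_echo() and legacy_upper() are hypothetical names.

```python
class TaintError(Exception):
    """Raised when a tainted value reaches a sink."""

class Value:
    """Stand-in for a ZVAL: a payload plus a single taint bit."""
    def __init__(self, data, tainted=False):
        self.data = data
        self.tainted = tainted

    def __add__(self, other):
        # Taint-aware concatenation: the result is tainted if either
        # operand is tainted.
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, self.tainted or other.tainted)

def sink_echo(value):
    """A sink: refuses tainted input, otherwise returns what it would print."""
    if value.tainted:
        raise TaintError("tainted value reached echo")
    return value.data

def legacy_upper(value):
    """Models an extension function written without taint awareness.

    It builds a fresh result and never copies the taint bit, so the
    result looks clean even though it is derived from tainted input.
    """
    return Value(value.data.upper())

user = Value("<script>", tainted=True)
page = Value("Hello ") + user     # taint propagates through the expression
leaked = legacy_upper(user)       # taint silently dropped
```

In this model sink_echo(page) raises TaintError, while
sink_echo(leaked) goes through. Every function that produces a new
value has to copy the bit and test it, which is both the audit burden
and the extra per-call work.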
- The PHP application programmer untaints data by explicit assignment
with an untainted value. For example, the result from htmlentities()
or mysql_real_escape_string() is not tainted. People could apply
the wrong sanitizer if they really want to. Remember, the purpose
is to help programmers by telling what data needs cleansing. It
is up to them to make the right decision. If we wanted to force
the use of the "right" sanitizer then we would need multiple
shades of untaintedness. This would not be practical.
Again, many functions behave differently depending on context. Take
htmlspecialchars() as an example: it is great against XSS but does
nothing for exec(). If you run a string through htmlspecialchars()
and then pass it to exec(), the taint check considers the data
untainted and executes it, resulting in command injection.
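
The same failure can be sketched with a single taint bit. Again this
is Python used as a modelling language with hypothetical names
(Value, html_sanitize, shell_sink); html.escape() and shlex.quote()
are real standard-library calls standing in for htmlspecialchars()
and a shell-specific escaper. No command is actually executed.

```python
import html
import shlex

class Value:
    """A payload plus one taint bit; there is no record of *which*
    sanitizer cleared the bit."""
    def __init__(self, data, tainted=False):
        self.data = data
        self.tainted = tainted

def html_sanitize(value):
    # Escapes HTML metacharacters and clears the taint bit.
    return Value(html.escape(value.data), tainted=False)

def shell_sink(value):
    """Returns the command line that would be run; refuses tainted input."""
    if value.tainted:
        raise ValueError("tainted value reached the shell sink")
    return "ls " + value.data

user = Value("; rm -rf ~", tainted=True)

# Wrong sanitizer for this sink: the payload has no HTML metacharacters,
# so it passes through unchanged, yet the value is now marked clean.
accepted = shell_sink(html_sanitize(user))

# What a shell-appropriate escaper would have produced instead.
quoted = shlex.quote(user.data)
```

Here accepted still carries the injected command separator, while
quoted is wrapped in single quotes. With only two shades, the check
cannot tell these two cases apart.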
Overall, I do not believe this is a good idea as it stands, and my
vote would be -0.5 on its inclusion in PHP.
Ilia Alshanetsky
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php