From: Rasmus Lerdorf [mailto:ras...@lerdorf.com] 
> I guess he is saying that it prevents:
> 
>    Random bytes
>    <?php kill();?>
>    More random bytes
>
> Where random bytes might be an image file so finfo_file() might identify it 
> as a valid image

Right, but anyone can trivially construct a fully valid bitmap whose first five 
bytes are `42 4D 3B 2F 2A`, which reads as `BM;/*`. PHP will treat the bare BM 
as an undefined constant and fall back to the string 'BM', so that statement is 
effectively skipped; the open comment then slides the PHP interpreter past the 
rest of the header. You can close the comment and place the actual code payload 
anywhere in the image data. The early bytes of other image formats are 
similarly exploitable. As far as I can tell, there is really no security win 
here.
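
To make the byte sequence concrete, here is a minimal sketch that only decodes 
the five bytes quoted above and checks that they begin with the BMP magic 
number. The variable names are mine, and it deliberately builds nothing beyond 
that prefix.

```php
<?php
// The five bytes quoted above, decoded from hex.
$prefix = hex2bin('424D3B2F2A');

// The first two bytes are the BMP magic number "BM".
$magic = substr($prefix, 0, 2);

echo $prefix, PHP_EOL; // prints "BM;/*"
```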

> 4. Only protecting against mid-script injections and not top-of-script 
> injections is a somewhat subtle concept when the real problem is the 
> vulnerable include $_GET['filename'] hole. If this really is a prevalent 
> problem, maybe instead of trying to mitigate the symptoms, why don't we try 
> to attack the actual cause of the problem. I would love to hear some ideas 
> along those lines that don't fundamentally change the nature of PHP for 
> somewhat cloudy benefits.
> 
> -Rasmus

It's disturbingly common. Probably 90% of the automated attacks I see in the 
404 error logs are trying to exploit various inclusion vulnerabilities.

One idea that comes to mind immediately is the old taint RFC: 
https://wiki.php.net/rfc/taint. This doesn't actually prevent LFI, but it 
(optionally) warns the developer that they did something very bad, regardless 
of whether it actually caused a problem with the specific input data. I'd 
really love to see that one finalized and implemented.
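
As a rough userland illustration of the "warn regardless of the specific input" 
behaviour, something like the sketch below could stand in. It is not the RFC's 
design (the RFC is about tracking taint inside the engine); the `Tainted` class 
and the `include_path_checked()` helper are hypothetical names made up for this 
example.

```php
<?php
// Hypothetical wrapper marking a value as coming from user input.
final class Tainted
{
    private $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    public function raw(): string
    {
        return $this->value;
    }
}

// Hypothetical helper: returns the path to include, and warns whenever the
// path came from user input, whether or not this particular value is harmful.
function include_path_checked($path): string
{
    if ($path instanceof Tainted) {
        trigger_error('include path built from tainted input', E_USER_WARNING);
        return $path->raw();
    }
    return (string) $path;
}
```

A caller would wrap request data once, for example 
`new Tainted($_GET['filename'])`, and the warning would then fire on every 
request that reaches the include, not only on malicious ones.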

Another wild alternative would be a non-trivial internal string format, where 
a PHP string is actually a sequence of distinct blocks that each carry 
encoding information. Strings could still be concatenated just as always, but 
since the attributes of each block are known, the entire string could be 
converted to an arbitrary final encoding (or rejected as impossible to convert 
safely) when the string is actually used. In the include case this isn't 
really very different from taint, because safe conversion is impossible, but 
for things like XSS and SQL injection it could actually *fix* the otherwise 
vulnerable code. A simplified example of how this might work:

http://example.com?name=%3Cscript%3Exss()%3B%3C%2Fscript%3E

// $_GET['name'] === [text&user&utf8('<script>xss();</script>')];
$name = $_GET['name'];

// $welcome === [html('Welcome <b>'),
//               text&user&utf8('<script>xss();</script>'),
//               html('</b>!')];
$welcome = html("Welcome <b>$name</b>!");

// Assuming the current output format is text/html, the output will be
// "Welcome <b>&lt;script&gt;xss();&lt;/script&gt;</b>!"
echo $welcome;
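
The same idea can be approximated in today's PHP with ordinary classes. This is 
only a sketch under my own assumptions: `Block`, `BlockString`, `userText()` and 
this `html()` are hypothetical userland names, concatenation is an explicit 
method call instead of string interpolation, and only the text/html output case 
is handled.

```php
<?php
// Hypothetical sketch: none of these classes exist in PHP itself.
final class Block
{
    public $kind;  // 'html' (trusted markup) or 'text' (untrusted plain text)
    public $value;

    public function __construct(string $kind, string $value)
    {
        $this->kind = $kind;
        $this->value = $value;
    }
}

final class BlockString
{
    private $blocks;

    public function __construct(Block ...$blocks)
    {
        $this->blocks = $blocks;
    }

    // Concatenation keeps every block and its attributes intact.
    public function concat(BlockString $other): BlockString
    {
        return new BlockString(...array_merge($this->blocks, $other->blocks));
    }

    // Conversion happens only at the point of use, here for text/html output.
    public function toHtml(): string
    {
        $out = '';
        foreach ($this->blocks as $block) {
            $out .= $block->kind === 'html'
                ? $block->value
                : htmlspecialchars($block->value, ENT_QUOTES, 'UTF-8');
        }
        return $out;
    }
}

function html(string $s): BlockString
{
    return new BlockString(new Block('html', $s));
}

function userText(string $s): BlockString
{
    return new BlockString(new Block('text', $s));
}

$name = userText('<script>xss();</script>'); // stands in for $_GET['name']
$welcome = html('Welcome <b>')->concat($name)->concat(html('</b>!'));

echo $welcome->toHtml(), PHP_EOL;
// prints: Welcome <b>&lt;script&gt;xss();&lt;/script&gt;</b>!
```

The escaping decision is deferred to `toHtml()`, which is the point of the 
proposal: the developer never calls an escaping function, yet the untrusted 
block is still encoded for the output context.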

Obviously this second idea is probably a prohibitively large change. There is 
some BC break (especially where an input was known to be HTML but secured via 
something like HTMLPurifier), and there are huge open questions (like how to 
handle string comparison). Still, I think it is interesting because it actually 
divines what the developer meant. The intent of the above code is obvious to a 
developer, and something like this could carry that understanding through to 
the final result. This specific concept has issues, but maybe it gives someone 
else a more practical idea.

John Crenshaw
Priacta, Inc.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
