On 31 Jul 2015, at 17:40, Ronald Chmara <rona...@gmail.com> wrote: > The RFC and bug report both make an erroneous assumption that unescaped GPC > input is wrong.
Hi Ronabop, Slightly continuing our discussion, but also replying to your on-list message... Starting with your examples: 1) Web applications that send variables directly to the database (e.g. phpMyAdmin), if I remember their codebase correctly, there is about 3 places where that happens, and after doing an authorisation and CSRF check, they could put in the 1 line to say that this value is already SQL escaped. 2) Overhead of escaping, where you later explained that this was more for legacy systems such as FoxBase and legacy ODBC data sources, where they pass the values over the network to have it escaped (slow)... and while that is a valid concern, hopefully all escaping is done on the client side now (if not, my previously non-parameterised queries would have taken far too long to load)... but that said, mysql_real_escape_string() isn't good enough either (it does not apply quote marks, as Matts example shows), hence why my examples talk about pg_escape_literal(). 3) Generic user written escaping libraries/functions to support multiple destinations... if they are still using the native escaping functions, then they don't need to do anything different... if the are using something like addslashes (maybe this is acceptable for something), then that library/function should mark it as having been escaped. 4) From a pre-cleaned source, e.g. taking HTML from a WYSIWYG editor, passing it though HTMLTidy, storing in the database... likewise, this just needs a single function call to say that this value is already HTML encoded - this is the "bio_html" example shown in bug 69886 :-) 5) Where a developer "is doing a file read off of a local hard drive and assuming their file contents will never have an SQL injection"... well, I'd like all string variables (e.g. from GPC, a file, database, xml, etc) to start with the assumption that it is unescaped... the developer could mark that as having been escaped, but more likely, just escape it (or use it with a hard coded, parameterised query, in the case of SQL). -- But I completely agree that raising too many notices will just mean that this feature would be switched off. I've been focusing too much of my own systems, and the assumption that most systems would/should be using parameterised queries, html encoding, etc (so they shouldn't get any notices, unless they have made a couple of mistakes). So the suggestion of using a new ini setting is a good idea (from Matts or the 2008 RFC's)... or as you suggested, perhaps have it on a per file basis, that could also work, like the declare(strict_types=1). That way we have a useful security feature that can be switched on (rather than everyone investing in an expensive static code analysis system, which uses dark magic to find "every problem")... then hopefully early adopters would start using it, it slowly becomes good practice, and sometime in the long distant future, it can be enabled by default (while in the mean time we try to educate as many developers as we can though channels like Stack Overflow). --- As to the "rarely" comment, fair point, I don't have numbers to back this up. What I'm trying to say is that, for the typical use case for websites (i.e. not things like phpMyAdmin), when you are using values in SQL, it is for them to be used as variables in the query... e.g. where an id equals this value, or insert a record with these values, or update this record with this value... i.e. how values in ORMs should be used. Likewise, when you have variables that need to be printed in HTML, the "safe" normal would be to have the PHP (or other static code) provide the HTML, and the variables should be html encoded (e.g. if you were printing someones name)... there is an exception to this was explained above (example 4). I realise that I'm trying to word it differently, but what I'm trying to say is that all values should be assumed bad, and it is up to the developer to either escape them, use parameterised queries, etc... or call a single function to say "yes, I know this one is fine"... then PHP can identify anything that has been missed. Craig On 31 Jul 2015, at 17:40, Ronald Chmara <rona...@gmail.com> wrote: > On Thu, Jul 30, 2015 at 8:38 AM, Craig Francis <cr...@craigfrancis.co.uk> > wrote: > On 30 Jul 2015, at 16:26, Ronald Chmara <rona...@gmail.com> wrote: > > Perhaps I have missed something in this discussion > I think you have... my email from a couple of weeks ago was ignored... so I > replied to Matt's suggestion (which is similar, but different). > Please, just spend a few minutes reading my suggestion, it has absolutely > nothing todo with breaking applications: > http://news.php.net/php.internals/87207 > https://bugs.php.net/bug.php?id=69886 > > The RFC and bug report both make an erroneous assumption that unescaped GPC > input is wrong. > > I was addressing some cases where it is the correct, intended, behavior. > > WRT "breaking": > Application stacks which go from zero E_NOTICE warnings, to hundreds or > thousands of them a second or day, is (admittedly) poorly phrased as > "breaking". It is a possible side effect of the proposed solutiions. I have > worked in very stingent environments where an unapproved E_NOTICE is > considered a ship-stop break, but I did not explicitly explain that. Such > environments would require re-writes of all SQL that throws an E_NOTICE, or a > per-line exception review and approval process, or disabling/not enabling the > checking. > > And yes, I do have a bypass_the_nerfing function (well, a function to say the > variable has already been escaped)... but the idea is that it's ever so > slightly harder to use than the related escaping functions, and rarely needed. > > "Rarely" is subjective at this point, a quanyifyable sampling of some kind > could be more meaningful. (How rare?) > > Based on *my* purely anecdotal experience, in the last X years of using PHP I > have have frequently encountered situations where passing through > engine-unescaped text strings, to SQL, is desired for some reason, in nearly > every environment. I mentioned one use case that I thought might be relevant > to a large number of users, there are others. > > Off the top of my head, some use cases I have dealt with relevant to this > discussion, that should be considered (even if they're discarded as trivial): > 1. Administrator has a web application that is supposed to allow them access > functionally equivalent to a direct connection to a database. > 2. Overhead of using the engine escaping technique (setup connection(if not > done yet), send text to escape at network speed, recieve escaped text at > network speed) was considered too much of an additional performance hit. > 3. Text needed to have a generic, user written, escaping library/function to > handle multiple destinations (think 5 different data storage systems, all > with different escaping rules, some without connection based escaping). > 4. Text being supplied was coming from a pre-cleaned, trusted, source, even > though the variable was GPC, (example: it was a GET string assembled by a > batch job that was pulling from another database) > > I'm sure there are many more. > > Basing language decisions on personal perceptions of "rarely" and > "frequently" is not a good practice, but ensuring that we are covering > multiple use-cases is. Discussing the use cases doesn't mean I think the idea > is without merit. > > -Ronabop