On 31 Jul 2015, at 17:40, Ronald Chmara <rona...@gmail.com> wrote:
> The RFC and bug report both make an erroneous assumption that unescaped GPC 
> input is wrong.




Hi Ronabop,

Slightly continuing our discussion, but also replying to your on-list message...


Starting with your examples:

1) Web applications that send variables directly to the database (e.g. 
phpMyAdmin), if I remember their codebase correctly, there is about 3 places 
where that happens, and after doing an authorisation and CSRF check, they could 
put in the 1 line to say that this value is already SQL escaped.

2) Overhead of escaping, where you later explained that this was more for 
legacy systems such as FoxBase and legacy ODBC  data sources, where they pass 
the values over the network to have it escaped (slow)... and while that is a 
valid concern, hopefully all escaping is done on the client side now (if not, 
my previously non-parameterised queries would have taken far too long to 
load)... but that said, mysql_real_escape_string() isn't good enough either (it 
does not apply quote marks, as Matts example shows), hence why my examples talk 
about pg_escape_literal().

3) Generic user written escaping libraries/functions to support multiple 
destinations... if they are still using the native escaping functions, then 
they don't need to do anything different... if the are using something like 
addslashes (maybe this is acceptable for something), then that library/function 
should mark it as having been escaped.

4) From a pre-cleaned source, e.g. taking HTML from a WYSIWYG editor, passing 
it though HTMLTidy, storing in the database... likewise, this just needs a 
single function call to say that this value is already HTML encoded - this is 
the "bio_html" example shown in bug 69886 :-)

5) Where a developer "is doing a file read off of a local hard drive and 
assuming their file contents will never have an SQL injection"... well, I'd 
like all string variables (e.g. from GPC, a file, database, xml, etc) to start 
with the assumption that it is unescaped... the developer could mark that as 
having been escaped, but more likely, just escape it (or use it with a hard 
coded, parameterised query, in the case of SQL).

--

But I completely agree that raising too many notices will just mean that this 
feature would be switched off.

I've been focusing too much of my own systems, and the assumption that most 
systems would/should be using parameterised queries, html encoding, etc (so 
they shouldn't get any notices, unless they have made a couple of mistakes).

So the suggestion of using a new ini setting is a good idea (from Matts or the 
2008 RFC's)... or as you suggested, perhaps have it on a per file basis, that 
could also work, like the declare(strict_types=1).

That way we have a useful security feature that can be switched on (rather than 
everyone investing in an expensive static code analysis system, which uses dark 
magic to find "every problem")... then hopefully early adopters would start 
using it, it slowly becomes good practice, and sometime in the long distant 
future, it can be enabled by default (while in the mean time we try to educate 
as many developers as we can though channels like Stack Overflow).

---

As to the "rarely" comment, fair point, I don't have numbers to back this up.

What I'm trying to say is that, for the typical use case for websites (i.e. not 
things like phpMyAdmin), when you are using values in SQL, it is for them to be 
used as variables in the query... e.g. where an id equals this value, or insert 
a record with these values, or update this record with this value... i.e. how 
values in ORMs should be used.

Likewise, when you have variables that need to be printed in HTML, the "safe" 
normal would be to have the PHP (or other static code) provide the HTML, and 
the variables should be html encoded (e.g. if you were printing someones 
name)... there is an exception to this was explained above (example 4).

I realise that I'm trying to word it differently, but what I'm trying to say is 
that all values should be assumed bad, and it is up to the developer to either 
escape them, use parameterised queries, etc... or call a single function to say 
"yes, I know this one is fine"... then PHP can identify anything that has been 
missed.

Craig







On 31 Jul 2015, at 17:40, Ronald Chmara <rona...@gmail.com> wrote:

> On Thu, Jul 30, 2015 at 8:38 AM, Craig Francis <cr...@craigfrancis.co.uk> 
> wrote:
> On 30 Jul 2015, at 16:26, Ronald Chmara <rona...@gmail.com> wrote:
> > Perhaps I have missed something in this discussion
> I think you have... my email from a couple of weeks ago was ignored... so I 
> replied to Matt's suggestion (which is similar, but different).
> Please, just spend a few minutes reading my suggestion, it has absolutely 
> nothing todo with breaking applications:
>    http://news.php.net/php.internals/87207
>    https://bugs.php.net/bug.php?id=69886
> 
> The RFC and bug report both make an erroneous assumption that unescaped GPC 
> input is wrong.
> 
> I was addressing some cases where it is the correct, intended, behavior. 
> 
> WRT "breaking":
> Application stacks which go from zero E_NOTICE warnings, to hundreds or 
> thousands of them a second or day, is (admittedly) poorly phrased as 
> "breaking". It is a possible side effect of the proposed solutiions. I have 
> worked in very stingent environments where an unapproved E_NOTICE is 
> considered a ship-stop break, but I did not explicitly explain that. Such 
> environments would require re-writes of all SQL that throws an E_NOTICE, or a 
> per-line exception review and approval process, or disabling/not enabling the 
> checking.
>  
> And yes, I do have a bypass_the_nerfing function (well, a function to say the 
> variable has already been escaped)... but the idea is that it's ever so 
> slightly harder to use than the related escaping functions, and rarely needed.
> 
> "Rarely" is subjective at this point, a quanyifyable sampling of some kind 
> could be more meaningful. (How rare?)
> 
> Based on *my* purely anecdotal experience, in the last X years of using PHP I 
> have have frequently encountered situations where passing through 
> engine-unescaped text strings, to SQL, is desired for some reason, in nearly 
> every environment. I mentioned one use case that I thought might be relevant 
> to a large number of users, there are others.
> 
> Off the top of my head, some use cases I have dealt with relevant to this 
> discussion, that should be considered (even if they're discarded as trivial):
> 1. Administrator has a web application that is supposed to allow them access 
> functionally equivalent to a direct connection to a database.
> 2. Overhead of using the engine escaping technique (setup connection(if not 
> done yet), send text to escape at network speed, recieve escaped text at 
> network speed) was considered too much of an additional performance hit.
> 3. Text needed to have a generic, user written, escaping library/function to 
> handle multiple destinations (think 5 different data storage systems, all 
> with different escaping rules, some without connection based escaping).
> 4. Text being supplied was coming from a pre-cleaned, trusted, source, even 
> though the variable was GPC, (example: it was a GET string assembled by a 
> batch job that was pulling from another database)
> 
> I'm sure there are many more.
> 
> Basing language decisions on personal perceptions of "rarely" and 
> "frequently" is not a good practice, but ensuring that we are covering 
> multiple use-cases is. Discussing the use cases doesn't mean I think the idea 
> is without merit. 
> 
> -Ronabop

Reply via email to