Re: [PHP-DEV] [RFC] is_literal

Mike Schinkel Sun, 13 Jun 2021 17:41:22 -0700

> On Jun 13, 2021, at 12:51 PM, Craig Francis <cr...@craigfrancis.co.uk> wrote:
> 
> On Sun, 13 Jun 2021 at 07:46, Mike Schinkel <m...@newclarity.net> wrote:
>  
> 1. [...] Minimally I'd ask for a separate vote on is_literal() vs. 
> is_from_literal().
> 
> It would definitely be possible to have a vote on the name.


Cool.

> 2. Regarding the sections "Non-Parameterised Values" and "Future Scope," if I 
> understand correctly...
> 
> ...you're right, we do need to support PhpMyAdmin, Adminer, systems like 
> Drupal (...)

That is good to know.

> Native functions like mysqli_query() should be covered in a future RFC, but 
> still only use warnings. The exact details would need to be confirmed, but 
> should provide a simple way to mark certain non-literal strings as trusted, 
> and still have a way to switch off those warnings entirely (e.g. legacy 
> systems).

Having code that is valid for a given application throw warnings is a huge 
non-starter to me. 

During development I believe is is a best practice to develop with display of 
errors *and* warnings to eliminate all warnings and all errors before moving to 
testing, staging and production. If warnings are not eliminated it is to easy 
for a developer to miss warnings that are important vs. ones that are "OK to 
ignore."

To illustrate, I just added a single trigger_error() to the project I am 
currently working on.  Take a look at what it does to my project:

https://www.evernote.com/l/AAV2_-KSz2RA9pcwtWo8AQJseoZYHtnD-aw

It would be impossible to program with any library that throws a warning and 
also develop with a "zero warnings" policy.

So again, having legitimate code display warnings is a full-stop "no" in my 
book.

More on this below.

> Ideally as future scope, we would also provide a way for certain projects to 
> opt-in to having exceptions raised (the militant option), which would not be 
> appropriate for the vast majority of developers, but it's the level I'd find 
> useful for some of my projects (dealing with medical records and other 
> sensitive information).
 
Yes, when you are in full control of your source code, being able to use 
is_literal() liberally would be great. 

BUT.  There needs to be a way to mark certain non-literal strings as trusted 
for when using Other People's Code(tm), and as I will argue in the rest of this 
reply, that way needs to be part of the RFC that also introduces is_literal().  
Doing otherwise is like manufacturing a poison but not also manufacturing its 
antidote.

> As to how we could (in a future RFC) mark certain non-literal strings as 
> trusted, I've briefly touched on how this can work in the "Future Scope" 
> section (it's an idea I copied from Trusted Types in JavaScript, which 
> follows the same principle). I only did a summary because the exact 
> implementation would need to be discussed later (there are a few options, but 
> they all rely on is_literal() being implemented first).
>  
> First, we need libraries to use is_literal() to check inputs from developers 
> (i.e. the code they didn't write), which is their main source of injection 
> vulnerabilities. 

And that is where the first problem lies.  Assuming libraries do start using 
is_literal() and there is not yet an existing way to mark a string as trusted, 
then at best-case the developer using that library will have to jump through 
hoops when they need a non-literal use-case, and worse-case this will mean the 
developer will be unable to use said library for their use-case. 

And it might be the only library available, or only one "approved" by their 
company (which would be worse than the above worst case.)

Introducing is_literal() without make_literal() is like manufacturing a gun 
without a safety catch.  

I assume that implementing make_literal() would not be a technical challenge, 
or would it?

Would there be any reason *not* to include make_literal() as part of your RFC?

> What goes on inside the library is entirely up to them (they know that they 
> are doing). For their output, they create a stringable value-object, which 
> (as future scope) could be marked as trusted for certain functions (like 
> mysqli_query).
> 
> These value-objects could be as simple as a single private property (value), 
> a __construct() method that takes the libraries output, and a __toString() 
> method to return it; or the library might want to be extra sure of its 
> output, and use something like the Query Builder example linked in the 
> "Non-Literal Values" section.

Having library developers create a stringable value-object is a nice idea, but 
you are assuming all library vendors will be enlightened enough to do so. 

On one end you will have library developers who are not sufficiently educated 
on this topic (and I will remind you of the RFC's comment regarding education) 
and on the other end you will have library developers who are dogmatic in their 
belief that requiring actual literals is the only one true way to program (and 
here I will remind you of the RFC's comment regarding static analysis.)  

In either extreme you will have the library vendor who has taken control away 
from the developer to use the library and still be able to implement their 
use-case.

And if you think I am being unrealistic I point you to the developers of the 
60k plugins on WordPress.org.  You have lots of less than fully educated 
developers publishing plugins that are used by unsuspecting users there, and a 
handful of super dogmatic developers too.

> As future scope, this allows functions like mysqli_query() to simply check 
> for a literal, or one of these trusted value-objects (or have warnings 
> switched off entirely).

How might mysqli_query() check for a "trusted value-object" and who could 
declare a class of that nature? 

Elaborating on ways this could be done really needs to be added to the RFC 
otherwise how are voters to know what is likely and/or even possible in the 
future?

> 3. I notice your RFC does not grant sprintf()...
> 
> <snip>
> 
> It's like the "String Splitting" part in the RFC, it's probably fine, and we 
> can do it if there's interest; but we also need to consider what issues it 
> might cause (if any). From a security point of view, it's always best to keep 
> something as simple as possible (and that may well include supporting 
> sprintf), because "You cannot prove security. You can only prove insecurity".

If the RFC provides make_literal() then having sprintf() be able to return a 
literal flag would just become a nice-to-have instead of a must-have.

> 4. Regarding the section "WHERE IN" you argue that we should push everyone to 
> use parameters properly — which is a great idea in theory — but that would 
> have a significant breakage in existing sites online.
> 
> Hopefully my answer to #2 addresses this, as you're right, that would 
> definitely cause more problems than it solves. Certainly for this 
> implementation I support libraries providing warnings and think those would 
> be the most appropriate, rather than exceptions. (Though obviously libraries 
> know their user bases best of course).

A big *NO* on warnings. Full stop.

An alternate could be (something like):

declare(relax_literal_checks=1).  

While I am not a big fan of adding declare() directives, this seems like a good 
use because it would only be needed for special cases, and a developer could 
structure their PHP files such that the files needing it could be limited to 
just the code that needs to relax literal checking.

BTW, I am assuming that setting this one would have the affect of adding the 
literal flag to all strings used in the file.

> In summary of my future scope answer above, libraries could be able to mark 
> their output as trusted for functions like mysqli_query(). For SQL it would 
> ideally still be a literal string, but that's not always possible (especially 
> with HTML), so that's where the trusted stringable value-objects would be 
> used. 

If the library vendor must develop the trusted value object then you are 
depending on an unreliable actor. 

Further, if every library vendor gets to define their own value object it could 
make life very confusing for application developers who have to learn which 
different value objects for every different library will solve the "Sorry, you 
can't use a non literal here" warning or exception.  

I can already see the avalanche of StackOverflow questions that requires lots 
of questions for the questioner, and the answer is always "It depends..."

So ideally there would either be "trusted value-objects" provided by PHP core, 
or they would be able to be easily identified by some consistent trait, such as 
a "as_literal()" method, or similar.

> 5. Given #2 and #4, I think your section "Support Functions" is missing [...] 
> as_literal() or make_literal() or even as_unsafe_literal()
> 
> I'm not against it, but I don't think it's needed (and with it raising 
> warnings not exceptions it won’t be necessary to prevent things breaking).

Count me as one who thinks that it is absolutely, unequivocally essential 
because warnings are not a valid solution.

See the result of this simple trigger_error() again?  
https://www.evernote.com/l/AAV2_-KSz2RA9pcwtWo8AQJseoZYHtnD-aw

> In the Example Library I made under the "Try It" section in the RFC:
> 
> https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4
> 
> It checks for literals but also accepts a value-object allowing you to bypass 
> that check (which I think is probably the best way to implement this at the 
> moment). It shows how an `unsafe_value` value-object could be used and 
> accepted by the example library, in the same way DB::raw() works in Laravel. 
> In the future a similar setup in PHP’s native functions could accept literals 
> or value-objects which can be trusted, so we shouldn’t need a way to lie to 
> PHP and tell it to treat one as the other.

That is a great example showing how a qualified developer would attack the 
requirement.

It is also significantly more elaborate than many developers working on 
libraries will go to the trouble to implement.  

And I don't mean that most popular libraries, I mean the long tail of 
libraries, including internal-use libraries developed within an organization 
that have to be used by others in that organization where internal politics do 
not allow the developer in need to modify the library written by their 
developer in control of the library. 

With great power comes great responsibility, but we both know that in the real 
world power corrupts.  In my experience, anyway.

> 
> 6. Regarding the section "Reflection API," shouldn't their be an 
> is_literal()/is_from_literal() method added to the ReflectionParameter() and 
> ReflectionProperty() classes?
> 
> <snip>
> The ReflectionParameter/ReflectionProperty allows us to return their 
> respective types (ReflectionType). In this case, it's saying these two are 
> "string|int", which isn't the current value, just what they can accept.
> 
> I think this is touching on an area that Joe mentioned yesterday, "a possible 
> future where we have first class support" in a more intricate way. I think 
> there is value in this, as it would allow PHP itself to reject non-literal 
> values, but that should be a separate RFC, as that's a bit more of a language 
> change, and will need its own discussion.

Lack of reflection support is not a deal-killer in my eyes, just a missing 
aspect to make the feature fully fleshed out.  But I think it could come later 
without causing major issues for developers in the mean time.

-Mike

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Re: [PHP-DEV] [RFC] is_literal

Reply via email to