On Mon, May 30, 2016 at 7:18 PM, Stanislav Malyshev <smalys...@gmail.com> wrote:
>> In fact, the idea of stripping content from a script file isn't
>> without precedent.  Shebang lines are routinely removed from
>> cli/cgi/fpm, and if you want to properly output it, you need to do so
>
> True, because in the context of CLI we know what is expected: a CLI
> script, which can start with #!. It is very unlikely that we'd have a
> template run directly as a CLI script, with that template starting
> with a #! line we want to output. But we lack such context in a
> generic script, namely the context that would tell us whether it's
> safe to drop the BOM.
>
That was the idea of the declare(), to provide that context, since it
can't be reliably inferred.
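
For illustration only, here is a minimal sketch of the shebang precedent: the runner drops a leading "#!" line before lexing, so it never reaches the output. It is written in Python and is not PHP's actual CLI code; the function name strip_shebang is hypothetical.

```python
def strip_shebang(source: bytes) -> bytes:
    """Drop a leading '#!' line before the script is lexed.

    Hypothetical sketch, not PHP's implementation. Only the very first
    line is examined; cutting after the first '\n' handles both '\n'
    and '\r\n' line endings.
    """
    if not source.startswith(b"#!"):
        return source  # no shebang: leave the script untouched
    newline = source.find(b"\n")
    if newline == -1:
        return b""  # the whole file is a single shebang line
    return source[newline + 1:]
```

For example, strip_shebang(b"#!/usr/bin/env php\n<?php echo 1;") yields b"<?php echo 1;". The rule is safe only because a CLI context implies the first line is a shebang, which is exactly the context a generic script lacks.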

>> So can we apply the same to the BOM?  There's the obvious BC danger of
>> files which might depend on this behavior (declaring their encoding
>> via BOM, which happens to be the same as the script encoding).
>
> Given that a BOM in script files is mostly useless, and a BOM in UTF-8
> is useless and not recommended either, I don't see why we need to.
>
> In general, I don't think the BOM is a real issue worth messing with
> the lexer over. Surely, from time to time somebody will use a weird
> editor that produces BOMs, like editing PHP scripts in Word. Surely,
> they'll see weird effects that force them to spend 5 minutes googling
> and fixing it. I don't think that is a reason to spend person-days of
> our collective time finding a fix for this very niche problem and
> risking potential BC issues.
>
Agreed it's niche, and agreed that it's mostly the editor's fault for
putting the BOM in place to begin with.  Disagree on the value of the
time that would be needed to provide some sort of benefit.

I will say, though, that you're almost certainly right that it's not a
significant problem (if it's one at all), and I'd want to hear from
people who encounter this on a regular basis and for whom there isn't a
much simpler fix available (such as disabling BOM emission in their
editor of choice).

> If it is really becoming an issue, we could probably make the lexer
> treat BOM+<? the same as <?, but I'm not convinced it is a serious
> enough issue.
>
That's probably a reasonable compromise on the context issue.  It
provides a clean escape hatch for intentional BOMs: echo those bytes
from the script, even if it is magic behavior which is generally to
be avoided.
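
A minimal sketch of the "BOM+<? is treated the same as <?" rule, assuming it is applied to the raw script bytes before lexing. This is Python for illustration, not Zend lexer code, and the name strip_bom_before_open_tag is hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # EF BB BF, the UTF-8 encoding of U+FEFF


def strip_bom_before_open_tag(source: bytes) -> bytes:
    """Hypothetical lexer rule: treat BOM + '<?' exactly like '<?'.

    A BOM followed by anything else is kept, so it is still emitted as
    inline output, just as it is today.
    """
    if source.startswith(UTF8_BOM + b"<?"):
        return source[len(UTF8_BOM):]  # drop only the three BOM bytes
    return source
```

Under this sketch a file that really wants to send a BOM would use the escape hatch above and emit it itself, e.g. `<?php echo "\xEF\xBB\xBF";` at the top of the script, since a literal BOM in front of `<?` would be swallowed.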

> That presumes you know there's a BOM at the beginning of your file. If
> so, why don't you just delete it instead of typing a long declare directive?
>
Dunno.  I just like to argue.

-Sara

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
