On Thu, May 27, 2021 at 5:44 AM Nikita Popov <nikita....@gmail.com> wrote:

> https://github.com/php/php-src/pull/7059 does a surgical change to replace
> null bytes with \0.
>
>
Before delving into any other observations, I first want to state
emphatically that we should not ONLY escape chr(0).  We either apply full
on addcslashes() (or similar) or do nothing at all.  Half-escaped is worse
than not escaped.  Sadly, this presents more BC issues than just chr(0) or
'\\0' do alone since there are many more potential places where output will
change for anyone using it programmatically.

So should we?

On the one hand: The intent of the var_dump() function is for human
readable debugging.  Anyone using this in a programmatic way is Doing It
Wrong™, so I'm not too fussed about changing the output of this function.
On the other hand: WE use this function programmatically in our tests
across many thousands of occurrences.  Others probably rightly do as well.
Yes, the dissonance is real.

The most prudent and BC-safe thing would be to add another function
`var_dump_escaped()` or even to prefer/advise json_encode() when content
safety is relevant, but additional type information is not (most of the
uses of var_dump we have).  Unfortunately, this doesn't actually fix the
initial problem you stated, which is just getting useful data out of CI
failures.

The PR you offered is the lightest touch and fixes the issue you cited
without causing any likely damage to current users, but I don't think we
should ignore the ambiguity of the output.  Additionally, I think I could
make an easy argument in favor of escaping CR and NL at the very least.  At
this point the urge to escape backslash is extra-real as with windows paths
an innocent \n is far more likely.

------

Actually. Maybe we're thinking of this wrong.

Rather than change the output at all, why not just have a post-process step
in our test runner that transforms the output in the test report?  Then we
could be as aggressive as we want, going as far as escaping all
non-printables plus backslash since at that point we're in it for the human
readability (and knowing specific byte sequences is essential) and we break
zero BC for anyone else?

-Sara

Reply via email to