On Thu, May 27, 2021 at 5:44 AM Nikita Popov <nikita....@gmail.com> wrote:
> https://github.com/php/php-src/pull/7059 does a surgical change to replace > null bytes with \0. > > Before delving into any other observations, I first want to state emphatically that we should not ONLY escape chr(0). We either apply full on addcslashes() (or similar) or do nothing at all. Half-escaped is worse than not escaped. Sadly, this presents more BC issues than just chr(0) or '\\0' do alone since there are many more potential places where output will change for anyone using it programmatically. So should we? On the one hand: The intent of the var_dump() function is for human readable debugging. Anyone using this in a programmatic way is Doing It Wrong™, so I'm not too fussed about changing the output of this function. On the other hand: WE use this function programmatically in our tests across many thousands of occurrences. Others probably rightly do as well. Yes, the dissonance is real. The most prudent and BC-safe thing would be to add another function `var_dump_escaped()` or even to prefer/advise json_encode() when content safety is relevant, but additional type information is not (most of the uses of var_dump we have). Unfortunately, this doesn't actually fix the initial problem you stated, which is just getting useful data out of CI failures. The PR you offered is the lightest touch and fixes the issue you cited without causing any likely damage to current users, but I don't think we should ignore the ambiguity of the output. Additionally, I think I could make an easy argument in favor of escaping CR and NL at the very least. At this point the urge to escape backslash is extra-real as with windows paths an innocent \n is far more likely. ------ Actually. Maybe we're thinking of this wrong. Rather than change the output at all, why not just have a post-process step in our test runner that transforms the output in the test report? Then we could be as aggressive as we want, going as far as escaping all non-printables plus backslash since at that point we're in it for the human readability (and knowing specific byte sequences is essential) and we break zero BC for anyone else? -Sara