Hi

On 8/25/22 21:11, David Gebler wrote:
> There are many examples of userland code which could be faster and more
> memory efficient if they were written in C and compiled in, so the mere
> fact this proposal may introduce a somewhat faster way of validating a
> JSON string over decoding it is not necessarily a sufficient reason to
> include it.
>
> Are there examples of raised issues for frameworks or systems saying they
> need to validate some JSON but the only existing solutions available to
> them are causing memory limit errors, or taking too long? The Stack
> Overflow question linked on the RFC says "I need a really, really fast
> method of checking if a string is JSON or not."
The proposed function is meant to be used for validation, and validation by definition needs to deal with untrusted data. The input might even be actively malicious, crafted to tie up resources on the server (a DoS attack - single D there).
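To make the intended use concrete, here is a minimal usage sketch of my own (not taken from the RFC), assuming the signature the RFC proposes, json_validate(string $json, int $depth = 512, int $flags = 0): bool:

<?php
// Minimal sketch of the intended use case: rejecting untrusted input
// up front. Assumes the RFC's proposed signature
// json_validate(string $json, int $depth = 512, int $flags = 0): bool,
// which requires a build with the RFC's patch applied.
$payload = file_get_contents('php://input'); // untrusted client data

// json_validate() walks the input with the existing JSON parser but
// builds no value tree, so hostile input cannot inflate memory the way
// json_decode() can.
if (!json_validate($payload)) {
    http_response_code(400);
    echo 'Bad Request: body is not valid JSON';
    exit;
}

// Safe to hand the payload to a decoder or further processing from here.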
> In most real world use cases [that I've encountered over the years] JSON
> blobs tend to be quite small.
Yes, well-formed JSON from a trusted source tends to be small-ish. But a validation function also needs to deal with non-well-formed JSON; otherwise you would not need to validate it in the first place.
With json_decode() I was able to use up an extra 100 MB of RAM with a 3 MB input of invalid JSON, just for the input to be rejected.

For json_validate() the extra memory (as per memory_get_peak_usage()) required for the same operation is effectively zero. It was able to deal with 60 MB of input just fine.
I've attached the script I used for the test. I left out the actual JSON string to not give script kiddies a loaded weapon, but you likely should be able to craft some input yourself.
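For readers without the attachment, here is a hypothetical reconstruction of the measurement approach - not the actual json-test.php, and with the malicious input deliberately left as a stub:

<?php
// Hypothetical sketch of the measurement, not the attached json-test.php.
// Run each mode in a separate process, since memory_get_peak_usage()
// reports a high-water mark for the whole request.
// Usage: php bench.php decode|validate input.json
[, $mode, $file] = $argv;
$input = file_get_contents($file); // stub: a few MB of almost-valid JSON

$before = memory_get_peak_usage();

if ($mode === 'decode') {
    json_decode($input);   // materializes the value tree before rejecting
} else {
    json_validate($input); // parses only; needs the RFC's patched build
}

printf(
    "%s: %.1f MB extra, last error: %s\n",
    $mode,
    (memory_get_peak_usage() - $before) / 1048576,
    json_last_error_msg()
);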
> I have dealt with much, much larger JSON blobs, up to a few hundred MB,
> and in those cases I've used a streaming parser. If you're talking about
> JSON that size, a streaming parser is the only realistic answer - you
> probably don't want to drop a 300MB string into this RFC's new function
> either, if performance and memory efficiency is your concern. So I'm
> curious as to whether a real world example can be given where the
> efficiency difference between json_decode and a new json_validate
> function would be important to the system, whether anyone's encountered
> a scenario where this would have made a real difference to them.
While my example is not a real world example, I don't believe it's a stretch to say it can be applied as-is to the real world.
So IMO:

- The proposed function does exactly what it promises to do, not more, not less.
- If it's introduced, then it is going to be the obvious choice for JSON validation and at the same time it is going to be the best choice for JSON validation. I strongly believe it is a good thing if users are steered to make the correct choice by default without needing to invest brain cycles.
- The patch is pretty small, because the hard work of JSON parsing is already implemented.
- Userland implementations are non-obvious and non-trivial, as evidenced by the examples in the RFC: they are all slightly different and one of them even mishandles a plain `false` input, because it does not check json_last_error() (see the sketch below).
- Userland implementations are also either less efficient (relying on json_decode()) or potentially inconsistent (hand-rolling a validating parser).
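To illustrate the `false` bug, here is a hypothetical userland validator exhibiting that class of mistake (not quoting any specific RFC example), next to a corrected version:

<?php
// Hypothetical illustration of the bug class. json_decode() returns
// falsy values for the *valid* inputs "false", "null" and "0", so any
// truthiness-based check misclassifies them.
function naive_is_json(string $json): bool
{
    return json_decode($json) != null; // loose check, no json_last_error()
}

// Correct userland version: decode, then consult json_last_error().
function is_json(string $json): bool
{
    json_decode($json);
    return json_last_error() === JSON_ERROR_NONE;
}

var_dump(naive_is_json('false')); // bool(false) - valid JSON, wrongly rejected
var_dump(is_json('false'));       // bool(true)

Note that even the corrected version still pays json_decode()'s full memory cost, which is exactly the inefficiency the last bullet point refers to.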
Best regards
Tim Düsterhus
<<attachment: json-test.php>>