Hi

On 8/25/22 21:11, David Gebler wrote:
> There are many examples of userland code which could be faster and more
> memory efficient if they were written in C and compiled in, so the mere
> fact this proposal may introduce a somewhat faster way of validating a
> JSON string over decoding it is not necessarily a sufficient reason to
> include it.
>
> Are there examples of raised issues for frameworks or systems saying they
> need to validate some JSON but the only existing solutions available to
> them are causing memory limit errors, or taking too long? The Stack
> Overflow question linked on the RFC says "I need a really, really fast
> method of checking if a string is JSON or not."
The proposed function is meant to be used for validation, and validation by definition needs to deal with untrusted data. The input might even be actively malicious, crafted to tie up resources on the server (a DoS attack - single D there).
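To make the intended use concrete, here is a minimal usage sketch of my own (not taken from the RFC), assuming the signature the RFC proposes, json_validate(string $json, int $depth = 512, int $flags = 0): bool:

<?php
// Minimal sketch of the intended use case: rejecting untrusted input
// up front. Assumes the RFC's proposed signature
// json_validate(string $json, int $depth = 512, int $flags = 0): bool,
// which requires a build with the RFC's patch applied.
$payload = file_get_contents('php://input'); // untrusted client data

// json_validate() walks the input with the existing JSON parser but
// builds no value tree, so hostile input cannot inflate memory the way
// json_decode() can.
if (!json_validate($payload)) {
    http_response_code(400);
    echo 'Bad Request: body is not valid JSON';
    exit;
}

// Safe to hand the payload to a decoder or further processing from here.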
> In most real world use cases [that I've encountered over the years] JSON
> blobs tend to be quite small.
Yes, well-formed JSON from a trusted source tends to be small-ish. But a validation function also needs to deal with non-well-formed JSON; otherwise you would not need to validate it in the first place.
With json_decode() I was able to use up an extra 100 MB of RAM with a 3 MB input of invalid JSON, just for the input to be rejected.

For json_validate() the extra memory (as per memory_get_peak_usage()) required for the same operation is effectively zero. It was able to deal with 60 MB of input just fine.
I've attached the script I used for the test. I left out the actual JSON string to not give script kiddies a loaded weapon, but you likely should be able to craft some input yourself.
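For readers without the attachment, here is a hypothetical reconstruction of the measurement approach - not the actual json-test.php, and with the malicious input deliberately left as a stub:

<?php
// Hypothetical sketch of the measurement, not the attached json-test.php.
// Run each mode in a separate process, since memory_get_peak_usage()
// reports a high-water mark for the whole request.
// Usage: php bench.php decode|validate input.json
[, $mode, $file] = $argv;
$input = file_get_contents($file); // stub: a few MB of almost-valid JSON

$before = memory_get_peak_usage();

if ($mode === 'decode') {
    json_decode($input);   // materializes the value tree before rejecting
} else {
    json_validate($input); // parses only; needs the RFC's patched build
}

printf(
    "%s: %.1f MB extra, last error: %s\n",
    $mode,
    (memory_get_peak_usage() - $before) / 1048576,
    json_last_error_msg()
);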
> I have dealt with much, much larger JSON blobs, up to a few hundred MB,
> and in those cases I've used a streaming parser. If you're talking about
> JSON that size, a streaming parser is the only realistic answer - you
> probably don't want to drop a 300MB string into this RFC's new function
> either, if performance and memory efficiency is your concern. So I'm
> curious as to whether a real world example can be given where the
> efficiency difference between json_decode and a new json_validate
> function would be important to the system, whether anyone's encountered
> a scenario where this would have made a real difference to them.
While my example is not a real world example, I don't believe it's a stretch to say it can be applied as-is to the real world.
So IMO:

- The proposed function does exactly what it promises to do, not more, not less.
- If it's introduced, then it is going to be the obvious choice for JSON validation and at the same time it is going to be the best choice for JSON validation. I strongly believe it is a good thing if users are steered to make the correct choice by default without needing to invest brain cycles.
- The patch is pretty small, because the hard work of JSON parsing is already implemented.
- Userland implementations are non-obvious and non-trivial, as evidenced by the examples in the RFC: they are all slightly different and one of them even mishandles a plain `false` input, because it does not check json_last_error() (see the sketch below).
- Userland implementations are also either less efficient (relying on json_decode()) or potentially inconsistent (hand-rolling a validating parser).
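To illustrate the `false` bug, here is a hypothetical userland validator exhibiting that class of mistake (not quoting any specific RFC example), next to a corrected version:

<?php
// Hypothetical illustration of the bug class. json_decode() returns
// falsy values for the *valid* inputs "false", "null" and "0", so any
// truthiness-based check misclassifies them.
function naive_is_json(string $json): bool
{
    return json_decode($json) != null; // loose check, no json_last_error()
}

// Correct userland version: decode, then consult json_last_error().
function is_json(string $json): bool
{
    json_decode($json);
    return json_last_error() === JSON_ERROR_NONE;
}

var_dump(naive_is_json('false')); // bool(false) - valid JSON, wrongly rejected
var_dump(is_json('false'));       // bool(true)

Note that even the corrected version still pays json_decode()'s full memory cost, which is exactly the inefficiency the last bullet point refers to.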
Best regards
Tim Düsterhus
<<attachment: json-test.php>>