I now provide an update of the discussion. The good, the bads, the open questions, etc.
All of this will go into the RFC also, as requested by the procedure in https://wiki.php.net/rfc/howto "Listen to the feedback, and try to answer/resolve all questions. Update your RFC to document all the issues and discussions. Cover both the positive and negative arguments. Put the RFC URL into all your replies." === UPDATES === - Different users have tested the functionality and obtained the promissed results. Also their feedback about it was positive. - Most part of the community in the mailing list showed a positive opinion about this RFC, and looks forward for its integration into PHP. - The ones that checked the code also agree that is small implementation, easy to mantain, and at the same time provides a big benefit for such small implementation. - The community got involve very actively in the discussion of the RFC and provided all kind of useful feedback, and also took the time to test json_validate() by themselves. === Bad reasons for json_validate() provided by the community === - One member of the mailing list expressed that: 1) Incorporating such a small implementation that can be achieve with userland code is not a good idea. Quote: "If we keep the tendency to pollute already bloated standard library with an army of small functions that could have not exists and be replaced with normal PHP counterparts IMHO we'll end with frustration from developers as I believe DX slowly falls down here." 2) json_validate() would only be useful for edge cases. Quote: "A `json_decode()` is a substitute that IMO solves 99% of use cases. If I'd follow your logic and accept every small addition that handles 1% of use cases, somebody will raise another RFC for simplexml_validate_string or yaml_validate and the next PhpToken::validate. All above can be valid if we trust that people normally validate 300MB payloads to do nothing if they DON'T fail and there is nothing strange about that." 3) The user also provided an implementation of a JSON parser written in pure PHP. https://gist.github.com/brzuchal/37e888d9b13937891c3e05fead5042bc === Good reasons for json_validate() provided by the community === @@@ Use cases provided by some members, I quote: - "Yes well-formed JSON from a trusted source tends to be small-ish. But a validation function also needs to deal with non-well-formed JSON, otherwise you would not need to validate it." - "If with a new function (json_validate()) it becomes much easier to defend against a Denial-of-Service attack for some parts of a JSON API, then this can be a good addition just for security reasons." - "fast / efficient validation of a common communication format reduces the attack surface for Denial-of-Service attacks." @@@ Memory usage - During the test of json_validate() from some users, they were happy about the memory usage that was zero in most cases (which is the main benefit out this feature). Someone also did a test with a very large string (75 MB) and only a few bytes were needed as reported by him; also the same user reported an execution speed improvement by a 20-25% over using json_decode(). @@@ Reasons not to depend on userland JSON parsers Even possible to write an excellent JSON parser in PHP like one of the members in the mailing list provided us, there are good reasons for dont relying on userland solutions. # 1 - User Tim Düsterhus provided nice thoughts about this, in favor to json_validate(), ... I quote him: - "Writing a JSON parser is non-trivial as evidenced by: https://github.com/nst/JSONTestSuite. I expect userland implementations to be subtly buggy in edge cases. The JSON parser in PHP 7.0+ is certainly more battle-tested and in fact it appears to pass all of the tests in the linked test suite." - "Even if the userland implementation is written very carefully, it might behave differently than the native implementation used by json_decode() (e.g. because the latter is buggy for some reason or because the correct behavior is undefined). This would imply that an input string that was successfully validated by your userland parser might ultimately fail to parse when passed to json_decode(). This is exactly what you don't want to happen." (Some other members including me, also share this opinion.) # 2 - The JSON parser in PHP follows an special convention, marked in the PHP documentation. # 3 - We already have a JSON parser in PHP, that is used by json_decode(); reusing the existing JSON Parser provides 100% compatibility between the validation of a json-string, and the decoding of it. # 4 - The user Larry Gafield also provided good reason to integrate this implementation into PHP. I quote him: "The heuristic I use is that an API should be "reasonably complete" in one location. Having a half-assed API in C and the rest left to inconsistent and redundant user-space implementations is a terrible API; the same would apply for a user-space library that is half-assed and leaves the rest to "someone else to write." Naturally "reasonably complete" is a somewhat squishy term, which is why it's a heuristic. By that metric, yes, str_starts_with() and friends absolutely belonged in core, because we already have a bunch of string functions and str_starts_with() is by a wide margin the most common usage of strpos(). By the same token, yes, json_validate() makes sense to include in the main API, which means in C. If there's a performance benefit to doing so as well, that makes it an easy sell for me." === Changes in the implementation === @@@ THROW EXCEPTION ON ERROR - The ability to throw an exception on error will be remove from the implementation, as this was pointed not only by the users in the mailing list, but also during code review. There are totally valid arguments to remove this capability. === Changes in the RFC === - I removed 3 of the provided examples because did not adjust to the RFC purpose. - I still need to add the provided use cases provided by the community into the RFC, where json_validate() will make a good impact. - Updating the RFC requires time, that is why the mailing list will be updated before the RFC itself. === Clarification about the name === - In the beginning I named the function "is_json()", but I was not following the convention written in: https://github.com/php/php-src/blob/master/CODING_STANDARDS.md#user-functionsmethods-naming-conventions That is why I adjusted the implementation with the name "json_validate()" suggested not only during code review, but also by some of the people in the internals-mailing list. === Open issues/concerns === @@@ Usage of JSON_INVALID_UTF8_IGNORE @@@ - I have my doubts now, because of this codes: ``` <php var_dump(json_validate("\"a\xb0b\""), json_last_error_msg()); var_dump("------------"); var_dump(json_validate("\"a\xb0b\"", 512, JSON_INVALID_UTF8_IGNORE), json_last_error_msg()); ``` Results: bool(false) string(56) "Malformed UTF-8 characters, possibly incorrectly encoded" string(12) "------------" bool(true) string(8) "No error" Gives different results, but ... ``` <?php var_dump(json_decode("{ \"a\xb0b\" : \"dummy\" }"), json_last_error_msg()); var_dump("------------"); var_dump(json_decode("{ \"a\xb0b\" : \"dummy\" }", 512, JSON_INVALID_UTF8_IGNORE), json_last_error_msg()); ``` Results in: NULL string(56) "Malformed UTF-8 characters, possibly incorrectly encoded" string(12) "------------" NULL string(56) "Malformed UTF-8 characters, possibly incorrectly encoded" So at a first look, seems the flag should also be remove, as with json_validate() we say its valid, but then on json_decode() we get NULL, even with the JSON_INVALID_UTF8_IGNORE provided; does not make sense I believe. What do you think? Is there a use case I am missing here? Is this flag worth to have still? @@@ Returning BOOLEAN or INT @@@ Option 1) Return boolean, and in case of error, rely on the functions json_last_error() and json_last_error_msg() to get the proper error. Option 2) Return integer number, following the already existing constants for json errors, being the return values as: JSON_ERROR_NONE No error has occurred JSON_ERROR_DEPTH The maximum stack depth has been exceeded JSON_ERROR_STATE_MISMATCH Invalid or malformed JSON JSON_ERROR_CTRL_CHAR Control character error, possibly incorrectly encoded JSON_ERROR_SYNTAX Syntax error JSON_ERROR_UTF8 Malformed UTF-8 characters, possibly incorrectly encoded JSON_ERROR_RECURSION One or more recursive references in the value to be encoded JSON_ERROR_INF_OR_NAN One or more NAN or INF values in the value to be encoded JSON_ERROR_UNSUPPORTED_TYPE A value of a type that cannot be encoded was given JSON_ERROR_INVALID_PROPERTY_NAME A property name that cannot be encoded was given JSON_ERROR_UTF16 Malformed UTF-16 characters, possibly incorrectly encoded - From implementation point of view, is very easy to achieve both, we just have to come to an agreement. I believe both approaches are valid. I also got this observation during code review. Personally I prefer BOOLEAN, because TRUE is more appeal to the understanding that the json-string is valid ... than returning zero, which in PHP is a falsy value by design. For me BOOLEAN will avoid confusions. RFC: https://wiki.php.net/rfc/json_validate Implementation: https://github.com/php/php-src/pull/9399 -- PHP Internals - PHP Runtime Development Mailing List To unsubscribe, visit: https://www.php.net/unsub.php