I now provide an update of the discussion.

The good, the bads, the open questions, etc.

All of this will go into the RFC also, as requested by the procedure
in https://wiki.php.net/rfc/howto

"Listen to the feedback, and try to answer/resolve all questions.
Update your RFC to document all the issues and discussions. Cover both
the positive and negative arguments. Put the RFC URL into all your
replies."

=== UPDATES ===

- Different users have tested the functionality and obtained the
promissed results. Also their feedback about it was positive.

- Most part of the community in the mailing list showed a positive
opinion about this RFC, and looks forward for its integration into
PHP.

- The ones that checked the code also agree that is small
implementation, easy to mantain, and at the same time provides a big
benefit for such small implementation.

- The community got involve very actively in the discussion of the RFC
and provided all kind of useful feedback, and also took the time to
test json_validate() by themselves.


=== Bad reasons for json_validate() provided by the community ===


- One member of the mailing list expressed that:

1) Incorporating such a small implementation that can be achieve with
userland code is not a good idea.

Quote: "If we keep the tendency to pollute already bloated standard
library with an army of small functions that could have not exists and
be replaced with normal PHP counterparts IMHO we'll end with
frustration from developers as I believe DX slowly falls down here."

2) json_validate() would only be useful for edge cases.

Quote: "A `json_decode()` is a substitute that IMO solves 99% of use cases.
If I'd follow your logic and accept every small addition that handles
1% of use cases, somebody will raise another RFC
for simplexml_validate_string or yaml_validate and the next PhpToken::validate.
All above can be valid if we trust that people normally validate 300MB
payloads to do nothing if they DON'T fail and there is nothing strange
about that."

3) The user also provided an implementation of a JSON parser written
in pure PHP. https://gist.github.com/brzuchal/37e888d9b13937891c3e05fead5042bc


=== Good reasons for json_validate() provided by the community ===


@@@ Use cases provided by some members, I quote:

- "Yes well-formed JSON from a trusted source tends to be small-ish.
But a validation function also needs to deal with non-well-formed
JSON, otherwise you would not need to validate it."

- "If with a new function (json_validate()) it becomes much easier to
defend against a Denial-of-Service attack for some parts of a JSON
API, then this can be a good addition just for security reasons."

- "fast / efficient validation of a common communication format
reduces the attack surface for Denial-of-Service attacks."

@@@ Memory usage

- During the test of json_validate() from some users, they were happy
about the memory usage that was zero in most cases (which is the main
benefit out this feature).
Someone also did a test with a very large string (75 MB) and only a
few bytes were needed as reported by him; also the same user reported
an execution speed improvement by a 20-25% over using json_decode().

@@@ Reasons not to depend on userland JSON parsers

Even possible to write an excellent JSON parser in PHP like one of the
members in the mailing list provided us, there are good reasons for
dont relying on userland solutions.

# 1 - User Tim Düsterhus provided nice thoughts about this, in favor
to json_validate(), ... I quote him:

- "Writing a JSON parser is non-trivial as evidenced by:
https://github.com/nst/JSONTestSuite. I expect userland
implementations to be subtly buggy in edge cases. The JSON parser in
PHP 7.0+ is
certainly more battle-tested and in fact it appears to pass all of the
tests in the linked test suite."

- "Even if the userland implementation is written very carefully, it
might behave differently than the native implementation used by
json_decode() (e.g. because the latter is buggy for some reason or
because the correct behavior is undefined). This would imply that an
input string that was successfully validated by your userland parser
might ultimately fail to parse when passed to json_decode(). This is
exactly what you don't want to happen."


(Some other members including me, also share this opinion.)


# 2 - The JSON parser in PHP follows an special convention, marked in
the PHP documentation.

# 3 - We already have a JSON parser in PHP, that is used by
json_decode(); reusing the existing JSON Parser provides 100%
compatibility between the validation of a json-string, and the
decoding of it.

# 4 - The user Larry Gafield also provided good reason to integrate
this implementation into PHP. I quote him:

"The heuristic I use is that an API should be "reasonably complete" in
one location.  Having a half-assed API in C and the rest left to
inconsistent and redundant user-space implementations is a terrible
API; the same would apply for a user-space library that is half-assed
and leaves the rest to "someone else to write."  Naturally "reasonably
complete" is a somewhat squishy term, which is why it's a heuristic.

By that metric, yes, str_starts_with() and friends absolutely belonged
in core, because we already have a bunch of string functions and
str_starts_with() is by a wide margin the most common usage of
strpos().

By the same token, yes, json_validate() makes sense to include in the
main API, which means in C.  If there's a performance benefit to doing
so as well, that makes it an easy sell for me."



=== Changes in the implementation ===

@@@ THROW EXCEPTION ON ERROR

- The ability to throw an exception on error will be remove from the
implementation, as this was pointed not only by the users in the
mailing list, but also during code review. There are totally valid
arguments to remove this capability.



=== Changes in the RFC ===


- I removed 3 of the provided examples because did not adjust to the
RFC purpose.

- I still need to add the provided use cases provided by the community
into the RFC, where json_validate() will make a good impact.

- Updating the RFC requires time, that is why the mailing list will be
updated before the RFC itself.


=== Clarification about the name ===


- In the beginning I named the function "is_json()", but I was not
following the convention written in:

https://github.com/php/php-src/blob/master/CODING_STANDARDS.md#user-functionsmethods-naming-conventions

That is why I adjusted the implementation with the name
"json_validate()" suggested not only during code review, but also by
some of the people in the internals-mailing list.


=== Open issues/concerns ===


@@@ Usage of JSON_INVALID_UTF8_IGNORE @@@

- I have my doubts now, because of this codes:

```
<php

var_dump(json_validate("\"a\xb0b\""), json_last_error_msg());
var_dump("------------");
var_dump(json_validate("\"a\xb0b\"", 512, JSON_INVALID_UTF8_IGNORE),
json_last_error_msg());
```

Results:

bool(false)
string(56) "Malformed UTF-8 characters, possibly incorrectly encoded"
string(12) "------------"
bool(true)
string(8) "No error"


Gives different results, but ...

```
<?php

var_dump(json_decode("{ \"a\xb0b\" : \"dummy\" }"), json_last_error_msg());
var_dump("------------");
var_dump(json_decode("{ \"a\xb0b\" : \"dummy\" }", 512,
JSON_INVALID_UTF8_IGNORE), json_last_error_msg());
```

Results in:
NULL
string(56) "Malformed UTF-8 characters, possibly incorrectly encoded"
string(12) "------------"
NULL
string(56) "Malformed UTF-8 characters, possibly incorrectly encoded"



So at a first look, seems the flag should also be remove, as with
json_validate() we say its valid, but then on json_decode() we get
NULL, even with the JSON_INVALID_UTF8_IGNORE provided; does not make
sense I believe.

What do you think? Is there a use case I am missing here? Is this flag
worth to have still?


@@@ Returning BOOLEAN or INT @@@

Option 1) Return boolean, and in case of error, rely on the functions
json_last_error() and json_last_error_msg() to get the proper error.

Option 2) Return integer number, following the already existing
constants for json errors, being the return values as:

JSON_ERROR_NONE
  No error has occurred

JSON_ERROR_DEPTH
  The maximum stack depth has been exceeded

JSON_ERROR_STATE_MISMATCH
  Invalid or malformed JSON

JSON_ERROR_CTRL_CHAR
  Control character error, possibly incorrectly encoded

JSON_ERROR_SYNTAX
  Syntax error

JSON_ERROR_UTF8
  Malformed UTF-8 characters, possibly incorrectly encoded

JSON_ERROR_RECURSION
  One or more recursive references in the value to be encoded

JSON_ERROR_INF_OR_NAN
  One or more NAN or INF values in the value to be encoded

JSON_ERROR_UNSUPPORTED_TYPE
  A value of a type that cannot be encoded was given

JSON_ERROR_INVALID_PROPERTY_NAME
  A property name that cannot be encoded was given

JSON_ERROR_UTF16
  Malformed UTF-16 characters, possibly incorrectly encoded


- From implementation point of view, is very easy to achieve both, we
just have to come to an agreement. I believe both approaches are
valid. I also got this observation during code review. Personally I
prefer BOOLEAN, because TRUE is more appeal to the understanding that
the json-string is valid ... than returning zero, which in PHP is a
falsy value by design. For me BOOLEAN will avoid confusions.


RFC: https://wiki.php.net/rfc/json_validate
Implementation: https://github.com/php/php-src/pull/9399

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php

Reply via email to