Hi,

On Thu, Dec 31, 2020 at 1:30 AM Máté Kocsis <kocsismat...@gmail.com> wrote:

> Hi Remi and Jakub,
>
>>
> I agree it's too early as the library is young and won't be available in
>>> many distros. The PECL path is better in this case IMO as it will allow
>>> some time .
>>
>>
> In my opinion, this is a case where making an
> exception is worth considering.
>
>
The PECL path doesn't mean that the extension won't be used. From the user
point of view, it doesn't really matter that much as distro packages are
available even for PECL extensions and it can easily added to Docker image
as well. The advantage is that it will give some time for the library to be
available in distros and possibly more stable. I think it would actually
help with stabilizing the API and introducing new features quickly. It
means all the mentioned features could be provided to users in the
extension release cycle and not waiting a year for new PHP version.


> Should the simdjson library be written in C,
> I'd propose to add the new API + parser to
> ext/json directly, since ext/simdjson is just a
> very small wrapper around the parser, and
> not a complex piece of code in itself (compared to other parts of php-src).
>
>
I think this wouldn't be really an option even if it was in C because
ext/json is enabled by default so you couldn't have external dependency on
such a new library. Otherwise it would mean that you couldn't install PHP
if that library is not available. Of course unless the library is bundled
but that is not a good idea for the maintanance.


> Also, I think the performance benefit of using
> the simdjson parser is so major that it would
> be a pity if people had to wait for years until
> the functionality becomes generally available
> as a core extension. As json_encode() and
> json_decode() are very easy to use, my guess
> is that a 3rd party JSON-related extension
> would never get an adoption large enough,
> because only those people would install it
> who have really reached the limitations of ext/json.
>
>
But this proposal is not about changing json_decode. It introduces a new
API that can be in the same way introduce in the PECL extension. As I said
above, having that in PECL doesn't mean that it's not available and people
have to wait for it.

It would be great to actually see what the performance benefit is in the
real applications. The benchmarks relies on repeated calls which is not
always the way how it's in the application (e.g. due to processor cache).
Also it might not be such a huge perf increase for the most apps as the
actual parsing is not usually the app bottleneck. That said I think it will
bring some considerable improvements anyway in the apps especially in those
doing lots of parsing but would be great to see how much it is in reality.


> By the way, it has just come to my mind that
> our company is also affected by these
> limitations. Sometimes we have to parse
> very large JSON documents, and in some
> cases these can end up being truncated.
> Fortunately we only need a specific part of
> the data, so someone wrote a partial "parser"
> (this is euphemism) tailored for the schema
> in question. Rather than having to use
> custom hackery, it would be so much better
> if PHP would offer partial parsing out of the
> box, like what the proposed
> JsonParser::getKeyValue() does.
>

As you mentiened, this could be possible in ondemand API which looks really
useful indeed. There are more things that are pretty useful like JSON
Pointer, better error reporting and UTF8 validation that could be
potentially also re-used in encoder. I think it would be great to have at
least some of the features in the extension before it gets to the core.
Especially thinking about the error reporting which should no longer depend
on global state.

One note about the proposed API. As it's not part of the ext/json, it
shouldn't be called JsonParser but rather SimdJsonParser to reflect that
it's part of the simdjson extension. That's the convention that is used for
other exts and it's also less confusing for users because that class won't
be available for many users initially - at least until the library is
available or extension installed / enabled. The methods also shouldn't be
all static but rather instance should be provided that would allow getting
errors or using the ondemand mode.


> That said, the cost-benefit ratio of having
> simdjson in core seems advantageous for me.
>
> Was thinking that it would be good to consider some kind of plugable
>> decoder where another extension could register a parsing callback.
>> Something similar to what we have for parser but instead for the whole
>> decoding. That would allow to still use current parser in json_decode but
>> if simdjson available / configured in ini, then it would used instead and
>> would be just faster. Not sure if all options are supported though - for
>> example don't see any note about UTF8 substitution
>> (JSON_INVALID_UTF8_SUBSTITUTE).
>>
>
> This is a very interesting approach, and it reminds me about the hashing
> registry RFC
> in certain extent.
>
> And you are right, as far as I saw, these flags
> are not supported by ext/simdjson. But to be
> honest, I haven't analyzed yet how difficult it
> would be to have a reasonably full compatibility with ext/json.
>
>
Yeah it would be really cool to be able to natively speed json_decode to
get instant improvement for the apps as the new API usually takes time to
get used. This pluggable API is in my eyes the way how to do it because we
might still need to have the current parser available for the cases where
simdjson lib is not available. But as you say it requires some analysis
first.

I think for now it would be great to create a repo for that extension under
php org in Github, get it to PECL and then add more features into it. If I
manage to find some time, I would be happy to contribute some bits or at
least help with reviews. Once it gets a bit more stable, we could consider
it for inclusion to the core.

Cheers

Jakub

Reply via email to