On Mon, Apr 8, 2024 at 12:23 PM Rowan Tommins [IMSoP] <imsop....@rwec.co.uk>
wrote:

>
> As I mentioned in the discussion about a "scalar arbitrary precision
> type", the idea of a scalar in this sense is a non-trivial challenge, as
> the zval can only store a value that is treated this way if it is 64 bits
> or smaller.
>
>
> Fortunately, that's not true. If you think about it, that would rule out
> not only arrays, but any string longer than 8 bytes!
>
> The way PHP handles this is called "copy-on-write" (COW), where multiple
> variables can point to the same zval until one of them needs to write to
> it, at which point a copy is transparently created.
>
>
> The pointer for this value would fit in the 64 bits, which is how objects
> work, but that's also why objects have different semantics for scope than
> integers. Objects are potentially very large in memory, so we refcount them
> and pass the pointer into child scopes, instead of copying the value as
> is done with integers.
>
>
> Objects are not the only thing that is refcounted. In fact, in PHP 4.x and
> 5.x, *every* zval used a refcount and COW approach; changing some types to
> be eagerly copied instead was one of the major performance improvements in
> the "PHP NG" project which formed the basis of PHP 7.0. You can actually
> see this in action here: https://3v4l.org/oPgr4
>
> This is all completely transparent to the user, as are a bunch of other
> memory/speed optimisations, like interned string literals, packed arrays,
> etc.
>
> So, there may be performance gains if we can squeeze values into the zval
> memory, but it doesn't need to affect the semantics of the new type.
>
I have mentioned before that my understanding of the deeper aspects of how
zvals work is very lacking compared to some others, so this is very
helpful. I was of course aware that strings and arrays can be larger than
64 bits, but was under the impression that the hashtable structure was in
part responsible for those being treated somewhat differently. I confess
that I do not understand the technical intricacies of interned strings and
packed arrays; I just understood that the zval structure for these
arbitrary precision values would probably be non-trivial, and from what I
was able to research, that seemed in part related to the 64-bit zval limit.
But thank you for the clarity and the added detail; it's always good to
learn the places where you are mistaken, and this is all extremely helpful
to know.

> This probably relates quite closely to Arvid's point that for a lot of
> uses, we don't actually need arbitrary precision, just something that can
> represent small-to-medium decimal numbers without the inaccuracies of
> binary floating point. That some libraries can be used for both purposes is
> not necessarily evidence that we could ever "bless" one for both use cases
> and make it a single native type.


Honestly, if you need a scale of less than about 15 and simply want
decimals free of floating-point error, BCMath is perfectly adequate for
most of the use cases I described. The larger issue for a lot of these
applications is not that they need to calculate 50 digits of accuracy and
BCMath is too slow; it's that they need non-arithmetic operations, such as
sin(), cos(), exp(), vector multiplication, dot products, etc., while
maintaining that low-to-medium decimal accuracy. libbcmath just doesn't
support those things, and writing your own implementation of, say, the
sin() function that maintains arbitrary precision is... challenging. It
compounds the performance deficiencies of BCMath many times over, because
you have to break each call into many separate arithmetic operations.
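
To make that concrete, a rough sketch of a Taylor-series sin() built purely
out of BCMath primitives might look like the following. This is illustrative
only, not a proposal: the function name, the fixed term count, and the guard
digits are all just made up for the example.

<?php
// Illustrative only: sin(x) = x - x^3/3! + x^5/5! - ... evaluated with
// BCMath string arithmetic. Every term costs several bcmul()/bcdiv() calls.
function bc_sin(string $x, int $scale = 20, int $terms = 20): string
{
    $work   = $scale + 5;            // extra guard digits for intermediates
    $result = '0';
    $term   = $x;                    // current term: x^(2n+1) / (2n+1)!
    $xSq    = bcmul($x, $x, $work);

    for ($n = 0; $n < $terms; $n++) {
        // alternating series: add for even n, subtract for odd n
        $result = ($n % 2 === 0)
            ? bcadd($result, $term, $work)
            : bcsub($result, $term, $work);

        // next term = term * x^2 / ((2n+2)(2n+3))
        $term = bcdiv(
            bcmul($term, $xSq, $work),
            (string) ((2 * $n + 2) * (2 * $n + 3)),
            $work
        );
    }

    return bcadd($result, '0', $scale); // truncate to the requested scale
}

echo bc_sin('0.5'), "\n"; // sin(0.5) = 0.4794255386...

Even this toy version turns a single call into dozens of bcmul()/bcdiv()
operations, and the number of terms you need grows with the precision you
want, which is where the performance falls off a cliff.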

To me, while being 100x to 1000x more performant at arithmetic is certainly
reason enough on its own, the fact that MPFR (for example) has C
implementations for more complex operations that can be utilized is the
real selling point. The ext-stats extension hasn't been maintained since
PHP 7.4, and trig is critical for a lot of stats functions. A fairly common
use of stats, even in applications where you might not expect it, is
generating a Gaussian random number: that is, a random number drawn such
that, if you kept generating numbers from the same generator, they would
form a normal distribution (a bell curve), so each number is weighted
according to that distribution.

The simplest way to do that is with the sin() and cos() functions (picking
a point on a circle; a minimal sketch follows below). But a lot of this
really useful mathematics is provided mainly by libraries that ALSO provide
arbitrary precision. So for instance, the Gamma function is another very
common function in statistics. To me, implementing a bundled or core type
that utilizes MPFR (or something similar) is as much about getting access
to THESE mathematical functions as it is about the arbitrary precision
aspect.
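
Here is that minimal sketch of the circle-based approach (the Box-Muller
transform), written with plain floats just to show where sin() and cos()
enter the picture. The function name is made up for the example, and an
arbitrary-precision version would also need bignum log(), sqrt(), sin() and
cos(), which is exactly what libbcmath does not give you.

<?php
// Illustrative only: Box-Muller transform using native float math.
function gaussian_random(float $mean = 0.0, float $stdDev = 1.0): float
{
    // two independent uniforms; keep $u1 away from 0 so log() stays finite
    $u1 = mt_rand(1, mt_getrandmax()) / mt_getrandmax();
    $u2 = mt_rand(0, mt_getrandmax()) / mt_getrandmax();

    // radius and angle of a random point: this is where sin()/cos() come in
    $r     = sqrt(-2.0 * log($u1));
    $theta = 2.0 * M_PI * $u2;

    return $mean + $stdDev * $r * cos($theta); // sin($theta) gives a 2nd sample
}

// repeated calls cluster into a bell curve around $mean
var_dump(gaussian_random());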

Jordan
