On Mon, Apr 8, 2024 at 12:23 PM Rowan Tommins [IMSoP] <imsop....@rwec.co.uk> wrote:
> > As I mentioned in the discussion about a "scalar arbitrary precision type", the idea of a scalar in this meaning is a non-trivial challenge, as the zval can only store a value that is treated this way if it is 64 bits or smaller.
>
> Fortunately, that's not true. If you think about it, that would rule out not only arrays, but any string longer than 8 bytes!
>
> The way PHP handles this is called "copy-on-write" (COW), where multiple variables can point to the same zval until one of them needs to write to it, at which point a copy is transparently created.
>
> > The pointer for this value would fit in the 64 bits, which is how objects work, but that's also why objects have different semantics for scope than integers. Objects are potentially very large in memory, so we refcount them and pass the pointer into child scopes, instead of copying the value like is done with integers.
>
> Objects are not the only thing that is refcounted. In fact, in PHP 4.x and 5.x, *every* zval used a refcount and COW approach; changing some types to be eagerly copied instead was one of the major performance improvements in the "PHP NG" project which formed the basis of PHP 7.0. You can actually see this in action here: https://3v4l.org/oPgr4
>
> This is all completely transparent to the user, as are a bunch of other memory/speed optimisations, like interned string literals, packed arrays, etc.
>
> So, there may be performance gains if we can squeeze values into the zval memory, but it doesn't need to affect the semantics of the new type.

I have mentioned before that my understanding of the deeper aspects of how zvals work is lacking compared to some others', so this is very helpful. I was of course aware that strings and arrays can be larger than 64 bits, but I was under the impression that the hashtable structure was in part responsible for those being treated somewhat differently. I confess that I do not understand the technical intricacies of interned strings and packed arrays; I just understand that the zval structure for these arbitrary precision values would probably be non-trivial, and from what I was able to research, that seemed to be related in part to the 64-bit zval limit. But thank you for the clarity and the added detail; it's always good to learn where you are mistaken, and this is all extremely helpful to know.

> This probably relates quite closely to Arvid's point that for a lot of uses, we don't actually need arbitrary precision, just something that can represent small-to-medium decimal numbers without the inaccuracies of binary floating point. That some libraries can be used for both purposes is not necessarily evidence that we could ever "bless" one for both use cases and make it a single native type.

Honestly, if you need a scale of less than about 15 and simply want decimals free of floating-point error, BCMath is perfectly adequate for most of the use cases I described. The larger issue for a lot of these applications is not that they need to calculate 50 digits of accuracy and BCMath is too slow; it's that they need non-arithmetic operations, such as sin(), cos(), exp(), vector multiplication, dot products, etc., while maintaining that low-to-medium decimal accuracy. libbcmath just doesn't support those things, and creating your own implementation of, say, the sin() function that maintains arbitrary precision is... challenging.
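To make that concrete, here is a rough sketch of what a hand-rolled arbitrary-precision sine ends up looking like on top of BCMath. It is just a naive Taylor series with a few guard digits; the function name, the guard-digit count, and the fixed term cutoff are purely illustrative, and argument reduction and proper rounding are hand-waved away entirely:

    // sin(x) = x - x^3/3! + x^5/5! - ... , built from bcmul()/bcdiv()/bcsub().
    // Only reasonable for small |x|; real code needs reduction mod 2*pi first,
    // which itself needs an arbitrary-precision pi.
    function bc_sin(string $x, int $scale): string
    {
        $work = $scale + 5;                 // guard digits
        $x2   = bcmul($x, $x, $work);       // x^2, reused for every term

        $sum  = $x;
        $term = $x;
        $sign = -1;

        for ($n = 3; $n <= 39; $n += 2) {
            // next term = previous term * x^2 / (n * (n - 1))
            $term = bcdiv(
                bcmul($term, $x2, $work),
                bcmul((string) $n, (string) ($n - 1), $work),
                $work
            );
            $sum  = ($sign === -1)
                ? bcsub($sum, $term, $work)
                : bcadd($sum, $term, $work);
            $sign = -$sign;
        }

        return bcadd($sum, '0', $scale);    // truncate back to the requested scale
    }

Every term costs several arbitrary-precision multiplications and divisions, and then you get to repeat the whole exercise for cos(), exp(), ln(), and so on. MPFR already ships correctly-rounded C implementations of all of them.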
It compounds the performance deficiencies of BCMath dramatically, because each such function has to be broken down into many separate arithmetic operations. To me, while being 100x to 1000x more performant at plain arithmetic is certainly reason enough on its own, the fact that MPFR (for example) has C implementations of these more complex operations that can be utilized is the real selling point.

The ext-stats extension hasn't been maintained since PHP 7.4, and trig is critical for a lot of stats functions. A fairly common use of stats, even in applications where you might not expect it, is generating a Gaussian random number: a random number drawn so that, if you kept generating numbers from the same generator, they would form a normal distribution (a bell curve), i.e. each number is weighted according to that distribution. The simplest way to do that is with the sin() and cos() functions, by picking a point on a circle (see the sketch at the end of this message).

But a lot of this genuinely useful mathematics is mainly provided by libraries that ALSO provide arbitrary precision. The Gamma function, for instance, is another very common function in statistics. To me, implementing a bundled or core type that utilizes MPFR (or something similar) is as much about getting access to THESE mathematical functions as it is about the arbitrary precision aspect.
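For anyone curious, the sin()/cos() trick I mentioned is the Box-Muller transform. A minimal sketch with native floats (which is all we can really use for this today) looks something like the following; the helper name and the mean/stdDev parameters are just for illustration:

    // Box-Muller: turn two uniform random numbers into one normally
    // distributed one by picking a point on a circle.
    function gaussian_random(float $mean = 0.0, float $stdDev = 1.0): float
    {
        $u1 = mt_rand() / mt_getrandmax();  // uniform in [0, 1]
        $u2 = mt_rand() / mt_getrandmax();

        $u1 = max($u1, 1e-12);              // avoid log(0)

        $radius = sqrt(-2.0 * log($u1));
        $angle  = 2.0 * M_PI * $u2;

        return $mean + $stdDev * $radius * cos($angle); // sin($angle) yields a second sample
    }

The moment you want that same weighting at a fixed decimal precision rather than as a binary float, you need sqrt(), log(), sin() and cos() that operate on the decimal type, which is exactly what libbcmath does not give you.

Jordan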