Re: [PHP-DEV] Native decimal scalar support and object types in BcMath - do we want both?

Jordan LeDoux Sun, 07 Apr 2024 15:52:32 -0700

On Sun, Apr 7, 2024 at 2:45 PM Rowan Tommins [IMSoP] <imsop....@rwec.co.uk>
wrote:

> On 07/04/2024 20:55, Jordan LeDoux wrote:
>
> > I have been doing small bits of work, research, and investigation into
> > an MPDec or MPFR implementation for years, and I'm likely to continue
> > doing my research on that regardless of whatever is discussed in this
> > thread.
>
>
> I absolutely encourage you to do that. What I'm hoping is that you can
> share some of what you already know now, so that while we're discussing
> BCMath\Number, we can think ahead a bit to what other similar APIs we
> might build in the future. The below seems to be exactly that.
>
>
>
> > Yes. BCMath uses fixed-scale, all the other libraries use
> > fixed-precision. That is, the other libraries use a fixed number of
> > significant digits, while BCMath uses a fixed number of digits after
> > the decimal point.
>
>
> That seems like a significant difference indeed, and one that is
> potentially far more important than whether we build an OO wrapper or a
> "scalar" one.
>
>
By a "scalar" value I mean a value that has the same semantics for reading,
writing, copying, passing-by-value, passing-by-reference, and
passing-by-pointer (how objects behave) as the integer, float, or boolean
types. As I mentioned in the discussion about a "scalar arbitrary precision
type", a scalar in this sense is a non-trivial challenge, because a zval
can only store a value with these semantics if it is 64 bits or smaller.
However, the actual numerical value used by every single one of these
libraries is not guaranteed to be 64 bits or smaller, and for some of them
is in fact guaranteed to be larger.

The pointer for this value would fit in the 64 bits, which is how objects
work, but that's also why objects have different semantics for scope than
integers. Objects are potentially very large in memory, so we refcount them
and pass the pointer into child scopes, instead of copying the value like
is done with integers.

Both this and the precision/scale question are pretty significant design
questions and choices. While the arbitrary precision values of these
libraries will not fit inside a zval, they are on average smaller than PHP
objects in memory, so it may not be a significant problem to eagerly copy
them like we do with integers. However, if that is not the route that is
taken, they could end up having scoping semantics that are similar to
objects, even if we don't give them a full class entry with a constructor,
properties, etc. This is part of the reason that, for example, the
ext-decimal implementation which uses the MPDec library represents these
numbers as an object with a fluent interface.

>
> > So, for instance, it would not actually be possible without manual
> > rounding in the PHP implementation to force exactly 2 decimal digits
> > of accuracy in the result and no more with MPDec.
>
>
> The current BCMath proposal is to mostly choose the scale calculations
> automatically, and to give precise control of rounding. Neither of those
> are implemented in libbcmath, which requires an explicit scale, and
> simply truncates the result at that point.
>
> That's why I said that the proposal isn't really about "an OO wrapper
> for BCMath" any more, it's a fairly generic Number API, with libbcmath
> as the back-end which we currently have available. So thinking about
> what other back-ends we might build with the same or similar wrappers is
> useful and relevant.
>
>
In general I would say that libbcmath is different enough from other
backends that we should not expect any work on a BCMath implementation to
be utilized in other implementations. It *could* be that we are able to do
that, but it should not be something people *expect* to happen because of
the technical differences.

Some of the broader language design choices would be transferable though.
For instance, the standard names of various calculation functions/methods
are something that would remain independent, even with the differences in
the implementation.

>
> > The idea of money, for instance, wanting exactly two digits would
> > require the implementation to round, because something like 0.00000013
> > has two digits of *precision*, which is what MPDec uses, but it has 8
> > digits of scale which is what BCMath uses.
>
>
> This brings us back to what the use cases are we're trying to cover with
> these wrappers.
>
> The example of fixed-scale money is not just a small niche that I happen
> to know about: brick/money has 16k stars on GitHub, and 18 million
> installs on Packagist; moneyphp/money has 4.5k stars and 45 million
> installs; one has implementations based on plain PHP, GMP, and BCMath;
> the other has a hard dependency on BCMath.
>
> Presumably, there are other use cases where working with precision
> rather than scale is essential, maybe just as popular (or that could be
> just as popular, if they could be implemented better).
>
> In which case, should we be designing a NumberInterface that provides
> both, with BCMath having a custom (and maybe slow) implementation for
> round-to-precision, and MPDec/MPFR having a custom (and maybe slow)
> implementation for round-to-scale?
>
> Or, should we abandon the idea of having one preferred number-handling
> API (whether that's NumberInterface or a core decimal type), because no
> implementation could handle both use cases?
>
>
The implementation for round-to-precision for BCMath would be much slower
than the implementation for round-to-scale for MPDec/MPFR, even if the
underlying calculations were done at the same performance. The main
challenge for the precision vs. scale issue is that precision *also*
includes the integer part for some implementations, while scale does not.
But in general, it is easier to over-calculate using precision and then
round/truncate to scale than it is to calculate with scale without knowing,
until the calculation has been completed, what your precision will be (for
some kinds of calculations).
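
To make the distinction concrete, here is a minimal sketch in Python, whose
decimal module is a fixed-precision implementation (CPython builds it on
libmpdec). The helper names precision_and_scale and divide_to_scale and the
guard parameter are hypothetical, chosen only for illustration; this is not
how BCMath or any proposed PHP API works. It shows that 0.00000013 has a
precision of 2 but a scale of 8, and one way to over-calculate with
precision and then round once to a fixed scale.

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN


def precision_and_scale(d: Decimal) -> tuple[int, int]:
    """Return (significant digits, digits after the decimal point)."""
    t = d.as_tuple()
    return len(t.digits), max(-t.exponent, 0)


def divide_to_scale(a: str, b: str, scale: int, guard: int = 5) -> Decimal:
    """Divide with extra precision, then round once to `scale` places.

    `guard` is an assumed number of spare digits, not a library setting.
    """
    x, y = Decimal(a), Decimal(b)
    quantum = Decimal(1).scaleb(-scale)  # scale=2 -> Decimal('0.01')
    with localcontext() as ctx:
        # Precision must cover the integer digits of the quotient as well
        # as the requested scale, plus the guard digits.
        ctx.prec = max((x / y).adjusted(), 0) + 1 + scale + guard
        return (x / y).quantize(quantum, rounding=ROUND_HALF_EVEN)


print(precision_and_scale(Decimal("0.00000013")))  # (2, 8)
print(divide_to_scale("1000", "3", 2))             # 333.33
```

Note that the precision has to be chosen from the size of the integer part,
which is the complication described above. Rounding twice (once to
precision, once to scale) can in rare cases differ from a single correctly
rounded result, so the guard digits only make that unlikely.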

The actual underlying math of the library is easier with scale than it is
with precision. So, for instance, with a scale of 3, the minimum meaningful
difference between two values is 0.001, so you can simply continue your
calculation until the calculated error is less than this value.
Fortunately, using libraries means that we do not need to wrestle with
these underlying mathematical implementations in whatever PHP
implementation we build for either.
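
As an illustration of that stopping rule, the following Python sketch
approximates a square root with Newton's method and stops as soon as a
bound on the error drops below 10^-scale. The function name sqrt_to_scale
is hypothetical, exact fractions are used only to keep the example simple,
and the final truncation mirrors the truncating behaviour described for
libbcmath earlier in the thread. It is a sketch of the idea, not the
algorithm any of these libraries actually uses.

```python
from decimal import Decimal
from fractions import Fraction


def sqrt_to_scale(n: int, scale: int) -> Decimal:
    """Approximate sqrt(n), n >= 1, to within 10**-scale."""
    if n < 1:
        raise ValueError("n must be >= 1")
    tolerance = Fraction(1, 10 ** scale)  # scale=3 -> 0.001
    x = Fraction(n)  # start at or above sqrt(n); iterates only decrease
    # sqrt(n) always lies between n/x and x, so x - n/x bounds the error.
    while x - n / x >= tolerance:
        x = (x + n / x) / 2
    # Truncate to `scale` digits after the decimal point.
    digits = (x * 10 ** scale) // 1
    return Decimal(digits).scaleb(-scale)


print(sqrt_to_scale(2, 3))  # 1.414
```

The loop never needs to know how many significant digits the answer has,
only the fixed step of 0.001 implied by the scale.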

My intuition at the moment is that a single number-handling API would be
challenging to do without an actual proposed implementation on the table
for MPDec/MPFR. The best we can do at the moment is probably reference
Rudi's implementation in ext-decimal.

For money calculations, scale is always likely to be a more useful
configuration. For mathematical calculations (such as machine learning
applications, which I would say is the other very large use case for this
kind of capability), precision is likely to be the more useful
configuration. Other applications that I have personally encountered
include: simulation and modeling, statistical distributions, and data
analysis. Most of these can be done with fair accuracy without arbitrary
precision, but there are certainly types of applications that would benefit
from or even require arbitrary precision in these spaces.

PHP at the moment is not a language that has many applications in these
spaces, with most of the actual use cases being money. My view is that this
is driven by the language features, not because PHP is ill-suited to these
applications or that developers using PHP have no applications that would
benefit from these other capabilities. There is an array of features
readily available in Python, heavily used for these sorts of applications,
that PHP lacks. However, PHP offers several things that might make it the
more attractive option if it had similar mathematical features.
PHP is, in general, faster than Python and more performant in the areas
where the language is being used directly. However, Python allows direct
access to some very low level capabilities that are implemented directly in
C, and in those areas it certainly has an edge.

For instance, Python has modules that allow for direct off-loading to GPUs.
It has an extensive and performant set of mathematical libraries in NumPy
and SciPy that make interacting with complex mathematics seamless and fast.
PHP actually also has an extension to allow off-loading to a GPU, though
even with extensions it has very little mathematical library support. The
ext-decimal extension is probably the best example, and it has almost
nothing beyond basic arithmetic. Python also has userspace operator
overloading for its objects, which is extremely useful for all of these
spaces, but that's an entirely different discussion from what we are
talking about here.

But even with these extensions available in PHP, they are barely used by
developers at all, at least in part because of the enormous difference
between PECL and pip. For PHP, I do not think that extensions are an
adequate substitute in the way that pip packages are for Python.

This is, essentially, the thesis of the research and work that I have done
in the space since joining the internals mailing list.

Jordan
