On Sun, Apr 7, 2024 at 2:45 PM Rowan Tommins [IMSoP] <imsop....@rwec.co.uk> wrote:
> On 07/04/2024 20:55, Jordan LeDoux wrote:
>
> > I have been doing small bits of work, research, and investigation into an MPDec or MPFR implementation for years, and I'm likely to continue doing my research on that regardless of whatever is discussed in this thread.
>
> I absolutely encourage you to do that. What I'm hoping is that you can share some of what you already know now, so that while we're discussing BCMath\Number, we can think ahead a bit to what other similar APIs we might build in the future. The below seems to be exactly that.
>
> > Yes. BCMath uses fixed-scale, all the other libraries use fixed-precision. That is, the other libraries use a fixed number of significant digits, while BCMath uses a fixed number of digits after the decimal point.
>
> That seems like a significant difference indeed, and one that is potentially far more important than whether we build an OO wrapper or a "scalar" one.

By a "scalar" value I mean a value that has the same semantics for reading, writing, copying, passing-by-value, passing-by-reference, and passing-by-pointer (how objects behave) as the integer, float, or boolean types.

As I mentioned in the discussion about a "scalar arbitrary precision type", a scalar in this sense is a non-trivial challenge, because a zval can only store a value that is treated this way if it is 64 bits or smaller. However, the actual numerical value used by every single one of these libraries is not guaranteed to be 64 bits or smaller, and for some of them is in fact guaranteed to be larger. A pointer to the value would fit in the 64 bits, which is how objects work, but that is also why objects have different scoping semantics than integers. Objects are potentially very large in memory, so we refcount them and pass the pointer into child scopes, instead of copying the value as is done with integers.
Both this and the precision/scale question are pretty significant design questions and choices. While the arbitrary precision values of these libraries will not fit inside a zval, they are on average smaller than PHP objects in memory, so it may not be a significant problem to eagerly copy them like we do with integers. However, if that is not the route that is taken, they could end up having scoping semantics that are similar to objects, even if we don't give them a full class entry with a constructor, properties, etc. This is part of the reason that, for example, the ext-decimal implementation, which uses the MPDec library, represents these numbers as an object with a fluent interface.

> > So, for instance, it would not actually be possible without manual rounding in the PHP implementation to force exactly 2 decimal digits of accuracy in the result and no more with MPDec.
>
> The current BCMath proposal is to mostly choose the scale calculations automatically, and to give precise control of rounding. Neither of those are implemented in libbcmath, which requires an explicit scale, and simply truncates the result at that point.
>
> That's why I said that the proposal isn't really about "an OO wrapper for BCMath" any more, it's a fairly generic Number API, with libbcmath as the back-end which we currently have available. So thinking about what other back-ends we might build with the same or similar wrappers is useful and relevant.

In general I would say that libbcmath is different enough from other backends that we should not expect any work on a BCMath implementation to be reusable in other implementations. It *could* be that we are able to do that, but it should not be something people *expect* to happen, because of the technical differences. Some of the broader language design choices would be transferable though.
For instance, the standard names of various calculation functions/methods are something that would remain independent, even with the differences in the implementation.

> > The idea of money, for instance, wanting exactly two digits would require the implementation to round, because something like 0.00000013 has two digits of *precision*, which is what MPDec uses, but it has 8 digits of scale which is what BCMath uses.
>
> This brings us back to what the use cases are we're trying to cover with these wrappers.
>
> The example of fixed-scale money is not just a small niche that I happen to know about: brick/money has 16k stars on GitHub, and 18 million installs on Packagist; moneyphp/money has 4.5k stars and 45 million installs; one has implementations based on plain PHP, GMP, and BCMath; the other has a hard dependency on BCMath.
>
> Presumably, there are other use cases where working with precision rather than scale is essential, maybe just as popular (or that could be just as popular, if they could be implemented better).
>
> In which case, should we be designing a NumberInterface that provides both, with BCMath having a custom (and maybe slow) implementation for round-to-precision, and MPDec/MPFR having a custom (and maybe slow) implementation for round-to-scale?
>
> Or, should we abandon the idea of having one preferred number-handling API (whether that's NumberInterface or a core decimal type), because no implementation could handle both use cases?

The implementation for round-to-precision for BCMath would be much slower than the implementation for round-to-scale for MPDec/MPFR, even if the underlying calculations were done at the same performance. The main challenge for the precision vs. scale issue is that precision *also* includes the integer part for some implementations, while scale does not.
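To make the precision/scale distinction concrete, here is a minimal sketch in Python rather than PHP, since CPython's decimal module is, as far as I know, built on libmpdec and so works with a fixed number of significant digits. The helper names precision_of, scale_of, and divide_to_scale are hypothetical, invented only for this illustration; they are not part of any proposed PHP API.

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN


def precision_of(d: Decimal) -> int:
    """Number of significant digits stored in the coefficient."""
    return len(d.as_tuple().digits)


def scale_of(d: Decimal) -> int:
    """Number of digits after the decimal point (BCMath's notion of scale)."""
    return max(0, -d.as_tuple().exponent)


def divide_to_scale(a: Decimal, b: Decimal, scale: int,
                    working_precision: int = 28) -> Decimal:
    """Over-calculate with a fixed precision, then round to a fixed scale.

    Assumption: working_precision is large enough for the operands; this
    rounds twice (once to precision, once to scale), which is acceptable
    for a sketch but may matter in edge cases.
    """
    with localcontext() as ctx:
        ctx.prec = working_precision
        quotient = a / b  # rounded to working_precision significant digits
        unit = Decimal(1).scaleb(-scale)  # e.g. 0.01 for scale 2
        return quotient.quantize(unit, rounding=ROUND_HALF_EVEN)


x = Decimal("0.00000013")
print(precision_of(x), scale_of(x))  # 2 significant digits, 8 digits of scale

# With a fixed precision of 5, the integer part consumes significant digits,
# so the number of digits after the decimal point varies with magnitude.
with localcontext() as ctx:
    ctx.prec = 5
    print(Decimal(1) / Decimal(3))     # 0.33333
    print(Decimal(1000) / Decimal(3))  # 333.33

# Forcing exactly two digits after the decimal point needs an explicit round.
print(divide_to_scale(Decimal(1), Decimal(3), 2))     # 0.33
print(divide_to_scale(Decimal(1000), Decimal(3), 2))  # 333.33
```

The explicit quantize step is the "manual rounding" that a fixed-precision backend would need in order to behave like a fixed-scale one.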
In general, though, it is easier to over-calculate using precision and then round or truncate to scale than it is to calculate with scale without knowing, until the calculation has completed, what your precision will be (for some kinds of calculations). The actual underlying math of the library is easier with scale than it is with precision. So, for instance, with a scale of 3, the minimum meaningful difference between two values is 0.001, so you can simply continue your calculation until the calculated error is less than this value. Fortunately, using libraries means that we do not need to struggle with these underlying mathematical implementations in whatever PHP implementation we build for either.

My intuition at the moment is that a single number-handling API would be challenging to do without an actual proposed implementation on the table for MPDec/MPFR. The best we can do at the moment is probably to reference Rudi's implementation in ext-decimal.

For money calculations, scale is always likely to be a more useful configuration. For mathematical calculations (such as machine learning applications, which I would say is the other very large use case for this kind of capability), precision is likely to be the more useful configuration. Other applications that I have personally encountered include simulation and modeling, statistical distributions, and data analysis. Most of these can be done with fair accuracy without arbitrary precision, but there are certainly types of applications in these spaces that would benefit from or even require arbitrary precision.

PHP at the moment is not a language that has many applications in these spaces, with most of the actual use cases being money. My view is that this is driven by the language features, not because PHP is ill-suited to these applications or because developers using PHP have no applications that would benefit from these other capabilities.
There is an array of features readily available in Python that are heavily used for these sorts of applications and that PHP lacks. However, PHP offers several things that might make it the more attractive option if it had similar mathematical features. PHP is, in general, faster than Python and more performant in the areas where the language is being used directly. However, Python allows direct access to some very low level capabilities that are implemented directly in C, and in those areas it certainly has an edge.

For instance, Python has modules that allow for direct off-loading to GPUs. It has an extensive and performant set of mathematical libraries in NumPy and SciPy that make interacting with complex mathematics seamless and fast. PHP actually also has an extension to allow off-loading to a GPU, though even with extensions it has very little mathematical library support. The ext-decimal extension is probably the best example, and it has almost nothing beyond basic arithmetic. Python also has userspace operator overloading for its objects, which is extremely useful for all of these spaces, but that's an entirely different discussion than what we are talking about here.

But even though these extensions are available in PHP, developers barely use them at all, at least in part because of the enormous difference between PECL and pip. I do not think that extensions are an adequate substitute for PHP in the way that pip modules are for Python.

This is, essentially, the thesis of the research and work that I have done in the space since joining the internals mailing list.

Jordan