On Sep 24, 2014, at 2:10 AM, Geert Janssens <geert.gnuc...@kobaltwit.be> wrote:
> On Saturday 20 September 2014 18:21:44 John Ralls wrote:
>> On Aug 27, 2014, at 10:31 PM, John Ralls <jra...@ceridwen.us> wrote:
>>> On Aug 27, 2014, at 8:32 AM, Geert Janssens <janssens-ge...@telenet.be> wrote:
>>>> On Saturday 23 August 2014 18:01:15 John Ralls wrote:
>>>>> So, having gotten test-lots and all of the other tests working*
>>>>> with libmpdecimal, I studied the Intel library for several days
>>>>> and couldn't figure out how to make it work, so I decided to try
>>>>> the GCC implementation, which offers a 128-bit IEEE 754 format
>>>>> that's fixed size. Since it doesn't ever call malloc, I thought
>>>>> it might prove faster, and indeed it is. I haven't finished
>>>>> integrating it -- the library doesn't provide formatted printing
>>>>> -- but it's far enough along that it passes all of the engine and
>>>>> backend tests. Some results:
>>>>>
>>>>> test-numeric, with NREPS increased to 20000 to get a reasonable
>>>>> execution time for profiling:
>>>>>
>>>>>   master     9645ms
>>>>>   mpDecimal  21410ms
>>>>>   decNumber  12985ms
>>>>>
>>>>> test-lots:
>>>>>
>>>>>   master     16300ms
>>>>>   mpDecimal  20203ms
>>>>>   decNumber  19044ms
>>>>>
>>>>> The first shows the relative speed in more or less pure
>>>>> computation; the latter shows the overall impact on one of the
>>>>> longer-running tests that does a lot of other stuff.
>>>>
>>>> John,
>>>>
>>>> Thanks for implementing this and running the tests. The topic was
>>>> last touched before my holidays so it took me a while to refresh
>>>> my memory...
>>>>
>>>> decNumber clearly performs better, although both implementations
>>>> lag behind our current gnc_numeric performance.
>>>>
>>>>> I haven't investigated Christian's other suggestion of aggressive
>>>>> rounding to eliminate the overflow issue to make room for larger
>>>>> denominators, nor my original idea of replacing gnc_numeric with
>>>>> boost::rational atop a multi-precision class (either boost::mp or
>>>>> gmp).
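[For context on why a decimal or rational type is wanted at all, a minimal illustration -- not GnuCash code, names are mine: binary doubles cannot represent most decimal fractions exactly, which is precisely the error gnc_numeric's rational representation avoids.]

```cpp
#include <cstdint>

// Illustration only: binary floating point is unsuitable for money.
inline bool tenths_add_exactly_in_binary() {
    double a = 0.1, b = 0.2;
    return (a + b) == 0.3;   // false: doubles cannot hold 0.1 exactly
}

// A gnc_numeric-style rational keeps decimal amounts exact:
// 1/10 + 2/10 == 3/10, with no representation error at all.
struct Rational { std::int64_t num, den; };
inline Rational add_same_den(Rational a, Rational b) {
    return {a.num + b.num, a.den};
}
```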
>>>>
>>>> Do you still have plans for either?
>>>>
>>>> I suppose aggressive rounding is orthogonal to the choice of data
>>>> type. Christian's argument that we should round as is expected in
>>>> the financial world makes sense to me, but that argument does not
>>>> imply any underlying data type.
>>>>
>>>> How about the boost::rational option?
>>>>
>>>>> I have noticed that we're doing some dumb things with Scheme,
>>>>> like using double as an intermediate when converting from Scheme
>>>>> numbers to gnc_numeric (Scheme numbers are also rational, so the
>>>>> conversion should be direct) and representing gnc_numerics as a
>>>>> tuple (num, denom) instead of just using Scheme rationals.
>>>>
>>>> Does this mean you see potential performance gains in this area as
>>>> we clean up the C<->Scheme number conversions?
>>>>
>>>>> Neither will work for decimal floats, of course; the whole class
>>>>> will have to be wrapped so that computation takes place in C++.
>>>>
>>>> Which means some performance drop again...
>>>>
>>>>> Storage in SQL is also an issue,
>>>>
>>>> From the previous conversation I recall sqlite doesn't have a
>>>> decimal type, so we can't run calculating queries on it directly.
>>>>
>>>> But how about the other two, mysql and postgresql? Is the decimal
>>>> type you're using in your tests directly compatible with the
>>>> decimal data types in mysql and postgresql, or compatible enough
>>>> to convert automatically between them?
>>>>
>>>>> as is maintaining backward file compatibility.
>>>>>
>>>>> Another issue is equality: In order to get tests to pass I've had
>>>>> to implement a fuzzy comparison where both numbers are first
>>>>> rounded to the smaller number of decimal places -- 2 fewer if
>>>>> there are 12 or more -- and compared with two roundings, first
>>>>> truncation and second "bankers", and declared unequal only if
>>>>> they're unequal in both.
>>>>> I hate this, but it seems to be necessary to obtain equality when
>>>>> dealing with large divisors (as when computing prices or interest
>>>>> rates). I suspect that we'd have to do something similar if we
>>>>> pursue aggressive rounding to avoid overflows, but the only way
>>>>> to know for certain is to try.
>>>>
>>>> Ugh. :(
>>>>
>>>> So what's the current balance?
>>>>
>>>> I see the following pros and cons from your tests so far:
>>>>
>>>> Pro:
>>>> - using a decimal type gives us more precision
>>>>
>>>> Con:
>>>> - sqlite doesn't have a decimal data type, so as it currently
>>>>   stands we can't run calculations in queries in that database type
>>>> - we lose backward/forward compatibility with earlier versions of
>>>>   GnuCash
>>>> - decNumber or mpDecimal are new dependencies
>>>> - their performance is currently less than the original gnc_numeric
>>>> - guile doesn't know of a decimal data type so we may need some
>>>>   conversion glue
>>>> - equality is fuzzy
>>>>
>>>> Please add if I forgot arguments on either side.
>>>>
>>>> Arguably many of the con arguments can be solved. That will take
>>>> effort, however. And I consider the first two more important than
>>>> the others.
>>>>
>>>> So do you think the benefits (I assume there will be more than the
>>>> one I mentioned) will outweigh the drawbacks? Does the work that
>>>> will go into it bring GnuCash enough value to continue on this
>>>> track?
>>>>
>>>> It's probably too early to tell for sure, but I wanted to get your
>>>> ideas based on what we have so far.
>>>
>>> Testing boost::rational is next on the agenda. My original idea was
>>> to use it with boost::multiprecision or gmp, but I'd prefer
>>> something that doesn't depend on heap allocations, because heap
>>> allocation is so much slower than stack allocation and must be
>>> passed by pointer, which is a major change in the API -- meaning a
>>> ton of cleanup work up front.
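[The fuzzy comparison John describes above might be sketched as follows. This is a hypothetical reconstruction using doubles for clarity; the real code operates on gnc_numeric rationals and all names here are mine. Both values are rounded to the smaller of the two decimal-place counts (2 fewer if that count is 12 or more), once by truncation and once by banker's rounding, and declared unequal only if they differ under both roundings.]

```cpp
#include <algorithm>
#include <cmath>

// Round x to 'places' decimal places by truncation (toward zero).
inline double round_trunc(double x, int places) {
    double s = std::pow(10.0, places);
    return std::trunc(x * s) / s;
}

// Round x to 'places' decimal places, half-to-even ("banker's").
// Positive values assumed, for clarity of the sketch.
inline double round_bankers(double x, int places) {
    double s = std::pow(10.0, places);
    double scaled = x * s;
    double fl = std::floor(scaled);
    double frac = scaled - fl;
    if (frac > 0.5 || (frac == 0.5 && std::fmod(fl, 2.0) != 0.0))
        fl += 1.0;                    // round up, or half to even
    return fl / s;
}

// Unequal only if the values disagree under BOTH roundings.
inline bool fuzzy_equal(double a, double b, int places_a, int places_b) {
    int places = std::min(places_a, places_b);
    if (places >= 12) places -= 2;    // 2 fewer if 12 or more
    bool ne_trunc   = round_trunc(a, places)   != round_trunc(b, places);
    bool ne_bankers = round_bankers(a, places) != round_bankers(b, places);
    return !(ne_trunc && ne_bankers);
}
```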
>>> I think I'll do a straight substitution of the existing math128
>>> with boost::rational<int64_t> just to see what happens.
>>>
>>> I think that part of implementing immediate rounding must include
>>> constraining denominators to powers of ten. The main reason is that
>>> it makes my head hurt when I try to think about how to do rounding
>>> with arbitrary denominators. If you consider that a big chunk of
>>> the overflow problems arise from denominators and divisors that are
>>> large primes, it becomes quickly apparent that avoiding large prime
>>> denominators might well resolve much of the problem. It's also true
>>> that for real-world numbers, as opposed to randomly generated
>>> numbers from tests, all numbers have powers-of-ten denominators.
>>> We'd still have many-digit prime divisors to deal with, but
>>> constraining denominators gives us something to round to. Does that
>>> make sense, or does it seem the rambling of a lunatic? This really
>>> does make my head hurt.
>>
>> Boost::Rational is a serious disappointment. Boost::rational<int64_t>
>> didn’t allow a significant increase in precision and is further
>> hampered by not providing any overflow detection. Benchmarks of
>> test-numeric with NREPS set to 20000 (the numbers are a bit
>> different from before because I’m using my Mac Pro instead of my
>> MacBook Air, and because these are debug builds):
>>
>>   Branch                   Tests     Time
>>   master                   1187558   5346ms
>>   libmpdecimal             1180076   8718ms
>>   boost-rational, cppint   1187558   20903ms
>>   boost-rational, gmp      1187558   34232ms
>>
>> cppint means boost::multiprecision::checked_cppint128_t, a 16-byte
>> stack-allocated multi-precision integer. “Checked” means that it
>> throws std::overflow_error instead of wrapping. Gmp means the GNU
>> Multiple Precision library. It’s supposed to be faster than cppint,
>> but its performance is killed by having to malloc everything.
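[An aside on the "checked" behavior mentioned above, i.e. throwing std::overflow_error instead of silently wrapping. This is not the boost code; it is a sketch of the same idea for plain int64_t using the GCC/Clang `__builtin_mul_overflow` builtin, which is the kind of protection boost::rational<int64_t> lacks: its operators multiply numerators and denominators unchecked.]

```cpp
#include <cstdint>
#include <stdexcept>

// Multiply two 64-bit integers, throwing instead of wrapping on
// overflow. Requires GCC or Clang for __builtin_mul_overflow.
inline std::int64_t checked_mul(std::int64_t a, std::int64_t b) {
    std::int64_t out;
    if (__builtin_mul_overflow(a, b, &out))
        throw std::overflow_error("64-bit multiply overflowed");
    return out;
}
```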
>> The fact that our own C code is substantially faster than any
>> library I’ve tried is a tribute to Linas.
>>
>> There’s another wrinkle: Boost::Rational immediately reduces all
>> numbers to what we called in my grade school “simplest form”,
>> meaning no common factors between the numerator and denominator.
>> This actually helps prevent overflows, but it means that we have to
>> be very careful to supply the SCU as the rounding denominator or
>> we’ll get unexpected rounding results. Boost::Rational provides no
>> rounding function of its own, so I rewrote gnc_numeric_convert into
>> C++ using the overloaded operators from boost::multiprecision. That
>> at least taught me about rounding arbitrary denominators, so my head
>> doesn’t explode any more.
>>
>> The good news is that using 128-bit numbers for all internal
>> representations, along with aggressive reduction, a tweak to
>> get_random_gnc_numeric() so that the actual number doesn’t exceed
>> 1E13/1, and careful attention to rounding, prevents overflow errors
>> during testing, at least up through test-lots.
>>
>> Looking a bit more at rounding, it doesn’t appear to me that, at 14
>> out of 151 gnc_numeric operations in the code base, we’re over-using
>> GNC_HOW_RND_NEVER. I’m not convinced that it would help much to
>> eliminate those cases.
>>
>> It looks like the best solution is to work over our existing
>> gnc-numeric with its math128 implementation so that the internals
>> are always 128-bit and we don’t declare overflows prematurely.
>>
> Thanks for the update and the elaborate testing.
>
> So,... math128 is what we use now, using the rational representation
> of numbers, do I get that right? And the best option is to stick with
> it and improve on it? Would you still transform it into C++ so it
> becomes an object with properties and members?

Yes to all.
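[The rounding John learned from rewriting gnc_numeric_convert might be sketched like this -- a simplified, hypothetical version in the spirit of that rewrite, not the actual code. It converts num/den to a new (power-of-ten) denominator with banker's rounding; the real code uses 128-bit intermediates so the `num * new_den` product cannot overflow, and positive operands are assumed here for clarity.]

```cpp
#include <cstdint>

// Return the numerator of num/den re-expressed over new_den,
// rounding half to even ("banker's"). Result value is q / new_den.
inline std::int64_t convert_num(std::int64_t num, std::int64_t den,
                                std::int64_t new_den) {
    std::int64_t q = (num * new_den) / den;   // truncated quotient
    std::int64_t r = (num * new_den) % den;   // remainder to round on
    // round up if remainder > 1/2, or == 1/2 and quotient is odd
    if (2 * r > den || (2 * r == den && (q % 2) != 0))
        ++q;
    return q;
}
```

For example, 1/3 over denominator 100 becomes 33/100, while 3/8 over denominator 4 is exactly halfway and rounds to the even numerator 2/4.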
Regards,
John Ralls

_______________________________________________
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel