Dear Mr. Ralls,

Excellent work! I'm happy to hear the results, although, like you, I'm
disappointed that boost::rational didn't bring something valuable to the
table. I look forward to getting to know that code some day... on an
as-needed basis!
In Christ,
Aaron Laws

On Sat, Sep 20, 2014 at 9:21 PM, John Ralls <jra...@ceridwen.us> wrote:
>
> On Aug 27, 2014, at 10:31 PM, John Ralls <jra...@ceridwen.us> wrote:
>
> > On Aug 27, 2014, at 8:32 AM, Geert Janssens <janssens-ge...@telenet.be> wrote:
> >
> >> On Saturday 23 August 2014 18:01:15 John Ralls wrote:
> >>> So, having gotten test-lots and all of the other tests working* with
> >>> libmpdecimal, I studied the Intel library for several days and
> >>> couldn't figure out how to make it work, so I decided to try the GCC
> >>> implementation, which offers a 128-bit IEEE 754 format that's fixed
> >>> size. Since it doesn't ever call malloc, I thought it might prove
> >>> faster, and indeed it is. I haven't finished integrating it -- the
> >>> library doesn't provide formatted printing -- but it's far enough
> >>> along that it passes all of the engine and backend tests. Some
> >>> results:
> >>>
> >>> test-numeric, with NREPS increased to 20000 to get a reasonable
> >>> execution time for profiling:
> >>>   master      9645ms
> >>>   mpDecimal  21410ms
> >>>   decNumber  12985ms
> >>>
> >>> test-lots:
> >>>   master     16300ms
> >>>   mpDecimal  20203ms
> >>>   decNumber  19044ms
> >>>
> >>> The first shows the relative speed in more or less pure computation,
> >>> the latter shows the overall impact on one of the longer-running
> >>> tests that does a lot of other stuff.
> >>
> >> John,
> >>
> >> Thanks for implementing this and running the tests. The topic was last
> >> touched before my holidays so it took me a while to refresh my memory...
> >>
> >> decNumber clearly performs better, although both implementations lag
> >> behind our current gnc_numeric performance.
> >>
> >>> I haven't investigated Christian's other suggestion of aggressive
> >>> rounding to eliminate the overflow issue to make room for larger
> >>> denominators, nor my original idea of replacing gnc_numeric with
> >>> boost::rational atop a multi-precision class (either boost::mp or
> >>> gmp).
> >>
> >> Do you still have plans for either?
> >>
> >> I suppose aggressive rounding is orthogonal to the choice of data type.
> >> Christian's argument that we should round as is expected in the
> >> financial world makes sense to me, but that argument does not imply any
> >> underlying data type.
> >>
> >> How about the boost::rational option?
> >>
> >>> I have noticed that we're doing some dumb things with Scheme,
> >>> like using double as an intermediate when converting from Scheme
> >>> numbers to gnc_numeric (Scheme numbers are also rational, so the
> >>> conversion should be direct) and representing gnc_numerics as a tuple
> >>> (num, denom) instead of just using Scheme rationals.
> >>
> >> Does this mean you see potential performance gains in this as we clean
> >> up the C<->Scheme number conversions?
> >>
> >>> Neither will work for decimal floats, of course; the whole class will
> >>> have to be wrapped so that computation takes place in C++.
> >>
> >> Which means some performance drop again...
> >>
> >>> Storage in SQL is also an issue,
> >>
> >> From the previous conversation I recall sqlite doesn't have a decimal
> >> type, so we can't run calculating queries on it directly.
> >>
> >> But how about the other two, mysql and postgresql? Is the decimal type
> >> you're using in your tests directly compatible with the decimal data
> >> types in mysql and postgresql, or compatible enough to convert
> >> automatically between them?
> >>
> >>> as is maintaining backward file compatibility.
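
As a sketch of the "conversion should be direct" point above: the following
assumes Guile's C API (scm_inexact_to_exact, scm_numerator, scm_denominator,
scm_to_int64) and GnuCash's gnc_numeric_create; the helper name is
hypothetical, not the actual binding code, and it assumes both parts fit in
an int64_t.

    /* Hypothetical sketch: convert a Scheme rational straight to
     * gnc_numeric, with no double intermediate.  scm_to_int64 raises a
     * Scheme range error if either part does not fit in 64 bits. */
    #include <libguile.h>
    #include "gnc-numeric.h"

    static gnc_numeric
    gnc_numeric_from_scm_direct (SCM val)
    {
        SCM exact = scm_inexact_to_exact (val);  /* force an exact rational */
        int64_t num = scm_to_int64 (scm_numerator (exact));
        int64_t denom = scm_to_int64 (scm_denominator (exact));
        return gnc_numeric_create (num, denom);
    }
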
> >>>
> >>> Another issue is equality: In order to get tests to pass I've had to
> >>> implement a fuzzy comparison where both numbers are first rounded to
> >>> the smaller number of decimal places -- 2 fewer if there are 12 or
> >>> more -- and compared with two roundings, first truncation and second
> >>> "bankers", and declared unequal only if they're unequal in both. I
> >>> hate this, but it seems to be necessary to obtain equality when
> >>> dealing with large divisors (as when computing prices or interest
> >>> rates). I suspect that we'd have to do something similar if we pursue
> >>> aggressive rounding to avoid overflows, but the only way to know for
> >>> certain is to try.
> >>
> >> Ugh. :(
> >>
> >> So what's the current balance?
> >>
> >> I see the following pros and cons from your tests so far:
> >>
> >> Pro:
> >> - using a decimal type gives us more precision
> >>
> >> Con:
> >> - sqlite doesn't have a decimal data type, so as it currently stands
> >>   we can't run calculations in queries in that database type
> >> - we lose backward/forward compatibility with earlier versions of
> >>   GnuCash
> >> - decNumber or mpDecimal are new dependencies
> >> - their performance is currently less than the original gnc_numeric
> >> - guile doesn't know of a decimal data type so we may need some
> >>   conversion glue
> >> - equality is fuzzy
> >>
> >> Please add if I forgot arguments on either side.
> >>
> >> Arguably many of the con arguments can be solved. That will take
> >> effort, however. And I consider the first two more important than the
> >> others.
> >>
> >> So do you think the benefits (I assume there will be more than the one
> >> I mentioned) will outweigh the drawbacks? Does the work that will go
> >> into it bring GnuCash enough value to continue on this track?
> >>
> >> It's probably too early to tell for sure, but I wanted to get your
> >> ideas based on what we have so far.
> >
> > Testing boost::rational is next on the agenda. My original idea was to
> > use it with boost::multiprecision or gmp, but I'd prefer something that
> > doesn't depend on heap allocations, because heap allocation is so much
> > slower than stack allocation and heap-allocated objects must be passed
> > by pointer, which is a major change in the API -- meaning a ton of
> > cleanup work up front. I think I'll do a straight substitution of the
> > existing math128 with boost::rational<int64_t> just to see what happens.
> >
> > I think that part of implementing immediate rounding must include
> > constraining denominators to powers of ten. The main reason is that it
> > makes my head hurt when I try to think about how to do rounding with
> > arbitrary denominators. If you consider that a big chunk of the overflow
> > problems arise from denominators and divisors that are large primes, it
> > becomes quickly apparent that avoiding large prime denominators might
> > well resolve much of the problem. It's also true that for real-world
> > numbers, as opposed to the randomly generated numbers from tests, all
> > numbers have powers-of-ten denominators. We'd still have many-digit
> > prime divisors to deal with, but constraining denominators gives us
> > something to round to. Does that make sense, or does it seem the
> > rambling of a lunatic? This really does make my head hurt.
>
> Boost::Rational is a serious disappointment. Boost::rational<int64_t>
> didn't allow a significant increase in precision and is further hampered
> by not providing any overflow detection.
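
To make the power-of-ten constraint discussed above concrete, here is a
rough sketch (not GnuCash code; the helper name and the use of
boost::multiprecision::checked_int128_t for the intermediates are
assumptions) of rounding num/den to a denominator of 10^places with
"banker's" half-to-even rounding:

    // Sketch only: round the rational num/den (assuming den > 0) to the
    // nearest multiple of 1/10^places, ties to even.  The return value is
    // the new numerator; the new denominator is 10^places.
    // checked_int128_t throws std::overflow_error instead of wrapping.
    #include <boost/multiprecision/cpp_int.hpp>
    #include <cstdint>

    using int128 = boost::multiprecision::checked_int128_t;

    static int64_t
    round_to_decimal_places (int64_t num, int64_t den, int places)
    {
        int128 pow10 = 1;
        for (int i = 0; i < places; ++i)
            pow10 *= 10;

        int128 scaled = int128 (num) * pow10;  // value to round is scaled/den
        int128 quot = scaled / den;            // truncated quotient
        int128 rem = scaled % den;             // same sign as scaled

        int128 twice_rem = rem < 0 ? int128 (-2 * rem) : int128 (2 * rem);
        if (twice_rem > den || (twice_rem == den && quot % 2 != 0))
            quot += (scaled < 0) ? -1 : 1;     // round one step away from zero

        return quot.convert_to<int64_t> ();
    }

If denominators were constrained to powers of ten, every rounding call could
reduce to something like this instead of having to reason about arbitrary
target denominators.
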
> Benchmarks of test-numeric with NREPS set to 20000 (the numbers are a bit
> different from before because I'm using my Mac Pro instead of my MacBook
> Air, and because these are debug builds):
>
> Branch                   Tests      Time
> master:                  1187558    5346ms
> libmpdecimal:            1180076    8718ms
> boost-rational, cppint:  1187558   20903ms
> boost-rational, gmp:     1187558   34232ms
>
> cppint means boost::multiprecision::checked_int128_t, a 16-byte,
> stack-allocated multi-precision integer. "Checked" means that it throws
> std::overflow_error instead of wrapping. gmp means the GNU Multiple
> Precision (GMP) library. It's supposed to be faster than cppint, but its
> performance is killed by having to malloc everything. The fact that our
> own C code is substantially faster than any library I've tried is a
> tribute to Linas.
>
> There's another wrinkle: Boost::Rational immediately reduces all numbers
> to what we called in my grade school "simplest form", meaning no common
> factors between the numerator and denominator. This actually helps
> prevent overflows, but it means that we have to be very careful to supply
> the SCU as the rounding denominator or we'll get unexpected rounding
> results. Boost::Rational provides no rounding function of its own, so I
> rewrote gnc_numeric_convert in C++ using the overloaded operators from
> boost::multiprecision. That at least taught me about rounding arbitrary
> denominators, so my head doesn't explode any more.
>
> The good news is that using 128-bit numbers for all internal
> representations, along with aggressive reduction, a tweak to
> get_random_gnc_numeric() so that the actual number doesn't exceed 1E13/1,
> and careful attention to rounding, prevents overflow errors during
> testing, at least up through test-lots.
>
> Looking a bit more at rounding: at 14 out of 151 gnc_numeric operations
> in the code base, it doesn't appear to me that we're over-using
> GNC_HOW_RND_NEVER, and I'm not convinced that it would help much to
> eliminate those cases.
>
> It looks like the best solution is to work over our existing gnc-numeric
> with math128 implementation so that the internals are always 128-bit and
> we don't declare overflows prematurely.
>
> But first it's time to squash some bugs before next week's release.
>
> Regards,
> John Ralls

_______________________________________________
gnucash-devel mailing list
gnucash-devel@gnucash.org
https://lists.gnucash.org/mailman/listinfo/gnucash-devel
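
As a footnote to the reduction wrinkle described above, a tiny standalone
demonstration, not from the GnuCash tree and with made-up variable names, of
why the SCU has to be supplied explicitly when rounding a boost::rational
value:

    // boost::rational keeps every value in lowest terms, so the SCU
    // (e.g. 100 for a currency with cents) cannot be recovered from the
    // value itself and must be passed in as the rounding denominator.
    #include <boost/rational.hpp>
    #include <cassert>
    #include <cstdint>

    int main ()
    {
        boost::rational<int64_t> amount (50, 100);  // "0.50" entered over the SCU
        assert (amount.numerator () == 1 &&
                amount.denominator () == 2);        // stored reduced, as 1/2

        // Rounding "to the value's own denominator" would therefore be a
        // no-op; the commodity's SCU must be supplied explicitly.
        const int64_t scu = 100;
        int64_t units = boost::rational_cast<int64_t> (
            amount * boost::rational<int64_t> (scu));  // 50 hundredths, exact here
        (void) units;
        return 0;
    }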