Re: RFC : Correcting some problems in rounding/number handling

Bill Gribble Wed, 05 Jul 2000 08:43:31 -0700
Christopher Browne <[EMAIL PROTECTED]> writes:
> Come on, people.  The issue is _not_ what "object system" is being
> used, or what language is being used, but rather _what the numeric
> representation should be_.

I think most of your comments here are right on point.  However, I
disagree with you on a couple of things.  I'd appreciate hearing your
responses.

At the top level, I think the first thing to nail down is a numeric
representation rather than a monetary-value representation.  There
needs to be a layer that enforces restrictions such as "can't add
dollars to pounds", but I believe that we must first solve the
lower-level problem; it *is* possible to add a number denominated in
hundredths to a number denominated in thousandths.

The lowest level of arithmetic operations should be agnostic about
currencies, and so (I think) the data structure representing numeric
values should have no information about currency in it.  The most
primitive level of financial information, including currency, is at the
moment the 'split', and I believe that currency restrictions should remain in
the split and in operations on splits.  

Side note: In trying to diagram the architecture of gnucash as it
exists now, we have discovered that there are several types of
"financial restrictions" that are implemented and enforced in a
variety of places in gnucash.  For example, subaccounts can only have
a set of account-types that are related to the parent's type;
transactions can't be entered if the splits don't have a common
currency; and so on.  It may make sense to put all such restrictions
on the values of particular objects/operations into a single "module"
which lives at the engine level.

> -> The numeric amount should involve a "big integer," and a radix.

This may be a terminological misunderstanding, but I hear "radix" and
I think "position of the decimal point."  If that's what you mean
(which interpretation is borne out by your choice of uint8 for the
radix) I disagree.  I think we need a denominator rather than a radix,
and the denominator needs to be at least uint32.  In keeping with my
preamble above, I want a representation that we can use for *all* the
variables in financial formulas, including prices and
fractional-share-amounts, and for that we need non-decimal
denominators.

Yes, most of the exchanges are going decimal (but not all -- see Jon
Trowbridge's recent post), but the historical data will be around
forever.

Jon has suggested using a small number of bits of the "big integer" as
an index into a table of the (finite) possible values for the
denominator.  This lets us use just one 64-bit int as the number.  I
don't think I like this (why restrict values, and is the C compiler
smart enough to enforce type safety?) but it's a possibility.
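
To make the packed idea concrete, here is a rough sketch of how a denominator index could share one 64-bit int with the amount.  Everything in it is an assumption for illustration: the 3-bit index width, the contents of denom_table, and the packed_* helper names are not from Jon's post.  It packs with multiply/divide instead of shifts so that negative amounts behave predictably, and it does not check for overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table: which denominators would be allowed is an assumption. */
static const int32_t denom_table[8] = { 1, 10, 100, 1000, 8, 16, 32, 64 };

/* Pack an amount and a table index (0..7) as amount * 8 + index.
   Overflow of the amount is not checked in this sketch. */
static int64_t packed_make(int64_t amount, int index)
{
  assert(index >= 0 && index < 8);
  return amount * 8 + index;
}

/* Recover the table index; the double modulo keeps it in 0..7
   even when the packed value is negative. */
static int packed_index(int64_t p)
{
  return (int)(((p % 8) + 8) % 8);
}

/* Recover the signed amount (the numerator). */
static int64_t packed_amount(int64_t p)
{
  return (p - packed_index(p)) / 8;
}

/* Look up the denominator that the index stands for. */
static int32_t packed_denominator(int64_t p)
{
  return denom_table[packed_index(p)];
}
```

Note that nothing here stops a caller from adding two packed values directly, which is the type-safety worry raised above.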

> -> There should be some indication of the currency that is involved.

I disagree with this.  I think the currency information belongs at the
level where currency actually comes into play; that is, at the level
of the single financial event or journal entry.  "You can't add pounds
to dollars" is a policy statement from a particular domain; I think it
makes sense to provide the mechanism at the lowest level and enforce
policy from above.  This is in keeping with the philosophy that has
been used throughout the engine.

> If others have suggestions for _data structures_, feel free to 
> suggest reasons why your _data structure_ is preferable to one
> of the above.

Well, it's not so much a question of data structures as it is of data
semantics and the functional properties of the API.

The data structure I prefer is 

struct gnc_numeric {
  int64  numerator;
  int32  denominator;
};

This looks like a rational number, and it is, but the key thing is the
way these structures are manipulated by the arithmetic routines that
use them.  I don't think traditional rational-number semantics are
appropriate (i.e. finding a relatively-prime numerator and denominator
after every operation) and we won't always be performing operations
that result in exact answers... one of the main reasons for this whole
design exercise is to have a representation that can handle the
*inexact* nature of financial transactions, which operate in whole
numbers of smallest-transaction-units even if the actual computed cost
of the transaction is an exact value which is not a whole number of
smallest-transaction-units.

The API should include a number of features that you don't discuss:

- Control over denominator-conversion policy.

Somehow (either by pre-setting the denominator of the result-struct or
passing in an extra argument) we need to be able to specify what
happens when we operate on arguments with different denominators.  For
example, when multiplying a number of shares (1000ths) by a price (64ths)
to get a total value (pennies), we need to use ceil() to get the
next-highest whole penny.  We should be able to do this without a
bunch of other nonsense.  

This example assumes a pointer-based API, but we could just as easily
hand around the entire structure and not have a return-argument.

In this example, the price is 63/64 and we are buying 1 share: 

  gnc_numeric  price      = { 63, 64 };
  gnc_numeric  num_shares = { 1000, 1000 };
  gnc_numeric  result     = { 0, 100 };

  gnc_numeric_multiply(&price, &num_shares, &result, GNC_CEIL);

result should contain { 99, 100 } because it costs 99 cents to buy a
share that's priced at .984375.
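
One possible body for such a multiply, as a minimal sketch only.  It assumes the caller pre-sets the result denominator (one of the two options mentioned above), uses C99 fixed-width types in place of int64/int32, and adds a GNC_FLOOR policy that is purely my own placeholder.  It does not guard against overflow of the intermediate product.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int64_t numerator;
  int32_t denominator;
} gnc_numeric;

/* GNC_CEIL is named in the post; GNC_FLOOR is a placeholder assumption. */
typedef enum { GNC_FLOOR, GNC_CEIL } gnc_round_policy;

/* Multiply a by b and express the product in units of
   1/result->denominator, rounding as requested.
   Intermediate overflow is NOT handled in this sketch. */
void gnc_numeric_multiply(const gnc_numeric *a, const gnc_numeric *b,
                          gnc_numeric *result, gnc_round_policy how)
{
  assert(a->denominator > 0 && b->denominator > 0);
  assert(result->denominator > 0);

  /* Exact product scaled to the target denominator is n / d. */
  int64_t n = a->numerator * b->numerator * result->denominator;
  int64_t d = (int64_t)a->denominator * b->denominator;

  /* C division truncates toward zero; adjust to a true floor first. */
  int64_t q = n / d;
  int64_t r = n % d;
  if (r != 0 && n < 0)
    q--;

  /* Ceiling is floor plus one whenever there is a remainder. */
  if (how == GNC_CEIL && r != 0)
    q++;

  result->numerator = q;
}
```

With the values from the example, n is 6300000 and d is 64000, so the floor is 98 and GNC_CEIL bumps it to 99.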

Under other circumstances, we may want the operation to be carried out
*exactly*, that is, to select a denominator for result so that the
answer can be represented exactly.

  gnc_numeric value_1 = { 11, 13 };
  gnc_numeric value_2 = { 9, 11 };
  gnc_numeric result; 

  gnc_numeric_add(&value_1, &value_2, &result, GNC_EXACT);

... with result ending up as { 238, 143 }.
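
A sketch of what the exact case might do internally.  The helper name gnc_numeric_add_exact is hypothetical, and using the least common multiple of the two denominators is my assumption; the plain product of the denominators would serve equally well for this example.  The result is deliberately not reduced to lowest terms, and overflow of the numerator is not checked.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int64_t numerator;
  int32_t denominator;
} gnc_numeric;

/* Greatest common divisor of two positive integers (Euclid). */
static int64_t gcd64(int64_t a, int64_t b)
{
  while (b != 0) {
    int64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Hypothetical helper: add a and b exactly, choosing the least common
   multiple of the denominators as the result denominator.  The result
   is not reduced afterwards. */
void gnc_numeric_add_exact(const gnc_numeric *a, const gnc_numeric *b,
                           gnc_numeric *result)
{
  assert(a->denominator > 0 && b->denominator > 0);

  int64_t g   = gcd64(a->denominator, b->denominator);
  int64_t lcm = (a->denominator / g) * (int64_t)b->denominator;
  assert(lcm <= INT32_MAX);   /* the denominator field is only 32 bits */

  result->numerator   = a->numerator * (lcm / a->denominator)
                      + b->numerator * (lcm / b->denominator);
  result->denominator = (int32_t)lcm;
}
```

The same routine covers the earlier point about adding hundredths to thousandths: { 1, 100 } plus { 1, 1000 } comes out as { 11, 1000 }.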

We may not need this ability right now, but it's not a big leap in
complexity and more generality is better, IMO, as long as generality
doesn't interfere with the specified design goals.

- An interface to get information about truncation/promotion errors.  

When returning a result that does not exactly represent the result of
a computation, we need to return information about the difference
between the exact value and the returned value.  This may be through
an alternate API that takes an extra return-argument pointer which
holds an exact representation of the difference between the returned
value and the exact result of the computation (which should be easily
computed given the remainder of the API I discuss, at a little extra
computational cost).
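
As a rough illustration of the extra return-argument, here is a ceiling multiply that also reports (returned value - exact value) as an exact fraction.  The function name and the sign convention of the error are assumptions of mine, and overflow is only checked for the error denominator.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int64_t numerator;
  int32_t denominator;
} gnc_numeric;

/* Hypothetical variant: multiply a by b, round up to a whole number of
   1/result->denominator units, and store (returned - exact) in *error. */
void gnc_numeric_multiply_ceil_err(const gnc_numeric *a, const gnc_numeric *b,
                                   gnc_numeric *result, gnc_numeric *error)
{
  assert(a->denominator > 0 && b->denominator > 0);
  assert(result->denominator > 0);

  /* Exact product in target units is n / d. */
  int64_t n = a->numerator * b->numerator * result->denominator;
  int64_t d = (int64_t)a->denominator * b->denominator;

  /* Truncation toward zero is already a ceiling for negative n;
     positive n with a remainder must be bumped by one. */
  int64_t q = n / d;
  if (n % d != 0 && n > 0)
    q++;
  result->numerator = q;

  /* returned - exact = q/rd - n/(d*rd) = (q*d - n) / (d*rd) */
  int64_t err_denom = d * result->denominator;
  assert(err_denom <= INT32_MAX);
  error->numerator   = q * d - n;
  error->denominator = (int32_t)err_denom;
}
```

For the 63/64 share price above, the error comes back as { 36000, 6400000 }, i.e. the buyer is charged 0.5625 of a cent more than the exact cost.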

Bill Gribble


--
Gnucash Developer's List
To unsubscribe send empty email to: [EMAIL PROTECTED]