On Sun, Nov 26, 2017 at 07:40:03PM +0000, kshe wrote:
> Hi,
>
> The `Z' command can be a handy shortcut for computing logarithms; as
> such, for example, it is the basis of the implementation of bc(1)'s `l'
> function. However, the algorithm currently used in count_digits() is
> too naive to be really useful, becoming rapidly much slower than what
> would be expected from a native command.
>
> To see how this computation could easily be made exponentially faster,
> one may start by noticing that, if next() is the function defined for
> any real r as
>
> next(r) := floor(r) + 1,
>
> then clearly, for any strictly positive integer x,
>
> floor(log_2(x)) <= log_2(x) < next(log_2(x))
>
> and therefore
>
> log_10(2) * floor(log_2(x)) <= log_10(x) < k,
>
> where
>
> k := log_10(2) * next(log_2(x)).
>
> Since log_10(2) < 1, it follows that
>
> floor(k) <= next(k - log_10(2)) <= next(log_10(x)) <= next(k),
>
> which proves that next(log_10(x)) is either floor(k) or next(k).
>
> If next(log_10(x)) = floor(k), then
>
> 10^floor(k) = 10^next(log_10(x)) > 10^log_10(x) = x.
>
> If next(log_10(x)) = next(k), then
>
> 10^floor(k) = 10^floor(log_10(x)) <= 10^log_10(x) = x.
>
> Therefore, if x >= 10^floor(k), then next(log_10(x)) cannot be floor(k),
> hence it must be next(k); likewise, if x < 10^floor(k), then
> next(log_10(x)) cannot be next(k), hence it must be floor(k). Using the
> conventional integer value of a boolean expression, this result can be
> summarised as
>
> next(log_10(x)) = floor(k) + (x >= 10^floor(k)).
>
> As an additional refinement, one may further notice that if
>
> floor(k) = floor(log_10(2) * floor(log_2(x)))
>
> then
>
> 10^floor(k) = 10^floor(log_10(2) * floor(log_2(x)))
> <= 10^(log_10(2) * floor(log_2(x)))
> <= 2^floor(log_2(x))
> <= x
>
> so that it can readily be concluded that
>
> next(log_10(x)) = next(k)
>
> without having to compute 10^floor(k).
>
> The BN library permits computation of k in O(1) and 10^floor(k) in
> O(log(k)) which is O(log(log(x))). Therefore, one can compute
> next(log_10(x)) in O(1) most of the time (at least on average, and with
> a certain definition of such average, the full analysis of which is, I
> presume, outside the scope of this message), with a worst case of
> O(log(log(x))). In contrast, the current code is exponentially worse
> than what its worst case should be, computing this value in O(log(x)).
>
> $ jot -b 9 -s '' 65536 >script
> $ echo Z >>script
>
> $ time dc script
> 0m03.57s real 0m03.56s user 0m00.01s system
> $ time ./dc script
> 0m00.12s real 0m00.12s user 0m00.00s system
>
> The diff below implements this optimisation. It also fixes a small
> logic error in split_number(), which is used by count_digits().
General comment: it would be a good thing to add (part of) the proof
or a reference to some published work to the code in a comment.
Especially the derivation of c and the computation of d seem a bit
like dropping out of thin air.
>From a style point of view I do not like the assignment in
conditionals very much.
There's also the problem of bits * c overflowing, though that's likely
theoretical.
-Otto
>
> Index: bcode.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/dc/bcode.c,v
> retrieving revision 1.51
> diff -u -p -r1.51 bcode.c
> --- bcode.c 26 Feb 2017 11:29:55 -0000 1.51
> +++ bcode.c 17 Nov 2017 02:38:12 -0000
> @@ -385,9 +381,10 @@ split_number(const struct number *n, BIG
>
> bn_checkp(BN_copy(i, n->number));
>
> - if (n->scale == 0 && f != NULL)
> - bn_check(BN_set_word(f, 0));
> - else if (n->scale < sizeof(factors)/sizeof(factors[0])) {
> + if (n->scale == 0) {
> + if (f != NULL)
> + bn_check(BN_set_word(f, 0));
> + } else if (n->scale < sizeof(factors)/sizeof(factors[0])) {
> rem = BN_div_word(i, factors[n->scale]);
> if (f != NULL)
> bn_check(BN_set_word(f, rem));
> @@ -692,25 +689,40 @@ push_scale(void)
> static u_int
> count_digits(const struct number *n)
> {
> - struct number *int_part, *fract_part;
> - u_int i;
> + BIGNUM *int_part, *a, *p;
> + BN_CTX *ctx;
> + uint64_t d, c = 0x4D104D42;
> + int bits;
>
> if (BN_is_zero(n->number))
> return n->scale ? n->scale : 1;
>
> - int_part = new_number();
> - fract_part = new_number();
> - fract_part->scale = n->scale;
> - split_number(n, int_part->number, fract_part->number);
> -
> - i = 0;
> - while (!BN_is_zero(int_part->number)) {
> - (void)BN_div_word(int_part->number, 10);
> - i++;
> - }
> - free_number(int_part);
> - free_number(fract_part);
> - return i + n->scale;
> + bn_checkp(int_part = BN_new());
> + split_number(n, int_part, NULL);
> +
> + if ((bits = BN_num_bits(int_part)) == 0)
> + d = 0;
> + else if ((d = (c * bits) >> 32) != (c * (bits - 1)) >> 32) {
> + bn_checkp(ctx = BN_CTX_new());
> + bn_checkp(a = BN_new());
> + bn_checkp(p = BN_new());
> +
> + bn_check(BN_set_word(a, 10));
> + bn_check(BN_set_word(p, d));
> + bn_check(BN_exp(a, a, p, ctx));
> +
> + if (BN_ucmp(int_part, a) >= 0)
> + d++;
> +
> + BN_CTX_free(ctx);
> + BN_free(a);
> + BN_free(p);
> + } else
> + d++;
> +
> + BN_free(int_part);
> +
> + return d + n->scale;
> }
>
> static void
>
> Regards,
>
> kshe