Re: dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

kshe Thu, 30 Nov 2017 05:11:07 -0800

On Thu, 30 Nov 2017 07:22:42 +0000, Otto Moerbeek wrote:
> On Sun, Nov 26, 2017 at 07:40:03PM +0000, kshe wrote:
> > Hi,
> >
> > The `Z' command can be a handy shortcut for computing logarithms; as
> > such, for example, it is the basis of the implementation of bc(1)'s `l'
> > function.  However, the algorithm currently used in count_digits() is
> > too naive to be really useful, becoming rapidly much slower than what
> > would be expected from a native command.
> >
> > To see how this computation could easily be made exponentially faster,
> > one may start by noticing that, if next() is the function defined for
> > any real r as
> >
> >     next(r) := floor(r) + 1,
> >
> > then clearly, for any strictly positive integer x,
> >
> >     floor(log_2(x)) <= log_2(x) < next(log_2(x))
> >
> > and therefore
> >
> >     log_10(2) * floor(log_2(x)) <= log_10(x) < k,
> >
> > where
> >
> >     k := log_10(2) * next(log_2(x)).
> >
> > Since log_10(2) < 1, it follows that
> >
> >     floor(k) <= next(k - log_10(2)) <= next(log_10(x)) <= next(k),
> >
> > which proves that next(log_10(x)) is either floor(k) or next(k).
> >
> > If next(log_10(x)) = floor(k), then
> >
> >     10^floor(k) = 10^next(log_10(x)) > 10^log_10(x) = x.
> >
> > If next(log_10(x)) = next(k), then
> >
> >     10^floor(k) = 10^floor(log_10(x)) <= 10^log_10(x) = x.
> >
> > Therefore, if x >= 10^floor(k), then next(log_10(x)) cannot be floor(k),
> > hence it must be next(k); likewise, if x < 10^floor(k), then
> > next(log_10(x)) cannot be next(k), hence it must be floor(k).  Using the
> > conventional integer value of a boolean expression, this result can be
> > summarised as
> >
> >     next(log_10(x)) = floor(k) + (x >= 10^floor(k)).
> >
> > As an additional refinement, one may further notice that if
> >
> >     floor(k) = floor(log_10(2) * floor(log_2(x)))
> >
> > then
> >
> >     10^floor(k) = 10^floor(log_10(2) * floor(log_2(x)))
> >                 <= 10^(log_10(2) * floor(log_2(x)))
> >                 <= 2^floor(log_2(x))
> >                 <= x
> >
> > so that it can readily be concluded that
> >
> >     next(log_10(x)) = next(k)
> >
> > without having to compute 10^floor(k).
> >
> > The BN library permits computation of k in O(1) and 10^floor(k) in
> > O(log(k)) which is O(log(log(x))).  Therefore, one can compute
> > next(log_10(x)) in O(1) most of the time (at least on average, and with
> > a certain definition of such average, the full analysis of which is, I
> > presume, outside the scope of this message), with a worst case of
> > O(log(log(x))).  In contrast, the current code is exponentially worse
> > than what its worst case should be, computing this value in O(log(x)).
> >
> >     $ jot -b 9 -s '' 65536 >script
> >     $ echo Z >>script
> >
> >     $ time dc script
> >         0m03.57s real     0m03.56s user     0m00.01s system
> >     $ time ./dc script
> >         0m00.12s real     0m00.12s user     0m00.00s system
> >
> > The diff below implements this optimisation.  It also fixes a small
> > logic error in split_number(), which is used by count_digits().
>
> General comment: it would be a good thing to add (part of) the proof
> or a reference to some published work to the code in a comment.
> Especially the derivation of c and the computation of d seem a bit
> like dropping out of thin air.


All the above proof says is that a logarithm in one base is easily
convertible to the corresponding one in another base.  Since this has
been known for a few centuries already, I see no reason to provide more
details here than for the algorithms (and associated constants) featured
in /usr/share/misc/bc.library, where no explanations of any kind are to
be found either.

> From a style point of view I do not like the assignment in
> conditionals very much.

Please feel free to apply your prefered style to this patch.

> There's also the problem of bits * c overflowing, though that's likely
> theoretical.

To provoke such overflow, one would first need a platform where ints are
larger than 32 bits, and the involved numbers would have to exceed
2^2147483648.  This is indeed unlikely to happen.

Regards,

kshe

Re: dc(1): floor(log_10(x)) should not cost more than O(log(log(x)))

Reply via email to