[sage-devel] MPIR secrets and bsdnt

Bill Hart Mon, 21 Jun 2010 16:30:37 -0700

Hi all,

this is the first of three posts, primary audience sage-devel, esp.
those attending Sage Days in Leiden.


As many of you know, I recently stepped down from the MPIR project.
The main reason was that I need to focus on FLINT until about 1 Oct
2010, then need to transition to 80% research, 20% coding (my
colleagues at Warwick will be aghast that this is not already the
ratio - but they probably guessed as much). Now that I have stepped
down from MPIR, I can divulge all the "secret" development plans I
had, in the hope that some other people might get inspired. That is
what this email is about - potential MPIR projects that *anyone* could
work on (assuming they code in C), even research level projects!!

Firstly, in my mind, MPIR is not going away just because I stopped
working on it regularly. Since the last release I managed, a major
MPIR has been released, 2.1.1, which includes support for MSVC 2010
from Dr. Brian Gladman and proper xgcd normalisation (actually done by
me) and a number of other fixes by Jason Moxham. Minh Nguyen managed
the release, and did a stellar job!

Brian Gladman has always said, MPIR *is* him, me and Jason, so without
me it's just him and Jason. But Jason recently posted to mpir-devel
with plans for the next release. So the project is not stopping.

As everyone is aware, we switched to LGPL v3+ so that code could be
exchanged with GMP devels and indeed much code from GMP 5.x.y went
into MPIR 2.0.0. Even if MPIR turns into GMP with proper MSVC Windows
support, it'll still be a success in my book. But a *lot* more is
possible.  Here are some of the "secret" plans I had for the future:

Multiplication:

* We currently don't have Toom 6 (GMP does). Someone could pull this
code from GMP and put it in MPIR. It is largely written by Marco
Bodrato, using a new "mixing" idea, which we had not thought of in
MPIR. So it is very fast and well coded. Marco Bodrato and Alberto
Zanoni are the leading experts on Toom related stuff.

* We do have Toom 8 (also largely by Marco), but what should come
next? Perhaps Toom 12? Perhaps a general Toom 2n? This is research
mathematics. Currently such a thing doesn't exist. You could write a
paper on it. Do please talk to Marco, who is very friendly and
approachable and probably has lots of good ideas about what is
possible. There's an open question as to what the actual asymptotics
of a general Toom 2n is. The best known result is due to Knuth in
TAOCP.

* FFT - I have written a completely new FFT (I did this last
Christmas). Apart from not having the sqrt2 trick implemented, this is
actually almost ready to be put into MPIR. It should be *much* better
than the FFT currently there, and I even have some new ideas which I
could discuss with anyone who would like to do even better. I will be
giving a talk at the upcoming Sage Days in Leiden on the FFT. If you
get inspired, talk to me and I'll explain my new ideas. __I will *not*
have time to complete this FFT___, but wrote extensive notes on what
needs to be done to finish it, and there's a thousand lines of code or
so ready for someone to work on if they so pleased. This new code is
currently not used by anything! This is a project absolutely *anyone*
could work on.

Division

* Anyone who's checked will notice MPIR's division code is actually
slower than GMP's. This is fully intentional. It is because I decided
we should use David Harvey's divide and conquer code (just a quick
note - he disclaims responsibility for bugs in our code - we used his
pre-implementation and adapted it for MPIR and he bears no
responsibility for the ways in which we massacred it). Now
theoretically divide and conquer code based on middle product should
be faster than that based on straight multiplication (as currently
used by GMP say). But this boils down to as much work being put into
middle product code as into ordinary multiplication code, which is
currently not the case. Specifically David Harvey invented this
fantastic integer middle product algorithm (which was completely new).
We've got an implementation largely by him (some by some of the GMP
devels) which has assembly code for a basecase, then there is the
equivalent of karatsuba multiplication for middle product (it's called
toom42_mulmid). Actually you can "invert" any ordinary multiplication
algorithm and turn it into an algorithm for middle product. This uses
a type of inversion which is called Tellegen's principal (I can
provide references, though they are also contained in David Harvey's
paper on his website). Now, the next logical step is to implement
Toom63_mulmid. This is an inverted Toom3 algorithm. In fact I have
derived the correct identity for doing this (it took me ages as I did
it by hand ;-)). Warning: David said he once gave this a go and it
wasn't faster. However, he still left the door open for it to be
successful because without proper assembly code for some of the
components, it simply *can't* compete with the Toom3 multiplication in
MPIR. But done right, it could in theory speed up division in MPIR by
a *factor of 2*!!! Again, you might even get a small paper out of this
work!

* For asymptotically fast division, we use a precomputed inverse, i.e.
to get f/g we first compute 1/g using Newton iteration, then we
essentially multiply by f (that's a slight simplification, but not
much). Now this can be made *much* faster too. Basically, Newton
iteration can be done in something like 5/3*M(n) where M(n) is the
time for multiplication (it's unknown whether 4/3 is possible for
integer inversion). With FFT precaching you can combine this with the
multiplication and get the whole division down to probably something
like 2M(n) (might be slightly higher or lower - didn't look it up for
a while). We are probably currently around 3M(n) or something of that
order. Obviously this depends on the new FFT being put in, but in the
end, we can speed up MPIR's asymptotically fast division (the concept
for which I implemented - making reuse of some of the divide and
conquer code from GMP for "boilerplate" stuff). In fact, it can be
sped up a *lot*. Really a lot!

GCD/XGCD

* The current GCD/XGCD code in MPIR could do with a *lot* of cleaning
up and general optimisation. GMP's is currently faster. It's all based
on Niel's Moller's implementation of his ngcd algorithm, which is the
best known. But I did it messily in MPIR.

* I also have an idea for a new GCD algorithm for very small integers
(a few limbs and less). I've implemented it for single limbs and it is
way faster than what GMP uses. But it can be extended. I'd be happy to
explain this to anyone who'd like to work on it. Again there *might*
be some kind of paper in it. Maybe. Not sure though. (Just cause it's
faster doesn't always get you a paper.)

Jacobi Symbol

* Richard Brent and Paul Zimmermann recently developed a new algorithm
for computing Jacobi symbol asymptotically fast. They've implemented
it and I think a future GMP will have an implementation. It's
subquadratic with the same complexity as Schoenhage's continued
fraction algorithm. It is not too hard to find a copy of the slides
Paul and Richard will use to announce it at an upcoming meeting.
There's also a paper floating around somewhere.... But this algorithm
is based on Stehle and Zimmermann's half gcd algorithm. It is not at
all clear to me why one couldn't derive a similar algorithm based on
Niel's Moller's ngcd algorithm (if Paul is around, perhaps he can
comment) at least his slides didn't rule this out. *If* such a thing
would work, I believe it should be faster. Again, you *might* get a
paper out of this. Actually, I don't see why not, so long as a
comparable implementation beats Paul and Richard's implementation of
their algorithm. As GMP will have the Brent-Zimmermann algorithm, it
seems logical that MPIR could have the other.

By the way, I want to draw attention to a new algorithm of Niels
Moller and Torbjorn Granlund for "Improved division by invariant
integers". MPIR already uses this algorithm as it was implemented in
GMP and we now use that code for basecase division in MPIR. It's an
interesting improvement and amazingly, if I understand correctly, they
actually beat the chip for 2 by 1 limb division!! (flint2 uses this
new algorithm extensively in the new nmod_poly module).

Powm

* MPIR is woefully slow at this. I vaguely recall some new REDD
algorithm mentioned on the GMP website for this. I have no idea what
this is about, or if it is even implemented, but a *pure guess* is it
might be some generalised REDC algorithm, perhaps based on a similar
trick to the divide by 3 algorithm that David Harvey and I
(re)invented some years ago when working on FLINT. I reckon the
likelihood is high (based on the name) that this is what it is, in
which case, let's put it in MPIR. If not, well then there's something
to study....

BSDNT

* Many, many people have supported the idea of writing a BSD licensed
Bignum library. Most recently Ondre Certik wrote to me *strongly*
supporting the idea. This is complex at best, because it must be so
different from GMP as to cause *absolutely no question whatsoever*
that it is independent, and thus must be implemented completely from
scratch. One idea I had was to write all the low level routines in
LLVM instead of assembly, then just build suitable optimisations into
LLVM so that nice assembly is automatically produced. This is entirely
feasible and would mean only one lot of low level routines would ever
need to be written, rather than a different set for each platform
(LLVM supports *lots* of platforms, even the playstation SPU!!).
However, LLVM doesn't support carry propagation well (as it is written
with the C front end in mind and C doesn't need to support carry
propagation). So I'm still umming and aahhing about whether this might
be more work than just writing lots of assembly code from scratch. At
this point I don't know. And yes it matters, it makes at least a
factor of 2 difference in times, which we can't just "live with" for a
bignum library.

* Antony Vennard was working hard on a website for a bsdnt project.
Bits and pieces of it function already (it's high tech). If anyone
knows Django and would love to help Antony, let us know - lots is done
but more needs to be done. I don't see a project as going anywhere
without a website....

* I wrote some code for multiplication which is relatively quick up to
a few hundred limbs, I also wrote some basecase division, but want to
change how it works because I have a better way of doing it now. All
this code is in very few lines, but still relies on assembly code in
MPIR to be efficient. We obviously can't have a BSD licensed library
relying on an LGPL library. Doesn't make sense...

* I just want to emphasise, bsdnt has *nothing* to do with MPIR. It's
a totally separate project and it came about because lots of people
kept telling us they really wanted a BSD licensed bignum library. Its
main goal is clean, simple code which anyone can read. Each function
will be only a few lines. Performance will come second, (although
there is a goal to at least be reasonable in comparison with MPIR, say
within a factor of 2 in the first year and within 20% in the second -
clock didn't start yet ;-)). Performance is *very* difficult to
achieve if you specifically do *not* want to replicate any portion of
GMP/MPIR, so there's no intention to ever beat MPIR. Actually, I don't
think that is even possible. They represent the state-of-the art by
some orders of magnitude.

Don't forget, I'll be around at Sage Days Leiden for anyone who is
going to be there. But also feel free to contact me on list here about
anything I've discussed.

Two more posts to go....

Bill.

-- 
To post to this group, send an email to sage-devel@googlegroups.com
To unsubscribe from this group, send an email to 
sage-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/sage-devel
URL: http://www.sagemath.org

[sage-devel] MPIR secrets and bsdnt

Reply via email to