Hi all, this is the first of three posts, primary audience sage-devel, esp. those attending Sage Days in Leiden.
As many of you know, I recently stepped down from the MPIR project. The main reason was that I need to focus on FLINT until about 1 Oct 2010, then need to transition to 80% research, 20% coding (my colleagues at Warwick will be aghast that this is not already the ratio - but they probably guessed as much). Now that I have stepped down from MPIR, I can divulge all the "secret" development plans I had, in the hope that some other people might get inspired. That is what this email is about - potential MPIR projects that *anyone* could work on (assuming they code in C), even research level projects!! Firstly, in my mind, MPIR is not going away just because I stopped working on it regularly. Since the last release I managed, a major MPIR has been released, 2.1.1, which includes support for MSVC 2010 from Dr. Brian Gladman and proper xgcd normalisation (actually done by me) and a number of other fixes by Jason Moxham. Minh Nguyen managed the release, and did a stellar job! Brian Gladman has always said, MPIR *is* him, me and Jason, so without me it's just him and Jason. But Jason recently posted to mpir-devel with plans for the next release. So the project is not stopping. As everyone is aware, we switched to LGPL v3+ so that code could be exchanged with GMP devels and indeed much code from GMP 5.x.y went into MPIR 2.0.0. Even if MPIR turns into GMP with proper MSVC Windows support, it'll still be a success in my book. But a *lot* more is possible. Here are some of the "secret" plans I had for the future: Multiplication: * We currently don't have Toom 6 (GMP does). Someone could pull this code from GMP and put it in MPIR. It is largely written by Marco Bodrato, using a new "mixing" idea, which we had not thought of in MPIR. So it is very fast and well coded. Marco Bodrato and Alberto Zanoni are the leading experts on Toom related stuff. * We do have Toom 8 (also largely by Marco), but what should come next? Perhaps Toom 12? Perhaps a general Toom 2n? This is research mathematics. Currently such a thing doesn't exist. You could write a paper on it. Do please talk to Marco, who is very friendly and approachable and probably has lots of good ideas about what is possible. There's an open question as to what the actual asymptotics of a general Toom 2n is. The best known result is due to Knuth in TAOCP. * FFT - I have written a completely new FFT (I did this last Christmas). Apart from not having the sqrt2 trick implemented, this is actually almost ready to be put into MPIR. It should be *much* better than the FFT currently there, and I even have some new ideas which I could discuss with anyone who would like to do even better. I will be giving a talk at the upcoming Sage Days in Leiden on the FFT. If you get inspired, talk to me and I'll explain my new ideas. __I will *not* have time to complete this FFT___, but wrote extensive notes on what needs to be done to finish it, and there's a thousand lines of code or so ready for someone to work on if they so pleased. This new code is currently not used by anything! This is a project absolutely *anyone* could work on. Division * Anyone who's checked will notice MPIR's division code is actually slower than GMP's. This is fully intentional. It is because I decided we should use David Harvey's divide and conquer code (just a quick note - he disclaims responsibility for bugs in our code - we used his pre-implementation and adapted it for MPIR and he bears no responsibility for the ways in which we massacred it). Now theoretically divide and conquer code based on middle product should be faster than that based on straight multiplication (as currently used by GMP say). But this boils down to as much work being put into middle product code as into ordinary multiplication code, which is currently not the case. Specifically David Harvey invented this fantastic integer middle product algorithm (which was completely new). We've got an implementation largely by him (some by some of the GMP devels) which has assembly code for a basecase, then there is the equivalent of karatsuba multiplication for middle product (it's called toom42_mulmid). Actually you can "invert" any ordinary multiplication algorithm and turn it into an algorithm for middle product. This uses a type of inversion which is called Tellegen's principal (I can provide references, though they are also contained in David Harvey's paper on his website). Now, the next logical step is to implement Toom63_mulmid. This is an inverted Toom3 algorithm. In fact I have derived the correct identity for doing this (it took me ages as I did it by hand ;-)). Warning: David said he once gave this a go and it wasn't faster. However, he still left the door open for it to be successful because without proper assembly code for some of the components, it simply *can't* compete with the Toom3 multiplication in MPIR. But done right, it could in theory speed up division in MPIR by a *factor of 2*!!! Again, you might even get a small paper out of this work! * For asymptotically fast division, we use a precomputed inverse, i.e. to get f/g we first compute 1/g using Newton iteration, then we essentially multiply by f (that's a slight simplification, but not much). Now this can be made *much* faster too. Basically, Newton iteration can be done in something like 5/3*M(n) where M(n) is the time for multiplication (it's unknown whether 4/3 is possible for integer inversion). With FFT precaching you can combine this with the multiplication and get the whole division down to probably something like 2M(n) (might be slightly higher or lower - didn't look it up for a while). We are probably currently around 3M(n) or something of that order. Obviously this depends on the new FFT being put in, but in the end, we can speed up MPIR's asymptotically fast division (the concept for which I implemented - making reuse of some of the divide and conquer code from GMP for "boilerplate" stuff). In fact, it can be sped up a *lot*. Really a lot! GCD/XGCD * The current GCD/XGCD code in MPIR could do with a *lot* of cleaning up and general optimisation. GMP's is currently faster. It's all based on Niel's Moller's implementation of his ngcd algorithm, which is the best known. But I did it messily in MPIR. * I also have an idea for a new GCD algorithm for very small integers (a few limbs and less). I've implemented it for single limbs and it is way faster than what GMP uses. But it can be extended. I'd be happy to explain this to anyone who'd like to work on it. Again there *might* be some kind of paper in it. Maybe. Not sure though. (Just cause it's faster doesn't always get you a paper.) Jacobi Symbol * Richard Brent and Paul Zimmermann recently developed a new algorithm for computing Jacobi symbol asymptotically fast. They've implemented it and I think a future GMP will have an implementation. It's subquadratic with the same complexity as Schoenhage's continued fraction algorithm. It is not too hard to find a copy of the slides Paul and Richard will use to announce it at an upcoming meeting. There's also a paper floating around somewhere.... But this algorithm is based on Stehle and Zimmermann's half gcd algorithm. It is not at all clear to me why one couldn't derive a similar algorithm based on Niel's Moller's ngcd algorithm (if Paul is around, perhaps he can comment) at least his slides didn't rule this out. *If* such a thing would work, I believe it should be faster. Again, you *might* get a paper out of this. Actually, I don't see why not, so long as a comparable implementation beats Paul and Richard's implementation of their algorithm. As GMP will have the Brent-Zimmermann algorithm, it seems logical that MPIR could have the other. By the way, I want to draw attention to a new algorithm of Niels Moller and Torbjorn Granlund for "Improved division by invariant integers". MPIR already uses this algorithm as it was implemented in GMP and we now use that code for basecase division in MPIR. It's an interesting improvement and amazingly, if I understand correctly, they actually beat the chip for 2 by 1 limb division!! (flint2 uses this new algorithm extensively in the new nmod_poly module). Powm * MPIR is woefully slow at this. I vaguely recall some new REDD algorithm mentioned on the GMP website for this. I have no idea what this is about, or if it is even implemented, but a *pure guess* is it might be some generalised REDC algorithm, perhaps based on a similar trick to the divide by 3 algorithm that David Harvey and I (re)invented some years ago when working on FLINT. I reckon the likelihood is high (based on the name) that this is what it is, in which case, let's put it in MPIR. If not, well then there's something to study.... BSDNT * Many, many people have supported the idea of writing a BSD licensed Bignum library. Most recently Ondre Certik wrote to me *strongly* supporting the idea. This is complex at best, because it must be so different from GMP as to cause *absolutely no question whatsoever* that it is independent, and thus must be implemented completely from scratch. One idea I had was to write all the low level routines in LLVM instead of assembly, then just build suitable optimisations into LLVM so that nice assembly is automatically produced. This is entirely feasible and would mean only one lot of low level routines would ever need to be written, rather than a different set for each platform (LLVM supports *lots* of platforms, even the playstation SPU!!). However, LLVM doesn't support carry propagation well (as it is written with the C front end in mind and C doesn't need to support carry propagation). So I'm still umming and aahhing about whether this might be more work than just writing lots of assembly code from scratch. At this point I don't know. And yes it matters, it makes at least a factor of 2 difference in times, which we can't just "live with" for a bignum library. * Antony Vennard was working hard on a website for a bsdnt project. Bits and pieces of it function already (it's high tech). If anyone knows Django and would love to help Antony, let us know - lots is done but more needs to be done. I don't see a project as going anywhere without a website.... * I wrote some code for multiplication which is relatively quick up to a few hundred limbs, I also wrote some basecase division, but want to change how it works because I have a better way of doing it now. All this code is in very few lines, but still relies on assembly code in MPIR to be efficient. We obviously can't have a BSD licensed library relying on an LGPL library. Doesn't make sense... * I just want to emphasise, bsdnt has *nothing* to do with MPIR. It's a totally separate project and it came about because lots of people kept telling us they really wanted a BSD licensed bignum library. Its main goal is clean, simple code which anyone can read. Each function will be only a few lines. Performance will come second, (although there is a goal to at least be reasonable in comparison with MPIR, say within a factor of 2 in the first year and within 20% in the second - clock didn't start yet ;-)). Performance is *very* difficult to achieve if you specifically do *not* want to replicate any portion of GMP/MPIR, so there's no intention to ever beat MPIR. Actually, I don't think that is even possible. They represent the state-of-the art by some orders of magnitude. Don't forget, I'll be around at Sage Days Leiden for anyone who is going to be there. But also feel free to contact me on list here about anything I've discussed. Two more posts to go.... Bill. -- To post to this group, send an email to sage-devel@googlegroups.com To unsubscribe from this group, send an email to sage-devel+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/sage-devel URL: http://www.sagemath.org