So I've talked to one or two CS people I know, and asked about the issues David brought up in his second talk at SD2. As far as speeding up arithmetic with integers, there's one "known" trick that people use, namely switching between boxed & unboxed integers.

Boxed & Unboxed integers: While the terminology isn't 100% standard, it's very descriptive. The basic question is this: when you're storing something like an integer, do you want to store the actual value, or a pointer to the value? There are obvious tradeoffs in both directions: storing an actual value is faster, but if you want to support sufficiently large numbers, you need to have another datatype. So storing a pointer to the value is called a "boxed" integer, and storing the value itself is "unboxed."

As I understand it, SAGE integers are just wrapped GMP integers. Now, I've looked through the GMP documentation, and their integers are all boxed. They have full support for a small integer type (i.e. fits in a machine word), but they don't use it by default. They point out that it's a library, so I think this makes perfect sense for a library -- after all, if we do our own boxing & unboxing, that should go faster than passing off to GMP to do theirs. In this sense, it makes sense that they provide various interfaces, and let us do what we want.

What this means for us: if we box our integers, we stand to make some speed increases on small integer arithmetic (i.e. anything that fits in a machine word). Of course, any time you overflow word size, you have to switch representations, which will slowdown certain things. Of course, for mod_n arithmetic, where n is smaller than word size, this seems nice and speedy, since our ints will then be C ints. There are a few tricks to make this go a bit more smoothly; I'll describe the one I like most. (I believe it's what Chez Scheme does, but I'm not 100% sure.) You have the one bit -- either highest or lowest order -- of the integer itself be your flag. So when you want to do something with an integer, you test that bit (a bit operation, so fast), and if it's, say, 0, you know you've got an unboxed integer in your hands. If it's 1, you know it's a pointer to a GMP integer. This is a pretty slick system -- apparently lots of compilers use this trick to speed up operations on 3 or 7 specific types they want fast operations on, and then leave the last bit to say "it's more complicated, do the generic thing." For instance, when you're just adding integers, with the top bit being your flag, 0 for unboxed, 1 for boxed, if you add two unboxed things, and you check that bit again, it determines whether or not you've got an overflow and need to allocate a new GMP integer to start working with. Chez allows operations directly on unboxed integers (these are called things like fx+ and fx*, for fixnum), which the user uses (at his or her own risk) to speed up certain code in their own inner loops; basically, it's an add without typechecks, so that you can make things faster when you know it's safe to do so.

Also, GMP has its own unboxed integer datatype, but since we're using Pyrex and generating C code ourselves, I don't see any reason to use theirs instead of just rolling our own in C.

Maybe there's an obvious reason I'm missing to not do this, but it's a known answer to this problem.

-cc


--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://sage.scipy.org/sage/ and http://modular.math.washington.edu/sage/
-~----------~----~----~----~------~----~------~--~---

Reply via email to