I would be very surprised if you achieved faster results by threading this problem. Much has been written about the benefits, or lack thereof, of threading in Python (and in general).
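If you want to check this rather than take my word for it, here is a minimal sketch that times a pure-Python matrix multiplication once serially and once with the rows split across threads. The function names (`mul_row`, `matmul_serial`, `matmul_threaded`, `timed`), the matrix size and the worker count are all made up for illustration; this is not your code, and the timings will vary by machine and Python build, so I'm not quoting any numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def mul_row(row, b):
    """Multiply one row of A by the whole matrix B (pure Python)."""
    cols = list(zip(*b))  # columns of B; recomputed per row to keep the sketch short
    return [sum(x * y for x, y in zip(row, col)) for col in cols]


def matmul_serial(a, b):
    """Plain single-threaded product, one row at a time."""
    return [mul_row(row, b) for row in a]


def matmul_threaded(a, b, workers=4):
    """Same product, with one task per row handed to a thread pool.

    Assumption: 4 workers is an arbitrary choice for illustration.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mul_row, a, repeat(b)))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 120  # hypothetical size, small enough to run in about a second
    a = [[i + j for j in range(n)] for i in range(n)]
    b = [[i - j for j in range(n)] for i in range(n)]

    serial, t_serial = timed(matmul_serial, a, b)
    threaded, t_threaded = timed(matmul_threaded, a, b)

    assert serial == threaded  # both approaches must agree
    print(f"serial:   {t_serial:.3f}s")
    print(f"threaded: {t_threaded:.3f}s")
```

Since `mul_row` is a module-level function, swapping `ThreadPoolExecutor` for `concurrent.futures.ProcessPoolExecutor` is one way to try processes instead of threads, though the cost of copying the matrices between processes may eat the gain on small inputs.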
Threading is best used when you are doing different kinds of tasks at once. For example, you might want the matrix multiplication to run as a background process, chugging away while you do other things on your computer that don't tax it much. Threading adds efficiency when you have lots of "blocking" operations, such as reading many big files from a comparatively slow hard drive (slow compared to how fast a CPU processes data) or waiting on network data (super slow compared to CPU processing).
When you are doing mass numeric processing, you want to minimize overhead: jumping from one function to another costs something, recursion adds a small amount of unnecessary overhead, and you want to minimize the need for the CPU to switch between threads or processes. If you still feel the need to use threads for some reason, for numeric processing I'd recommend a "lighter" thread object, like a tiny thread or green thread or threadlet or whatever they are calling them now.
Another thing to note: it seems you might be expecting threads to run on different CPU cores and expecting an improvement from that. Be careful with this assumption; it is not always true. How threads are scheduled is up to the OS and CPU, and in Python's case the GIL (global interpreter lock) as well. Be aware that CPython has a GIL, which lets only one thread execute Python bytecode at a time (some other implementations, such as Jython and IronPython, do not have one). Google it if you don't know of it. To make better use of multi-core CPUs you might consider the multiprocessing library, included in Python 2.6 and above.
I'm assuming that speed is an issue because you were timing your code. If you are doing actual serious number crunching there's lots of advice on this. The NumPy package, as well as Stackless Python (for microthreads or whatever they're called), comes to mind.
Another thought: ask yourself whether you need a large in-memory or live set of processed numbers, in your case a fully multiplied matrix.
Usually a large in-memory set of numbers is something you're going to use to simulate a model, or to process and crunch further. Or is your actual usage going to be picking out a processed number here or there from the matrix? If so, look at iterators or generators, which would give you a snapshot in time of your matrix multiplication. I like to think of Python generators like integral calculus (definition at: http://en.wikipedia.org/wiki/Integral_calculus) where the specific integral of a generator is often just 1. I'm loving generators a lot. For example, there are generator accelerators, which, if you think it through, means you can make generator decelerators, useful for interpolating between the elements of your matrix, for example. I always forget whether generators are thread safe, though.
Some indicators that generators could help: you're doing lots of for loops with range(). Also, it's been measured that list comprehensions are slightly faster than for loops, which are slightly faster than while loops. You can Google to confirm; enter something like "python fast iteration".
Also, if the numbers in your matrix are not really numbers but objects holding numbers, __slots__ is used for large sets of objects (tens of millions at the very least) to minimize memory usage and perhaps help with speed, if used properly. Just mentioning it; I'd stay away from this though.
Some of my information may be inaccurate (or even completely wrong; for instance, I always get confused about whether a thread is best switched during a blocking versus a non-blocking operation), but these are some things to consider.
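To make the generator suggestion a bit more concrete, here is a small sketch of computing the product lazily, so you only pay for the rows or entries you actually look at. The names `matmul_rows` and `element` and the 2x2 sample matrices are hypothetical, chosen just for this illustration, and it assumes plain lists of lists rather than whatever structure you are really using.

```python
from itertools import islice


def matmul_rows(a, b):
    """Yield the rows of A x B one at a time instead of building the whole result."""
    cols = list(zip(*b))  # transpose B once so each column is a tuple
    for row in a:
        yield [sum(x * y for x, y in zip(row, col)) for col in cols]


def element(a, b, i, j):
    """Compute a single entry (i, j) of the product without touching the rest."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


# Hypothetical sample data
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

first_row = next(matmul_rows(a, b))            # only row 0 is computed
top_rows = list(islice(matmul_rows(a, b), 1))  # the first row, via islice
one_entry = element(a, b, 1, 0)                # a single entry on demand
```

If you end up needing every entry anyway, `list(matmul_rows(a, b))` gives you the full matrix, and at that point the generator buys you little beyond not holding intermediate results.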