G. Branden Robinson writes:
> Are there people actively working on this?  The mood I was getting was
> that G3's weren't worth the trouble.  I myself don't have much in the way
> of PowerPC assembly brains, or video processing mojo.

Some basic tips:

Sketch out the data flow on a piece of paper.  This helps with register
allocation and instruction ordering.  Instructions depend on each other
in a tree-like way; draw arrows labeled with the dependency.  The
dependency might be "memory", "r[]" (where "[]" represents an unallocated
register which you will fill in later), CR, CTR, LR...

Then try to interleave the independent streams of instructions as much
as possible.  This helps you avoid stalls caused by instructions waiting
for others to complete.  Also try to mix up your use of the different
functional units (int, FP, branch, mem).

Watch out for r0 in load/store instructions.  When it appears as the base
register it means the literal value 0, not the contents of r0.

Get comfortable with rlwimi and the similar rotate-and-mask instructions.
With these you can rotate, mask, and copy bits.

Don't be too afraid of floating-point.  There is a fused multiply-add
instruction (A=x*y+z) that is great for the matrix operations commonly
found in video processing.  You can do at least one operation per cycle,
in parallel with the integer pipeline, with a latency of 4 or 5 cycles.

Take advantage of the cache control instructions.  They let you prefetch,
flush, zero, discard...  Stick to the usual guidelines for cache-aware
programming as well, such as keeping your footprint small.

BTW, procmail can filter out duplicate messages.
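
P.S. On the rlwimi tip: if the mask operands look cryptic, it may help to play with a small C model of what the instruction computes before writing any assembly. The following is only a sketch, not code from any real project. The function names (rotl32, ppc_mask, rlwimi_model, rlwinm_model) are made up for illustration, and the model assumes the 32-bit forms with IBM bit numbering, where bit 0 is the most significant bit.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is taken modulo 32). */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Build the MB..ME mask, IBM numbering (bit 0 = MSB, bit 31 = LSB).
 * If mb > me the run of ones wraps around the ends of the word. */
static inline uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;         /* ones in bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me); /* ones in bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* Model of "rlwimi rA,rS,SH,MB,ME": rotate rS left by SH, then insert the
 * masked bits into rA, leaving the other bits of rA untouched. */
static inline uint32_t rlwimi_model(uint32_t ra, uint32_t rs,
                                    unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Model of "rlwinm rA,rS,SH,MB,ME": rotate rS left by SH, then keep only
 * the masked bits (everything else becomes zero). */
static inline uint32_t rlwinm_model(uint32_t rs,
                                    unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

For example, rlwimi_model(0x11223344, 0xAB, 16, 8, 15) drops the byte 0xAB into the second-highest byte of the word and gives 0x11AB3344, which is the kind of one-instruction pixel packing that otherwise takes a shift, a mask, and an OR. Going the other way, rlwinm_model(0x11AB3344, 16, 24, 31) pulls that byte back out as 0xAB.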