Your blog post says uploads to GPU are slow, but I think that's
incorrect (or at least incomplete). I think the bottleneck is the
conversion from BitmapData to the texture format
(texture.uploadFromBitmapData()). If you use
texture.uploadFromByteArray() you'll see much higher throughput, which
suggests the bottleneck is not with Stage3D and texture uploads
generally, but with uploads from BitmapData (which is what the old
display list produces).
You may even be able to handle and cache some of that BMD -> BGRA
conversion yourself in MadComponents and gain a nice performance boost
by using uploadFromByteArray() (at least when you can bypass redrawing
from the display list, which, as I've said many times, is not well
suited to a GPU workflow).
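
To make the caching idea concrete, here is a rough sketch of what I mean.
It is untested, and BgraCache is a hypothetical helper of my own, not
anything in MadComponents or the Flash API. It assumes the texture was
created with Context3DTextureFormat.BGRA at the same power-of-two size as
the bitmap. Note that getPixels() returns unmultiplied ARGB, so pixels
with partial alpha may not render the same as with uploadFromBitmapData().

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.display3D.textures.Texture;
    import flash.utils.ByteArray;
    import flash.utils.Dictionary;

    // Hypothetical helper: caches the BGRA bytes for each BitmapData so the
    // per-pixel conversion is paid once rather than on every upload.
    public class BgraCache
    {
        // Weak keys, so the cache does not keep a BitmapData alive.
        private var cache:Dictionary = new Dictionary(true);

        // Returns the cached BGRA bytes, converting on first use.
        public function bytesFor(bmd:BitmapData):ByteArray
        {
            var bgra:ByteArray = cache[bmd];
            if (bgra) return bgra;

            // getPixels() yields 4 bytes per pixel in A, R, G, B order.
            var argb:ByteArray = bmd.getPixels(bmd.rect);
            bgra = new ByteArray();
            bgra.length = argb.length;
            for (var i:uint = 0; i < argb.length; i += 4)
            {
                bgra[i]     = argb[i + 3]; // B
                bgra[i + 1] = argb[i + 2]; // G
                bgra[i + 2] = argb[i + 1]; // R
                bgra[i + 3] = argb[i];     // A
            }
            cache[bmd] = bgra;
            return bgra;
        }

        // Call this whenever the bitmap's pixels change.
        public function invalidate(bmd:BitmapData):void
        {
            delete cache[bmd];
        }

        // Uploads mip level 0 from the cached bytes.
        public function upload(texture:Texture, bmd:BitmapData):void
        {
            texture.uploadFromByteArray(bytesFor(bmd), 0);
        }
    }
}
```

The first upload of a given bitmap still pays for the conversion; the
gain only shows up on repeat uploads of content that hasn't changed.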
More:
http://jacksondunstan.com/articles/1617
Kevin N.