Re: [Wormux-dev] Very good performance improvment in Wormux

Jean-Christophe Duberga Sun, 20 Nov 2005 12:22:56 +0100

Victor STINNER wrote:

Hey,


(Very long email, if you don't have time to read it, go to "PROPOSITION"
at the end)

I would like to improve Wormux speed because, as a friend said to me:
"Worms 1 ran on a 486 [ and Wormux needs 2 GHz ]" !!! For sure, Wormux
needs too much CPU and memory.

Agreed, but the way we have chosen to write Wormux makes it impossible to run on such old hardware:

- We render TTF fonts on the fly (which uses a lot of CPU).

- We display the ground and sprites with alpha: that prevents simple, fast, memcpy-style copies. Instead we copy pixel by pixel, with computations that involve every R, G, B component of the source and destination pixels along with the A component of the source pixel. (With SDL this is done by the CPU when the sprites are in main memory.)

- The resolution is not the same. (I get 50 fps in 320x200, but as Wormux is CPU bound and uses only 22% of the CPU in 320x200, removing whatever limits the FPS would let it run at 100/22 * 50 = 227 fps.)

- The memory copies:

- Indexed 256-color graphics surfaces take 1/4 of the memory of our graphic surfaces, which use 4 bytes per pixel.

- We have no access (under Linux) to video memory... You can get it by exporting the environment variable SDL_VIDEODRIVER as "dga", but that obliges you to run Wormux as root! So we are obliged to draw everything in main memory and then copy it to video memory (through the SDL_Flip function call).

....
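The two figures in the list above (the 1/4 memory ratio and the 227 fps projection) can be checked with a small sketch. The function names SurfaceBytes and ProjectedFps are made up for illustration and are not part of Wormux; the projection assumes, as the text does, that the game is purely CPU bound.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a width x height surface at a given number of bytes per pixel.
inline std::size_t SurfaceBytes(std::size_t width, std::size_t height,
                                std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Frame rate we would reach if the measured CPU share were scaled up to 100%.
// Assumption: frame rate grows linearly with the CPU time available.
inline double ProjectedFps(double measured_fps, double cpu_percent) {
    assert(cpu_percent > 0.0);
    return measured_fps * 100.0 / cpu_percent;
}
```

For a 320x200 screen, SurfaceBytes(320, 200, 1) gives 64000 bytes for an indexed surface against 256000 bytes for a 4-bytes-per-pixel one, and ProjectedFps(50, 22) gives roughly 227.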

For these (and other) reasons Wormux can't be playable on i486-class hardware, but I agree we can do better and we must do better.
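To make the cost of alpha concrete, here is a minimal sketch of the difference between an opaque row copy and a per-pixel alpha blend. It is not SDL's blitter: the Pixel layout and the function names are assumptions for illustration only, and the destination alpha is simply left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One 32-bit pixel (4 bytes). Layout is an assumption for this sketch.
struct Pixel { std::uint8_t r, g, b, a; };

// Opaque copy: one memcpy moves the whole row.
inline void CopyRowOpaque(Pixel* dst, const Pixel* src, std::size_t count) {
    std::memcpy(dst, src, count * sizeof(Pixel));
}

// Blend one channel: src weighted by alpha, dst weighted by (255 - alpha).
inline std::uint8_t BlendChannel(std::uint8_t src, std::uint8_t dst,
                                 std::uint8_t alpha) {
    return static_cast<std::uint8_t>((src * alpha + dst * (255 - alpha)) / 255);
}

// Alpha copy: every pixel needs three multiply-add-divide steps on the CPU.
inline void BlendRowAlpha(Pixel* dst, const Pixel* src, std::size_t count) {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint8_t a = src[i].a;
        dst[i].r = BlendChannel(src[i].r, dst[i].r, a);
        dst[i].g = BlendChannel(src[i].g, dst[i].g, a);
        dst[i].b = BlendChannel(src[i].b, dst[i].b, a);
        // Destination alpha is left unchanged in this sketch.
    }
}
```

The blend loop reads both surfaces and does arithmetic on each component, which is why its cost grows with the number of blended pixels while the opaque copy stays close to raw memory bandwidth.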

Note: about calling SDL_SetVideoMode with the bpp set to 16, I already discussed that 2 months ago:

https://mail.gna.org/public/wormux-dev/2005-09/msg00014.html
with the explanation of why we don't fix the bpp to 16 bits per pixel.

So I wrote a tool to benchmark Wormux: view_stat.py, written in Python.
You have to add StatStart() and StatStop() calls to the code, run
Wormux, and then read the results using this tool. Example:
StatStart("Draw:sky");
sky.Draw();
StatStop("Draw:sky");
...
StatStart("Draw:map");
map.Draw();
StatStop("Draw:map");
...

It will be possible to disable stats (so that they don't impact
performance): I prepared a #define, but it doesn't work with ./configure yet.
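The email does not show how StatStart() and StatStop() are implemented, so the following is only a sketch of one possible shape: a named timer table that can be compiled out. The macro name WORMUX_NO_STATS, the StatEntry struct and the StatTable() helper are all hypothetical, and the output format read by view_stat.py is not covered here.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

#ifndef WORMUX_NO_STATS  // hypothetical switch, not the real #define

// Accumulated time and call count for one named section.
struct StatEntry {
    std::chrono::steady_clock::time_point started{};
    std::chrono::steady_clock::duration total{};
    unsigned long calls = 0;
};

// Single shared table, keyed by section name such as "Draw:sky".
inline std::map<std::string, StatEntry>& StatTable() {
    static std::map<std::string, StatEntry> table;
    return table;
}

inline void StatStart(const std::string& name) {
    StatTable()[name].started = std::chrono::steady_clock::now();
}

inline void StatStop(const std::string& name) {
    const auto now = std::chrono::steady_clock::now();
    auto it = StatTable().find(name);
    assert(it != StatTable().end() && "StatStop() without StatStart()");
    it->second.total += now - it->second.started;
    ++it->second.calls;
}

#else

// Stats disabled: empty inline functions that an optimizing compiler can drop.
inline void StatStart(const std::string&) {}
inline void StatStop(const std::string&) {}

#endif
```

With this shape the calls in the example above stay in the source unchanged, and building with the switch defined removes their cost.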

So, the most important result: in the game loop, Draw() eats 99% of the CPU and
Refresh() 1% !!! WTF !? It means that *all* the CPU is used only for display.

Yes, that is why we should improve that before other things (sprite collisions, for example, are not a priority today).

I tried to improve the Draw() function. Most interesting point: drawing
sky+ground only once (or when the camera moves) makes Wormux really faster.
- On my computer: 21 fps => with the patch (*): 75 fps !!!
- On a friend's computer: 40 => 120 fps !!!
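The patch itself is only described as flipping an "#if 0", so its real logic is not visible here. As a hedged sketch of the idea "redraw sky+ground only when the camera moved", a small guard object could look like this; Camera and BackgroundRedrawGuard are illustrative names, not Wormux classes.

```cpp
// Camera position in map coordinates (assumed representation).
struct Camera { int x; int y; };

// Remembers the camera position used for the last background redraw.
class BackgroundRedrawGuard {
public:
    // True on the first call and whenever the camera moved since the last call.
    bool NeedsRedraw(const Camera& cam) {
        const bool moved = !drawn_once_ || cam.x != last_.x || cam.y != last_.y;
        last_ = cam;
        drawn_once_ = true;
        return moved;
    }

private:
    Camera last_{0, 0};
    bool drawn_once_ = false;
};
```

A Draw() function would call NeedsRedraw() once per frame and skip the sky and ground blits when it returns false.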

OK, that is really faster when you don't move, but when you move the camera, the FPS falls back down as before... no?

But that is interesting, for example, to reduce the CPU consumption of Wormux.

OK, if you are interested, I can write a map method to redraw a rectangular area of the map (sky / ground) rather than the whole screen; then with such a method you could apply your patch?

So I "think" that we have to work around ... no ? :) My patch can't be
applied because when the sky+ground isn't refreshed, objects aren't
removed when they move ...

OK, tell me if you want me to write this method in the map object.

(*) "The patch" => in map/ground.cpp and map/sky.cpp, replace "#if 0"
with "#if 1" (in Draw() functions).

--- PROPOSITION ---

Idea: I propose to write a cache for the whole screen. We just need one
big cache which would know "what has to be drawn". The game would never
ask objects to draw themselves, but ask the cache to draw objects which
"have moved" (or if another object is removed/moved at the same position).

The cache would be a list of rectangles where the screen has to be
redrawn.

Start of game and when the camera moves:
 cache.Invalidate(<whole screen>)

When an object moves:
 cache.Invalidate(<old rectangle>)
 cache.Invalidate(<new rectangle>)

When an object is removed:
 cache.Invalidate(<old rectangle>)

(...)
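The invalidation rules above could be sketched as follows. This is only an illustration of the proposal, not existing Wormux code: the class name RedrawCache, the TakeDirty() method and the MoveObject() helper are assumptions, and no merging of rectangles is attempted.

```cpp
#include <cassert>
#include <vector>

// Screen rectangle: top-left corner plus size, in pixels.
struct Rectangle { int x, y, w, h; };

// Collects the screen areas that must be redrawn on the next frame.
class RedrawCache {
public:
    // Mark an area as dirty; empty rectangles are ignored.
    void Invalidate(const Rectangle& r) {
        assert(r.w >= 0 && r.h >= 0);
        if (r.w > 0 && r.h > 0) dirty_.push_back(r);
    }

    // Hand the pending areas to the renderer and start a fresh list.
    std::vector<Rectangle> TakeDirty() {
        std::vector<Rectangle> out;
        out.swap(dirty_);
        return out;
    }

    bool Empty() const { return dirty_.empty(); }

private:
    std::vector<Rectangle> dirty_;
};

// "When an object moves": invalidate the old place, then the new one.
inline void MoveObject(RedrawCache& cache, Rectangle& rect, int new_x, int new_y) {
    cache.Invalidate(rect);
    rect.x = new_x;
    rect.y = new_y;
    cache.Invalidate(rect);
}
```

Each frame, the game loop would call TakeDirty() and redraw only those areas; a camera move would invalidate one rectangle covering the whole screen.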

The cache has to manage a list of rectangles and compute the intersection
of two rectangles. Example:

+---+  +---+
| A |  | B |
+---+  +---+

=> can be stored as (A, B) or

+----------+
|  big A+B |
+----------+

We have to test if it's faster to draw A and then draw B (which means
calling the Draw() function of each object twice), or just draw a bigger
rectangle.

Oh oh... with our software alpha blending functions for graphic surfaces, it will always be faster to draw A and B than to draw the big A+B (OK, that is not true when A and B each measure 1 pixel and are adjacent, but for our surfaces it will always be true). Why? The alpha blitting functions have a small overhead and their cost depends heavily on the number of blended pixels.
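The argument above amounts to a simple cost model: a fixed overhead per blit plus a cost proportional to the blended pixels. The sketch below expresses that model; the function names are invented, the overhead is measured in "equivalent pixels" as a pure assumption, and A and B are assumed not to overlap.

```cpp
#include <algorithm>
#include <cassert>

struct Rectangle { int x, y, w, h; };

inline long Area(const Rectangle& r) {
    return static_cast<long>(r.w) * r.h;
}

// Smallest rectangle containing both a and b (the "big A+B").
inline Rectangle BoundingBox(const Rectangle& a, const Rectangle& b) {
    const int left = std::min(a.x, b.x);
    const int top = std::min(a.y, b.y);
    const int right = std::max(a.x + a.w, b.x + b.w);
    const int bottom = std::max(a.y + a.h, b.y + b.h);
    return Rectangle{left, top, right - left, bottom - top};
}

// Compare two blits (A, then B) against one blit of the bounding box.
// overhead_pixels: assumed fixed cost of one blit, expressed in pixels.
inline bool MergeIsCheaper(const Rectangle& a, const Rectangle& b,
                           long overhead_pixels) {
    assert(overhead_pixels >= 0);
    const long separate = Area(a) + Area(b) + 2 * overhead_pixels;
    const long merged = Area(BoundingBox(a, b)) + overhead_pixels;
    return merged < separate;
}
```

With two 32x32 sprites 100 pixels apart and an overhead worth 64 pixels, merging costs more than drawing them separately; with two adjacent 1x1 rectangles the merged blit wins, which matches the exception noted above.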

Draw function of a sprite/object would become:
 void Object::Draw(Rectangle draw)
 {
    Rectangle intersection = Intersection(this->rect, draw);
    if (intersection.isEmpty()) return;
    Blit(surface, intersection, ...);
 }
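The Draw() sketch above relies on Intersection() and isEmpty(), which are not defined in the email. One possible definition, assuming a rectangle is stored as top-left corner plus width and height, is:

```cpp
#include <algorithm>
#include <cassert>

// Assumed representation: top-left corner plus size, in pixels.
struct Rectangle {
    int x, y, w, h;
    bool isEmpty() const { return w <= 0 || h <= 0; }
};

// Overlapping area of a and b, or an empty rectangle if they do not overlap.
inline Rectangle Intersection(const Rectangle& a, const Rectangle& b) {
    assert(a.w >= 0 && a.h >= 0 && b.w >= 0 && b.h >= 0);
    const int left = std::max(a.x, b.x);
    const int top = std::max(a.y, b.y);
    const int right = std::min(a.x + a.w, b.x + b.w);
    const int bottom = std::min(a.y + a.h, b.y + b.h);
    if (right <= left || bottom <= top) return Rectangle{0, 0, 0, 0};
    return Rectangle{left, top, right - left, bottom - top};
}
```

Rectangles that only touch along an edge are treated as non-overlapping here, so an object is not redrawn for a dirty area that merely borders it.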

What do you think about the cache? I think that it would be "easy" to
write the cache, but very difficult to "upgrade" all Draw()
functions ...

Not so difficult, we can do it.
But what do the others think about it?

I think we should see this method as a means to reduce CPU consumption and not to improve FPS. Anyway, I find it a valuable task to do.


a+
Jean-Christophe

--
Jean-Christophe Duberga   - http://jeanchristophe.duber.free.fr
