On 11/22/2013 10:30 AM, Eric Anholt wrote:
Kenneth Graunke <kenn...@whitecape.org> writes:

On 11/22/2013 12:21 AM, Eric Anholt wrote:
The canary is basically just to give a better debugging message when you
ralloc_free() something that wasn't rallocated.  Reduces maximum memory
usage of apitrace replay of the dota2 demo by 60MB on my 64-bit system (so
half that on a real 32-bit dota2 environment).

Really, half?  It's an unsigned...that's 4 bytes regardless of 64-bit
vs. 32-bit.  I think this should be 60MB of savings, end of story.

Scalar types get aligned to their size, so since it's followed by a
pointer, there's 4 bytes of pad in between.
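
To make the padding concrete, here is a minimal sketch. The struct names and fields are hypothetical stand-ins, not the real ralloc header; they only show an unsigned followed by pointers. On a typical 64-bit ABI the canary costs 8 bytes per header (4 for the field, 4 of pad), and on a typical 32-bit ABI it costs 4, which is where the "half" comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical header layouts for illustration; not the actual ralloc header. */
struct header_with_canary {
    unsigned canary;                     /* 4 bytes on common ABIs */
    struct header_with_canary *parent;   /* pointer, aligned to its own size */
    struct header_with_canary *child;
};

struct header_without_canary {
    struct header_without_canary *parent;
    struct header_without_canary *child;
};

/* Bytes saved per header by dropping the canary, padding included. */
size_t canary_cost(void)
{
    return sizeof(struct header_with_canary) -
           sizeof(struct header_without_canary);
}

/* Print where the first pointer lands and what the canary really costs. */
void print_canary_layout(void)
{
    size_t parent_offset = offsetof(struct header_with_canary, parent);

    /* The pointer can never start before the end of the canary. */
    assert(parent_offset >= sizeof(unsigned));

    printf("offsetof(parent) = %zu\n", parent_offset);
    printf("cost per header  = %zu bytes\n", canary_cost());
}
```

Calling print_canary_layout() from a small main() on each target should show the difference directly; pahole reports the same thing without having to write any code.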

For anyone that hasn't seen this tool before, check out pahole from the
dwarves package.  Run it on a .o file you think might be sucking up a
bunch of memory, and see your structs like:

class fs_inst : public backend_instruction {
public:

         /* class backend_instruction <ancestor>; */      /*     0    32 */

         /* XXX last struct has 7 bytes of padding */

         class fs_reg              dst;                   /*    32    48 */
         /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
         class fs_reg              src[3];                /*    80   144 */
         /* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
         bool                       saturate;             /*   224     1 */

         /* XXX 3 bytes hole, try to pack */

         int                        conditional_mod;      /*   228     4 */
         uint8_t                    flag_subreg;          /*   232     1 */

         /* XXX 3 bytes hole, try to pack */

         int                        mlen;                 /*   236     4 */
         int                        regs_written;         /*   240     4 */
         int                        base_mrf;             /*   244     4 */
         uint32_t                   texture_offset;       /*   248     4 */
         int                        sampler;              /*   252     4 */
         /* --- cacheline 4 boundary (256 bytes) --- */
         int                        target;               /*   256     4 */
         bool                       eot;                  /*   260     1 */
         bool                       header_present;       /*   261     1 */
         bool                       shadow_compare;       /*   262     1 */
         bool                       force_uncompressed;   /*   263     1 */
         bool                       force_sechalf;        /*   264     1 */
         bool                       force_writemask_all;  /*   265     1 */

...

         /* size: 288, cachelines: 5, members: 21 */
         /* sum members: 280, holes: 3, sum holes: 8 */
         /* paddings: 1, sum paddings: 7 */
         /* last cacheline: 32 bytes */
};
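
As a rough illustration of what "try to pack" means, the sketch below takes four members in the same order as the report above and then reorders them. Both struct names are made up for this example, and this is not a proposed change to fs_inst; it only shows that grouping the one-byte members removes the holes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct with members in the order pahole reported them. */
struct inst_flags_as_declared {
    bool    saturate;          /* 1 byte, then a hole before the int */
    int     conditional_mod;
    uint8_t flag_subreg;       /* 1 byte, then another hole */
    int     mlen;
};

/* Same members, widest first, so the one-byte fields sit side by side. */
struct inst_flags_reordered {
    int     conditional_mod;
    int     mlen;
    bool    saturate;
    uint8_t flag_subreg;
};

/* Bytes saved per object purely by changing the member order. */
size_t bytes_saved_by_reordering(void)
{
    return sizeof(struct inst_flags_as_declared) -
           sizeof(struct inst_flags_reordered);
}
```

On common ABIs with 4-byte int the first layout is 16 bytes and the second is 12, though the exact numbers depend on the target.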


Getting a bit OT, but I'm sure some Mesa structs could be compacted quite a bit. In gl_texture_image, for example, a number of the fields (Face, Level, Border, NumSamples, etc.) could be reduced to GLubyte and rearranged to reduce the memory used for such objects.

We could potentially reduce gl_texture_image from 80 bytes to 44 bytes, which would save 324 bytes for a 256x256 mipmapped texture (9 mipmap levels x 36 bytes each). It would start to add up with a thousand textures or so.
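
The 324-byte figure can be checked with a few lines of arithmetic. This sketch assumes one gl_texture_image per mipmap level of a single-face square texture; the function names are made up for the example.

```c
#include <stddef.h>

/* Number of mipmap levels for a square texture of the given size:
 * 256 -> 256, 128, 64, 32, 16, 8, 4, 2, 1 = 9 levels. */
unsigned mip_level_count(unsigned size)
{
    unsigned levels = 1;

    while (size > 1) {
        size >>= 1;
        levels++;
    }
    return levels;
}

/* Total bytes saved across all levels if the per-level struct shrinks
 * from old_bytes to new_bytes. */
size_t mipmap_struct_savings(unsigned size, size_t old_bytes, size_t new_bytes)
{
    return (size_t)mip_level_count(size) * (old_bytes - new_bytes);
}
```

With the sizes quoted above, mipmap_struct_savings(256, 80, 44) is 9 * 36 = 324.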

There might be some debate about how worthwhile that is. I'm not too concerned right now.

However, pahole says gl_debug_state is fairly huge: 292712 bytes! sizeof(gl_context) = 384208 so that's a big piece. At the very least, maybe gl_debug_state could be pulled out and allocated on first use...
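
Allocating on first use could look roughly like the sketch below. The types and function names are hypothetical stand-ins rather than the real gl_context or gl_debug_state; only the size is taken from the pahole number above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the large debug state. */
struct debug_state {
    char storage[292712];   /* size pahole reported for gl_debug_state */
};

/* Hypothetical stand-in for the context: it holds a pointer instead of
 * embedding the whole struct. */
struct context {
    struct debug_state *debug;   /* NULL until first use */
};

/* Return the debug state, allocating it zeroed on the first call.
 * Returns NULL if the allocation fails. */
struct debug_state *context_get_debug_state(struct context *ctx)
{
    if (!ctx->debug)
        ctx->debug = calloc(1, sizeof(*ctx->debug));
    return ctx->debug;
}

/* Release the debug state, if any, when the context is destroyed. */
void context_free_debug_state(struct context *ctx)
{
    free(ctx->debug);
    ctx->debug = NULL;
}
```

A context that never touches the debug state then pays only for one pointer. Every caller has to go through the accessor and handle a NULL return.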

-Brian

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
