On Fri, Dec 15, 2017 at 12:03:31PM +0000, Patrick Bellasi wrote:

> So, by moving util_est right after sched_avg, here is what we get (with some
> lines to better highlight 64B boundaries):
>
>     const struct sched_class * sched_class;                          /*   152     8 */
>     struct sched_entity {
>         [...]
> ---[ Line 9 ]-------------------------------------------------------------------------------
>         struct sched_avg {
>             /* typedef u64 */ long long unsigned int last_update_time;   /*   576     8 */
>             /* typedef u64 */ long long unsigned int load_sum;           /*   584     8 */
>             /* typedef u64 */ long long unsigned int runnable_load_sum;  /*   592     8 */
>             /* typedef u32 */ unsigned int util_sum;                     /*   600     4 */
>             /* typedef u32 */ unsigned int period_contrib;               /*   604     4 */
>             long unsigned int load_avg;                                  /*   608     8 */
>             long unsigned int runnable_load_avg;                         /*   616     8 */
>             long unsigned int util_avg;                                  /*   624     8 */
>         } avg;                                                           /*   576    56 */
>         /* --- cacheline 6 boundary (384 bytes) was 24 bytes ago --- */
>         struct util_est {
>             long unsigned int last;                                      /*   632     8 */
> ---[ Line 10 ]------------------------------------------------------------------------------
>             long unsigned int ewma;                                      /*   640     8 */
>         } util_est;                                                      /*   632    16 */
>     } se;                                                                /*   192   512 */
> ---[ Line 11 ]------------------------------------------------------------------------------
>     /* --- cacheline 9 boundary (576 bytes) was 24 bytes ago --- */
>     struct sched_rt_entity {
>         struct list_head {
>             struct list_head * next;                                     /*   704     8 */
>             struct list_head * prev;                                     /*   712     8 */
>         } run_list;                                                      /*   704    16 */
>
>
> As you can see, we still end up with util_est spanning across two cache lines
> and, even worse, with an almost empty Line 10. The point is that sched_avg
> already uses 56B... which leaves just 8 bytes.

Yes, that's unfortunate.

> So, I can move util_est there and use unsigned int for "last" and "ewma"
> storage. This should fix the cache alignment, but only as long as we do not
> add other stuff to sched_avg.
>
> BTW, should it not be possible to use a similar "fasting" approach for
> load_avg and runnable_load_avg? Given their range a u32 should be just good
> enough, shouldn't it?

Probably, I'd have to page all that stuff back in :/

Another issue is that for tasks load and runnable_load are the exact same;
I just never found a sensible way to collapse that.
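
For illustration only, the layout arithmetic discussed above can be checked
with a small userspace sketch. This is not kernel code: the struct and
function names (sched_avg_mock, util_est_compact, avg_with_est, cacheline_of,
spans_cachelines, check_thread_examples) are hypothetical, the 64-byte cache
line is taken from the "64B boundaries" in the dump, and the fields that are
"long unsigned int" in the dump are written as uint64_t, which assumes a
64-bit target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64 /* assumed, matching the 64B boundaries quoted above */

/* Userspace mock of struct sched_avg, using the field sizes from the dump. */
struct sched_avg_mock {
	uint64_t last_update_time;
	uint64_t load_sum;
	uint64_t runnable_load_sum;
	uint32_t util_sum;
	uint32_t period_contrib;
	uint64_t load_avg;          /* long unsigned int in the dump */
	uint64_t runnable_load_avg; /* long unsigned int in the dump */
	uint64_t util_avg;          /* long unsigned int in the dump */
};

/* The proposed compact variant: two 32-bit fields instead of two longs. */
struct util_est_compact {
	unsigned int last;
	unsigned int ewma;
};

/* sched_avg followed directly by the compact util_est. */
struct avg_with_est {
	struct sched_avg_mock avg;
	struct util_est_compact util_est;
};

/* 56 bytes are already used, so only 8 bytes remain in a 64-byte line. */
_Static_assert(sizeof(struct sched_avg_mock) == 56, "sched_avg mock is 56B");
_Static_assert(sizeof(struct util_est_compact) == 8, "compact util_est is 8B");
_Static_assert(sizeof(struct avg_with_est) == CACHELINE_BYTES,
	       "avg + compact util_est fill exactly one cache line");

/* Index of the cache line that contains the given byte offset. */
size_t cacheline_of(size_t offset)
{
	return offset / CACHELINE_BYTES;
}

/* Nonzero if the object at [offset, offset + size) crosses a line boundary. */
int spans_cachelines(size_t offset, size_t size)
{
	return cacheline_of(offset) != cacheline_of(offset + size - 1);
}

/* Runtime check of the two cases discussed in the thread. */
void check_thread_examples(void)
{
	/* 16-byte util_est at offset 632: straddles lines 9 and 10. */
	assert(spans_cachelines(632, 16));
	/* 8-byte util_est at the same offset: stays within line 9. */
	assert(!spans_cachelines(632, 8));
	/* In the mock, util_est starts right after the 56-byte sched_avg. */
	assert(offsetof(struct avg_with_est, util_est) == 56);
}
```

The static assertions fail at compile time as soon as a field is added to the
mock sched_avg, which mirrors the concern that the compact layout only holds
until sched_avg grows.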