I've been playing with performance hacks, mostly for fun. I have a MIPS16 patch
which seems to work, but running microbenchmarks in Lua isn't very interesting.
What's a good macrobenchmark that reflects what people actually care about?
Rebuilding a ppp config? LuCI status page load time?

====

My ar71xx boxes have a 16-entry TLB, and my intuition is that it's being blown
through rapidly. IIRC, on MIPS 4K-style cores each TLB entry maps an even/odd
pair of pages, i.e., 8k with 4k pages. So let's look at where those entries go
in /usr/bin/lua.
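To make the per-entry arithmetic concrete, here is a small Python sketch that counts the TLB entries a mapping needs when each entry covers one aligned even/odd pair of pages. The function name and example addresses are hypothetical. Alignment matters: a segment under 8k can still cost two entries if it straddles an 8k boundary.

```python
PAGE_4K = 4 * 1024


def tlb_entries(start: int, length: int, page_size: int = PAGE_4K) -> int:
    """Count TLB entries needed to map [start, start + length).

    Assumes a MIPS-style TLB where one entry maps an even/odd pair of
    virtual pages, so each entry covers an aligned 2 * page_size window.
    """
    if length <= 0:
        return 0
    pair = 2 * page_size
    first = start // pair
    last = (start + length - 1) // pair
    return last - first + 1


# A 7k segment starting on an 8k boundary fits in one entry...
print(tlb_entries(0x400000, 7 * 1024))     # -> 1
# ...but the same segment starting 4k later straddles two pairs.
print(tlb_entries(0x401000, 7 * 1024))     # -> 2
# 24k of 8k-aligned heap needs three entries.
print(tlb_entries(0x10000000, 24 * 1024))  # -> 3
```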

Main text segment: 1. This is the main program proper, rodata, and the PLT.  
Stays under 8k...barely.

Main writable data segment: 1. Includes writable data and the GOT. Less than a 
page in this case.

Heap: 3 after startup. (24k mapped for the heap.) Presumably all hot.

libc text: call it 3 hot TLB entries; ~40 would be needed to cover all of it.

libc writable data: call it 2 hot TLB entries, because the GOT lives here too.

libgcc_s: text is warm because of softfloat; call it 1.

Now we're at 11 of the 16 entries, and liblua.so will surely eat a few more....
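The tally can be checked mechanically. This is a trivial Python sketch using the estimates from the list above, and the labels are informal. liblua.so is left out because the list gives no estimate for it. The listed items sum to 11.

```python
TLB_SIZE = 16  # entries on the ar71xx boxes described above

# Hot TLB entries per mapping, as estimated in the list above (4k pages).
hot_entries = {
    "lua text (+rodata, PLT)": 1,
    "lua data (+GOT)": 1,
    "heap (24k)": 3,
    "libc text (hot part)": 3,
    "libc data (+GOT)": 2,
    "libgcc_s text (softfloat)": 1,
}

used = sum(hot_entries.values())
print(f"{used} of {TLB_SIZE} entries used, {TLB_SIZE - used} left")
# -> 11 of 16 entries used, 5 left
```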

On the 32M/64M machines, switching to 64k pages (so each TLB entry maps 128k
instead of 8k) starts to seem worthwhile. Has anybody tried this?
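Here is rough arithmetic on what that buys. This Python sketch assumes the same 16-entry TLB, with each entry mapping an even/odd page pair. The helper name is made up for illustration.

```python
KIB = 1024


def tlb_reach(entries: int, page_size: int) -> int:
    """Memory a fully loaded TLB can map without a refill, assuming
    each entry maps an even/odd pair of pages."""
    return entries * 2 * page_size


for page_kib in (4, 64):
    reach = tlb_reach(16, page_kib * KIB)
    print(f"{page_kib:>2}k pages: {reach // KIB}k of TLB reach")
# ->  4k pages: 128k of TLB reach
# -> 64k pages: 2048k of TLB reach
```

The catch is that every mapping gets rounded up to a 64k page, which is presumably why this only looks attractive on the larger-memory boxes.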

With 64k pages, merging shared libraries into fewer objects to reduce the 
number of mmaps/translations seems like it might be beneficial.
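A rough Python sketch of why merging helps once pages are 64k. The library sizes are hypothetical, and each mapping is assumed to start on a page-pair boundary. Every separately mapped text segment costs at least one TLB entry and one full page, while a merged object can pack several small libraries into fewer of each.

```python
KIB = 1024
PAGE = 64 * KIB   # assumed base page size after switching to 64k pages
PAIR = 2 * PAGE   # one MIPS TLB entry maps an even/odd page pair


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def cost(sizes):
    """Return (TLB entries, bytes mapped) for text segments mapped
    separately, each assumed to start on a page-pair boundary."""
    entries = sum(ceil_div(s, PAIR) for s in sizes)
    mapped = sum(ceil_div(s, PAGE) * PAGE for s in sizes)
    return entries, mapped


# Hypothetical text sizes for three small libraries (illustration only).
sizes = [20 * KIB, 30 * KIB, 40 * KIB]

sep_entries, sep_mapped = cost(sizes)
merged_entries, merged_mapped = cost([sum(sizes)])
print(f"separate: entries={sep_entries}, mapped={sep_mapped // KIB}k")
print(f"merged:   entries={merged_entries}, mapped={merged_mapped // KIB}k")
# -> separate: entries=3, mapped=192k
# -> merged:   entries=1, mapped=128k
```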

More radically: although it doesn't look like hugetlbfs is working on 32-bit 
kernels, there are alternatives for the less-dynamic embedded machines. We 
could carve off a 2M chunk of memory and make it present in every process's
address space, like a jumbo vDSO. Major read-only components like libc.text and
busybox.text could go there without significant surgery; their GOTs and other 
writable data would still have per-process mappings, at a fixed distance from 
the jumbo vdso....

Jay
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
