I've been playing with performance hacks, mostly for fun. I have a MIPS16 patch that seems to work, but running microbenchmarks in Lua isn't very interesting. What's a good macrobenchmark that reflects what people actually care about? Rebuilding a ppp config? LuCI status page load time?
====

My ar71xx boxes have a 16-entry TLB. My intuition is that it's being blown through rapidly. IIRC, on MIPS 4K-style cores each TLB entry maps an even/odd pair of pages, i.e., 8k with 4k pages. So let's look at where those entries go in /usr/bin/lua:

- Main text segment: 1. This is the main program proper, rodata, and the PLT. It stays under 8k...barely.
- Main writable data segment: 1. Includes writable data and the GOT. Less than a page in this case.
- Heap: 3 after startup (24k is mapped for the heap). Presumably all hot.
- libc text: call it 3 hot TLB entries; ~40 would cover the whole library.
- libc writable data: call it 2 hot TLB entries, because the GOT lives here too.
- libgcc_s: text is warm because of softfloat; call it 1.

Now we're at 13. liblua.so will surely eat a few more....

On the 32M/64M machines, switching to 64k pages starts to look worthwhile. Has anybody tried this? With 64k pages, merging shared libraries into fewer objects to reduce the number of mmaps/translations also seems like it might pay off.

More radically: although hugetlbfs doesn't look like it works on 32-bit kernels, there are alternatives for the less-dynamic embedded machines. We could carve off a 2M chunk of memory and make it present in every process's address space, like a jumbo vdso. Major read-only components like libc.text and busybox.text could go there without significant surgery; their GOTs and other writable data would still have per-process mappings, at a fixed distance from the jumbo vdso....

Jay
_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel
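To make the TLB arithmetic in the post concrete, here is a small Python sketch (not part of the original message). It assumes, as the post does, a 16-entry TLB where each entry maps a pair of pages, and it ignores alignment (a region that straddles an entry boundary can cost one extra entry). The function and table names are hypothetical. Note that the per-region counts itemized in the post sum to 11; the gap to the running total of 13 isn't broken out there.

```python
import math

# Assumption (from the post): MIPS 4K-style TLB, each entry maps an
# even/odd pair of pages, 16 entries on ar71xx.
PAGES_PER_ENTRY = 2
TLB_ENTRIES = 16


def entries_for(size_bytes: int, page_size: int) -> int:
    """TLB entries needed to map size_bytes, assuming the region starts
    on an entry-aligned (PAGES_PER_ENTRY * page_size) boundary."""
    span = PAGES_PER_ENTRY * page_size
    return math.ceil(size_bytes / span)


def tlb_reach(page_size: int, entries: int = TLB_ENTRIES) -> int:
    """Total bytes the whole TLB can map at once."""
    return entries * PAGES_PER_ENTRY * page_size


# Hot-entry estimates itemized in the post for /usr/bin/lua with 4k pages.
HOT_ENTRIES_4K = {
    "lua text": 1,
    "lua data+GOT": 1,
    "heap": 3,
    "libc text": 3,
    "libc data+GOT": 2,
    "libgcc_s text": 1,
}

if __name__ == "__main__":
    KB = 1024
    for page in (4 * KB, 64 * KB):
        print(f"{page // KB}k pages: TLB reach {tlb_reach(page) // KB}k, "
              f"24k heap needs {entries_for(24 * KB, page)} entry(ies)")
    print("itemized hot entries (4k pages):", sum(HOT_ENTRIES_4K.values()))
```

With 4k pages the whole TLB reaches only 128k; with 64k pages the same 16 entries reach 2M, and the 24k heap fits in a single entry. By the same rule, the ~40 entries' worth of libc text (roughly 320k) would need only 3 entries with 64k pages.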