> Date: Sat, 1 Jul 2023 15:11:56 -0000 (UTC)
> From: mlel...@serpens.de (Michael van Elst)
>
> crt0 pulls in
> - atexit
> - environment
> - static TLS
> - stack guard
>
> which all more or less pull in jemalloc, stdio and string functions.
>
> You need to replace these with dummies (compile with -fno-common)
> and of course, your program must not make use of the functionality...
A quicker way to address most of it is to just define your own malloc:

$ cat null.c
#include <stddef.h>
void *malloc(size_t n) { return NULL; }
void *realloc(void *p, size_t n) { return NULL; }
void *calloc(size_t n, size_t sz) { return NULL; }
void free(void *p) {}
int main(void) { return 0; }
$ cc -g -O2 -static -o null null.c
$ size null
   text    data     bss     dec     hex	filename
  26724    3208    3184   33116    815c	null

This still has printf, rbtree, string, atomic, &c., but not jemalloc, giving a ~20x size reduction from half a megabyte to 25 KB or so.

If someone really wants to do the work to reduce the overhead without providing an alternative malloc, or to reduce it further than an alternative malloc can, here are some avenues that might be worth pursuing without incurring too much overhead:

> int atexit(void) { return 0; };

The runtime startup logic, csu, relies on atexit. But perhaps csu could use an internal __atexit that reserves 4 or 5 static slots, and the libc atexit could use the last slot to call handlers in slots that are dynamically allocated by malloc. As long as your program doesn't call atexit, this only uses a fixed amount of space in csu and won't bring in malloc. (A rough sketch appears at the end of this message.)

> char *__allocenvvar() { return 0; };
> bool __canoverwriteenvvar() { return true; };
> size_t __envvarnamelen() { return 0; };
> void *__findenv() { return 0; };
> void *__findenvvar() { return 0; };
> void __freeenvvar() { };
> ssize_t __getenvslot() { return 0; };
> void __libc_env_init() { };
> bool __readlockenv() { return true; };
> bool __unlockenv() { return true; };
> bool __writelockenv() { return false; };

Programs that use only getenv don't need any of the machinery to allocate environment slots. The logic that getenv uses could be isolated in its own .c file with no allocation. This more or less requires splitting __getenvslot into two separate functions, one for the allocate=true case and the other for the allocate=false case, with a .h file to mediate the global state between the two .c files. __libc_env_init (which is what pulls all this in even if you don't use getenv, setenv, &c.) could perhaps be a weak symbol with a strong alias in the .c file that does allocation and modification. (A sketch of this appears at the end of this message too.)

> void __libc_rtld_tls_allocate() { };
> void __libc_rtld_tls_free() { };
> void __libc_static_tls_setup() { };
> void __libc_tls_get_addr() { };

I'm stumped about this one. In principle the linker has enough information to decide whether __libc_static_tls_setup is needed (which is what, in _libc_init, pulls all this in), but in practice I don't know of any path that would let us conditionalize its use on whether the object has any static TLS relocations. Maybe rtld could be responsible for mmapping the initial thread's static TLS space so libc is not, but I'm not sure that will work without a lot of effort.

> void __chk_fail() { };
> void __guard_setup() { };
> void __stack_chk_fail() { };
> void __stack_chk_fail_local() { };
> int __stack_chk_guard;

This calls syslog_ss, which brings in xsyslog.c. Not sure whether that brings in malloc or anything exciting beyond vsnprintf_ss (which itself shouldn't malloc or be exciting, since it has to be async-signal-safe). But if it does, maybe the call to syslog_ss in stack_protector.c's __fail could be defined in terms of some weak symbol __stack_chk_log, which xsyslog.c would define using the syslog machinery, with a fallback that writes to STDERR_FILENO; that way it only even tries to use syslog if anything else in the program already uses syslog. (Again, a sketch appears at the end of this message.)
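To make the atexit idea concrete, here is a rough, untested sketch. The names __atexit_static and __atexit_run, and the slot count, are made up for illustration; the real csu/libc internals are laid out differently:

#define ATEXIT_NSTATIC 5

static void (*atexit_static_slots[ATEXIT_NSTATIC])(void);
static unsigned atexit_static_used;

/* csu and libc internals register handlers here; no allocation. */
int
__atexit_static(void (*fn)(void))
{

	if (atexit_static_used >= ATEXIT_NSTATIC)
		return -1;
	atexit_static_slots[atexit_static_used++] = fn;
	return 0;
}

/* exit(3) runs the static slots in reverse order of registration. */
void
__atexit_run(void)
{

	while (atexit_static_used > 0)
		(*atexit_static_slots[--atexit_static_used])();
}

The public atexit(3) would then live in its own .c file and burn the last static slot on a dispatcher that walks a malloc'd list of handlers, so malloc only enters the link if the program itself calls atexit.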
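Likewise, a sketch of the getenv split and the weak __libc_env_init. __getenvslot_lookup is a made-up name for the allocate=false half of __getenvslot; the .c file with the allocation and locking machinery would carry the strong definition of __libc_env_init, which overrides the weak no-op below when it's linked in:

/* getenv side: read-only lookup, no allocation, no locks. */
#include <stddef.h>
#include <string.h>

extern char **environ;

char *
__getenvslot_lookup(const char *name, size_t namelen)
{
	char **ep;

	for (ep = environ; ep != NULL && *ep != NULL; ep++) {
		if (strncmp(*ep, name, namelen) == 0 &&
		    (*ep)[namelen] == '=')
			return &(*ep)[namelen + 1];
	}
	return NULL;
}

/* Weak no-op so csu can call __libc_env_init unconditionally
 * without dragging in the modification machinery. */
__attribute__((weak)) void
__libc_env_init(void)
{

	/* nothing to set up for read-only getenv */
}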
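And a sketch of the __stack_chk_log idea. The hook name is from the suggestion above; everything else here is illustrative, and the plain _exit is a stand-in for however the real handler should terminate (it raises SIGABRT today):

#include <string.h>
#include <unistd.h>

/* Weak fallback: write to stderr.  xsyslog.c would provide a strong
 * definition in terms of syslog_ss, so the syslog machinery is pulled
 * in only if something else in the program already uses syslog. */
__attribute__((weak)) void
__stack_chk_log(const char *msg)
{

	(void)write(STDERR_FILENO, msg, strlen(msg));
}

void
__stack_chk_fail(void)
{

	__stack_chk_log("stack overflow detected; terminated\n");
	_exit(127);
}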
(But I'm not going to do this work, and I'm not sure if there's going to be a good way to kick malloc out of the static TLS business without toolchain and/or rtld support.)