* Peter Zijlstra <pet...@infradead.org> wrote:

> Currently __module_address() is using a linear search through all
> modules in order to find the module corresponding to the provided
> address. With a lot of modules this can take a lot of time.
>
> One of the users of this is kernel_text_address() which is employed
> in many stack unwinders; which in turn are used by perf-callchain
> and ftrace (possibly from NMI context).
>
> So by optimizing __module_address() we optimize many stack unwinders
> which are used by both perf and tracing in performance sensitive
> code.

So my (rather typical) workstation has 116 modules loaded currently - but
setups using in excess of 150 modules are not uncommon either. A linear
list walk of 100-150 entries for every single call chain entry that hits
some module, in 'perf record -g', can cause some overhead!

> +	/*
> +	 * If this is non-NULL, vfree after init() returns.

s/vfree/vfree()

> +	/*
> +	 * We want mtn_core::{mod,node[0]} to be in the same cacheline as the
> +	 * above entries such that a regular lookup will only touch the one
> +	 * cacheline.

s/touch the one cacheline
 /touch one cacheline

?

> +static __always_inline int
> +mod_tree_comp(void *key, struct latch_tree_node *n)
> +{
> +	unsigned long val = (unsigned long)key;
> +	unsigned long start, end;
> +
> +	end = start = __mod_tree_val(n);
> +	end += __mod_tree_size(n);
> +
> +	if (val < start)
> +		return -1;
> +
> +	if (val >= end)
> +		return 1;
> +
> +	return 0;

So since we are counting nanoseconds, I suspect this could be written more
optimally as:

{
	unsigned long val = (unsigned long)key;
	unsigned long start, end;

	start = __mod_tree_val(n);
	if (val < start)
		return -1;

	end = start + __mod_tree_size(n);
	if (val >= end)
		return 1;

	return 0;
}

right?

Thanks,

	Ingo
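
For illustration only, here is a minimal user-space sketch of the reordered comparison suggested above. It is not the kernel code: struct range, range_comp(), find_range() and check_ranges() are hypothetical names, and a binary search over a sorted array stands in for the latch tree lookup. The point it shows is that 'end' is only computed once the 'val < start' case has been ruled out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a module's address range (not a kernel type). */
struct range {
	unsigned long start;
	unsigned long size;
};

/*
 * Three-way comparison of an address against a range:
 * -1 if val lies below the range, 1 if at or above its end, 0 if inside.
 * 'end' is computed only after the 'val < start' check has failed.
 */
static int range_comp(unsigned long val, const struct range *r)
{
	unsigned long start = r->start;
	unsigned long end;

	if (val < start)
		return -1;

	end = start + r->size;
	if (val >= end)
		return 1;

	return 0;
}

/*
 * Binary search over non-overlapping ranges sorted by start address.
 * Returns the range containing val, or NULL if there is none.
 */
static const struct range *find_range(const struct range *tab, size_t n,
				      unsigned long val)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int c = range_comp(val, &tab[mid]);

		if (c == 0)
			return &tab[mid];
		if (c < 0)
			hi = mid;
		else
			lo = mid + 1;
	}

	return NULL;
}

/* Small self-check over three made-up ranges. */
static void check_ranges(void)
{
	static const struct range tab[] = {
		{ 0x1000, 0x100 },
		{ 0x2000, 0x200 },
		{ 0x4000, 0x080 },
	};

	assert(find_range(tab, 3, 0x0fff) == NULL);
	assert(find_range(tab, 3, 0x1000) == &tab[0]);
	assert(find_range(tab, 3, 0x21ff) == &tab[1]);
	assert(find_range(tab, 3, 0x2200) == NULL);
	assert(find_range(tab, 3, 0x407f) == &tab[2]);
}
```

Whether the reordering saves anything measurable depends on what the compiler already does with the original form; the sketch only demonstrates that both orderings give the same three-way result.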