On a related topic: the last time I tried to build a DPDK application with -fwhole-program, gcc gave lots of warnings because it decided not to inline rte_memcpy.
This might affect LTO as well. Really, rte_memcpy_func should not be inline. We already optimize for the constant-size case, where inlining makes sense. Beyond that, not so much.
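
Roughly the split I have in mind, as a minimal sketch rather than the actual rte_memcpy code. The names my_memcpy and my_memcpy_generic and the 64-byte cutoff are made up for illustration; the only compiler features assumed are the gcc/clang builtins __builtin_constant_p and __builtin_memcpy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical out-of-line path for sizes not known at compile time.
 * In a real tree this would live in a .c file; noinline keeps the
 * compiler from pulling the body into every call site.
 */
__attribute__((noinline))
void *my_memcpy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/*
 * Thin inline wrapper (hypothetical name). Only the small constant-size
 * case is expanded in place; everything else becomes a plain call.
 * Without optimization __builtin_constant_p() usually evaluates to 0
 * here, so all calls take the out-of-line path.
 */
static inline void *my_memcpy(void *dst, const void *src, size_t n)
{
	/* 64 is an arbitrary illustrative cutoff, not a tuned value. */
	if (__builtin_constant_p(n) && n <= 64)
		return __builtin_memcpy(dst, src, n);

	return my_memcpy_generic(dst, src, n);
}
```

With this shape the compiler never has to decide whether to inline the large variable-size body, so there is nothing for -fwhole-program to warn about at the call sites.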