I am rather skeptical about the usefulness of such micro-optimizations when they do not affect algorithmic complexity. In exchange for better memory locality, you buy into considerably more memory fragmentation, and we already have scores of comparatively modest size exhausting memory. All of that memory has to be filled and processed. So the real savings are unlikely to come from doing the same kind of work slightly faster, and less maintainably, through hand-written optimizations. They are more likely to come from figuring out why too much work is being done in the first place.
The more one replaces standard tools and operations, the harder it becomes to figure out what actually goes wrong and fix it, or to change strategies and algorithms wholesale. https://codereview.appspot.com/583750043/