Our backends have done roughly the same good/bad job of generating code since, well, forever. Until that changes, the only performance improvements you will see in the general case are the things the backend was too dumb to get in the first place (i.e., loads and stores).
I don't think that's fair. Many things can be done at a higher level that could never be done in a backend, such as the very fancy tail recursion optimization we do and some loop optimizations. One can construct quite amazing test cases for them. What's disappointing to me is that they don't seem to trigger nearly as often in real code as one would like.
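
To make the tail recursion point concrete, here is a minimal sketch in C of the kind of test case I have in mind. The function names are made up for illustration, and the second function is a hand-written guess at what an accumulator-based transformation could produce, not actual compiler output.

```c
/* Not a tail call as written: the multiplication happens
 * after the recursive call returns. */
unsigned long fact_recursive(unsigned n)
{
    if (n <= 1)
        return 1;
    return n * fact_recursive(n - 1);
}

/* Hand-written equivalent (an assumption about the result, not
 * compiler output): the pending multiplication is folded into an
 * accumulator, so the recursion becomes a plain loop. */
unsigned long fact_loop(unsigned n)
{
    unsigned long acc = 1;
    while (n > 1) {
        acc *= n;   /* the work that used to happen after the call */
        n--;
    }
    return acc;
}
```

A backend that only sees the call instruction has no easy way to get from the first form to the second. Functions shaped this neatly are also rare in real code, which is the disappointing part.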