There was some discussion a few weeks ago about applications running slower with FDO enabled.
I've recently investigated a similar situation using mainline. In my case, the slowdown was caused by the loop_optimize pass being disabled during FDO. It appears that the pass was recently disabled as part of Jan Hubicka's patch to eliminate RTL-based profiling. The commentary indicates that the old loop optimizer is incompatible with tree profiling. While this doesn't explain all of the degradations discussed (some were showing up on older versions of the compiler), it may explain some of them.

Pete