You are probably seeing the effect of interrupt mitigation.
By default, interrupt mitigation appears to be programmed differently for 8168 chips than for the others. Most chips get all zeros in the IntrMitigate register, whilst 8168 chips get a value of 0x5151. You can use ethtool to change the coalescing settings and see whether this is part of the problem. I suspect it explains the behavior you still see even after including Heiner's TXCFG_AUTO_FIFO patch.
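
In case it helps, here is a rough Python sketch of what 0x5151 might mean and which ethtool calls one could try. The field layout (four 4-bit fields: TX timer, TX frame count, RX timer, RX frame count, high nibble first) is my assumption and should be checked against the driver source. The helper names, the interface name eth0 and the coalescing values are placeholders, and the driver may reject some combinations.

```python
# Assumed layout, not confirmed above: four 4-bit fields, high nibble first.
FIELDS = ("tx_timer", "tx_frames", "rx_timer", "rx_frames")
SHIFTS = (12, 8, 4, 0)


def decode_intr_mitigate(value):
    """Split a 16-bit IntrMitigate value into its four nibbles."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("IntrMitigate is a 16-bit register")
    return {name: (value >> shift) & 0xF for name, shift in zip(FIELDS, SHIFTS)}


def ethtool_commands(ifname):
    """Build ethtool invocations: show coalescing, then try minimal coalescing.

    The commands are only built here, not executed.
    """
    return [
        ["ethtool", "-c", ifname],
        ["ethtool", "-C", ifname,
         "rx-usecs", "0", "rx-frames", "1",
         "tx-usecs", "0", "tx-frames", "1"],
    ]


if __name__ == "__main__":
    print(decode_intr_mitigate(0x5151))
    for cmd in ethtool_commands("eth0"):  # eth0 is a placeholder name
        print(" ".join(cmd))
```

Under that assumed layout, 0x5151 would give a timer field of 5 and a frame count of 1 in each direction, whereas the all-zeros value used for the other chips leaves every field at 0. The units of the timer fields are not something I can state from the above.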