the trouble with the page-zeroing scheme, which has been done before (e.g., having the lowest-priority process top up the pool of zeroed pages), is that it was great when the system wasn't under load and just fine under increased load, but eventually, just when you most want your cycles, you're back in the original situation. even if you were able to hold usage below that level, it still worked against keeping more useful data cached in that memory. (in effect, you had a cache of zeroes.) in a similar way, there might be more useful things to do with your DMA cycles, which cost something too, and there are implications for power.
the maths behind Working Set hasn't gone away just because the parameters got bigger. otherwise, it would never have been true.
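for anyone who hasn't met it, here is a rough python sketch of the working-set idea, taking the usual definition: the distinct pages touched in the last tau references. the function names, the toy reference string and the crude fault count are all my own, purely for illustration; no real pager works this way.

```python
def working_set(refs, t, tau):
    """Distinct pages referenced in the last `tau` references ending at index t (inclusive)."""
    start = max(0, t - tau + 1)
    return set(refs[start:t + 1])


def page_faults(refs, tau):
    """Count references to pages outside the working set of the previous `tau` references.

    Assumption for illustration: exactly the working set is kept resident.
    """
    faults = 0
    for t, page in enumerate(refs):
        resident = working_set(refs, t - 1, tau) if t > 0 else set()
        if page not in resident:
            faults += 1
    return faults


if __name__ == "__main__":
    refs = [1, 2, 1, 3, 1, 2, 4, 1, 2, 3]  # hypothetical page reference string
    for tau in (1, 3, len(refs)):
        print(tau, page_faults(refs, tau))
```

the point of the toy: the fault count depends on how the window compares with the program's locality, not on the absolute sizes, so scaling everything up leaves the relationship in place. memory spent holding zeroes is memory that can't hold part of somebody's working set.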