XMPP starts with X, so it sucks, and SIP also has its complexities if you want...
I'm sceptical about Tor's latency. I'd generally look for direct P2P connections for the voice stream, without any third server in the middle. Do you know of any Skype competitor that correctly implements two-way UDP hole punching? SIP+STUN doesn't seem to fix the problem for me, and other extensions meant to make it work better (like ICE) weren't supported by the implementations I tried. I don't even know what they do, and I gave up on that topic a long time ago. I would still want the application to fall back to a proxy if all hole-punching attempts fail. And what sadly is not obvious to the implementers: I would want the application to DETECT that it failed, i.e. when there's no RTP packet for a second, you send out some standard SIP event (INFO fuck-why-can't-you-say-something) and then retry with proper RTP. Also don't forget ossrecord | nc, and ossplay for the other direction.
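To make the "detect that it failed, then fall back" part concrete, here is a rough Python sketch of what I mean. It is not how any existing SIP client does it. The names `media_arrived` and `pick_route`, the probe payload, and the one-second timeout are all made up for illustration, and it treats any incoming datagram as "media" instead of parsing RTP or checking who sent it.

```python
import socket


def media_arrived(sock, timeout=1.0):
    """Return True if at least one datagram shows up within `timeout` seconds."""
    sock.settimeout(timeout)
    try:
        sock.recvfrom(2048)
        return True
    except (socket.timeout, ConnectionError):
        # Silence (or an ICMP-triggered reset on some platforms) counts as failure.
        return False


def pick_route(sock, routes, timeout=1.0, probe=b"probe"):
    """Try each (name, address) route in order.

    Returns the name of the first route on which something comes back,
    or None if every route stays silent. List the direct peer first and
    the proxy/relay last so the relay is only used as a fallback.
    """
    for name, addr in routes:
        # The outbound packet is also what opens the mapping in our own NAT.
        sock.sendto(probe, addr)
        if media_arrived(sock, timeout):
            return name
    return None
```

You would call it with something like `pick_route(sock, [("direct", peer_addr), ("relay", relay_addr)])`, where both addresses come out of whatever signalling you use. A real client would keep running the same one-second check during the call, not just at setup, and would tell the other side before switching routes.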