A rundown of the events in this scenario, with a little more technical detail:
1. The fast xterm gets the request to paste text (the user presses the middle mouse button). It processes this and eventually calls XConvertSelection. In X terminology, this is called requesting a selection.
2. The server sends a SelectionRequest event, which the slow xterm receives. In reply, it sends a SelectionNotify event to the fast xterm.
3. The user types something into the fast xterm.
4. The fast xterm receives the SelectionNotify event.

It's the delay in step 2 that leads to the paste landing after the user has pressed Enter. These kinds of delays are inherent to the network model X uses. It could be argued that this is just a feature of X and that the bug should be tagged wontfix.

As suggested, xterm could buffer other events after the paste request. There has to be a timeout too; there is no guarantee that the selection ever reaches the client. At worst, with the timeout, the user would see the app freeze for a second or so.

This could be implemented in xterm, but since this scenario is common to all X apps that use selections, it might be better implemented in the server itself. The server knows when a selection is requested and when the selection owner sends it. The server already knows how to buffer events. It wouldn't be hard to make the server buffer all events to a client until a timeout happens or the selection arrives. Or would that be too much of a hack?
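
To make the buffering idea concrete, here is a rough sketch of the logic as a small Python simulation. It is not xterm or X server code and makes no Xlib calls. The class PasteBuffer, its methods, and the ("key", ...) / ("selection", ...) event tuples are all made up for illustration, and the 1 second timeout is just the "second or so" mentioned above.

```python
from collections import deque


class PasteBuffer:
    """Hypothetical event buffer; a model of the idea, not real xterm code."""

    def __init__(self, timeout=1.0):
        self.timeout = timeout   # seconds to wait for the selection (assumed value)
        self.deadline = None     # None means no paste request is outstanding
        self.pending = deque()   # events held back while a paste is outstanding
        self.output = []         # data in the order the terminal would process it

    def request_paste(self, now):
        # Models the moment the client would call XConvertSelection.
        self.deadline = now + self.timeout

    def tick(self, now):
        # Give up waiting once the deadline has passed and release held events.
        if self.deadline is not None and now >= self.deadline:
            self._flush()

    def handle(self, event, now):
        kind, data = event
        self.tick(now)
        if kind == "selection":
            # Models SelectionNotify: the pasted text goes first,
            # then everything typed while we were waiting.
            self.output.append(data)
            if self.deadline is not None:
                self._flush()
        elif self.deadline is not None:
            self.pending.append(data)   # hold the event until the paste arrives
        else:
            self.output.append(data)    # no paste outstanding, process at once

    def _flush(self):
        self.output.extend(self.pending)
        self.pending.clear()
        self.deadline = None


buf = PasteBuffer(timeout=1.0)
buf.request_paste(now=0.0)                   # step 1: middle button pressed
buf.handle(("key", "\n"), now=0.1)           # step 3: user presses Enter early
buf.handle(("selection", "ls -l"), now=0.2)  # step 4: selection finally arrives
print(buf.output)
```

With the buffering in place, the pasted text is processed before the Enter that was typed while the request was outstanding. If the selection never arrives, tick() releases the held events once the timeout passes, which is the short freeze described above. A selection that shows up after the timeout is simply processed when it arrives in this sketch; whether it should be dropped instead is a separate design choice.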