I noticed that in st, combining Unicode characters don't seem to be preserved in memory. For example, if I run "printf 'AB\xcd\x9dCDE\n'" in an Xterm and then select the resulting line, the clipboard data includes the Unicode sequence:

    ~% echo $TERM
    xterm-256color
    ~% printf 'AB\xcd\x9dCDE\n'
    AB͝CDE
    ~% xclip -o | xxd
    0000000: 4142 cd9d 4344 450a                      AB..CDE.

However, with st, the sequence vanishes:

    ~% echo $TERM
    st-256color
    ~% printf 'AB\xcd\x9dCDE\n'
    ABCDE
    ~% xclip -o | xxd
    0000000: 4142 4344 450a                           ABCDE.

Urxvt behaves the same as Xterm, with an added bonus: it actually renders the combined Unicode sequence, whereas on Xterm and st the tie character is not visible (although if you paste "AB\u035d" into st with no trailing characters, the tie appears, albeit glitchily).

I don't have a patch or any immediate plans to look into patching it, but perhaps improved Unicode support could be added to the TODO list.

Eric
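
P.S. In case it helps whoever looks at this, here is a minimal Python sketch for checking which code points survive in the clipboard. It is not part of st; the helper names `describe` and `combining_marks` are made up for illustration, and the byte strings are simply copied from the two xxd dumps above. It assumes the clipboard data is valid UTF-8.

```python
import unicodedata

# Clipboard bytes copied from the two xxd dumps above.
XTERM_BYTES = bytes.fromhex("4142cd9d4344450a")
ST_BYTES = bytes.fromhex("41424344450a")


def describe(data):
    """Return (code point, name, is_combining) for each character in UTF-8 data."""
    rows = []
    for ch in data.decode("utf-8"):
        # Control characters such as "\n" have no Unicode name, so use a default.
        name = unicodedata.name(ch, "<unnamed>")
        # A non-zero canonical combining class marks a combining character.
        is_combining = unicodedata.combining(ch) != 0
        rows.append((f"U+{ord(ch):04X}", name, is_combining))
    return rows


def combining_marks(data):
    """Return the code points in data that are combining characters."""
    return [cp for cp, _, is_combining in describe(data) if is_combining]


if __name__ == "__main__":
    for label, data in (("xterm", XTERM_BYTES), ("st", ST_BYTES)):
        print(label, combining_marks(data))
```

Piping "xclip -o" into a script like this, instead of hard-coding the bytes, would show whether a given terminal kept the combining character in the selection.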