Hi,

On Fri, Jun 27, 2014 at 04:54:08PM -0500, Eric Pruitt wrote:
> I noticed that in st, combined Unicode characters don't seem to be
> preserved in memory. For example, if I run "printf 'AB\xcd\x9dCDE\n'" in
> a Xterm then select the resulting line, I the clipboard data includes
> the Unicode sequence:
>
> ~% echo $TERM
> xterm-256color
> ~% printf 'AB\xcd\x9dCDE\n'
> AB͝CDE
> ~% xclip -o | xxd
> 0000000: 4142 cd9d 4344 450a            AB..CDE.
>
> However, with st, the sequence vanishes:
>
> ~% echo $TERM
> st-256color
> ~% printf 'AB\xcd\x9dCDE\n'
> ABCDE
> ~% xclip -o | xxd
> 0000000: 4142 4344 450a                 ABCDE.
>
> Urxvt's behaviour is also the same as Xterm with an added bonus: it
> actually renders the combined Unicode sequence where as on Xterm and st,
> the tie character is not visible (although if you paste "AB\u035d" into
> st with no other trailing characters, the tie appears albeit glitchily).
>
> I don't have a patch or any immediate plans to look into patching it but
> perhaps improve Unicode support could be added to the TODO list.
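For reference, the two bytes printed between B and C, \xcd\x9d, form a single two-byte UTF-8 sequence encoding U+035D. That code point falls in the Combining Diacritical Marks block (U+0300 to U+036F), and wcwidth() typically reports such nonspacing marks as zero-width. The C sketch below decodes the sequence. utf8_decode() and is_combining_diacritic() are hypothetical helpers written for this note, not st functions. The decoder also skips overlong-form and surrogate checks, so it is not a validating decoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal UTF-8 decoder: decodes one sequence at s (with
 * avail bytes available), stores its length in *len and returns the
 * code point, or -1 on a bad lead/continuation byte or truncated input.
 * Overlong forms and surrogates are NOT rejected. */
static long
utf8_decode(const unsigned char *s, size_t avail, size_t *len)
{
	unsigned long cp;
	size_t n, i;

	if (avail == 0)
		return -1;
	if (s[0] < 0x80) {               /* 0xxxxxxx: ASCII */
		*len = 1;
		return s[0];
	} else if ((s[0] & 0xE0) == 0xC0) { /* 110xxxxx: 2 bytes */
		n = 2;
		cp = s[0] & 0x1F;
	} else if ((s[0] & 0xF0) == 0xE0) { /* 1110xxxx: 3 bytes */
		n = 3;
		cp = s[0] & 0x0F;
	} else if ((s[0] & 0xF8) == 0xF0) { /* 11110xxx: 4 bytes */
		n = 4;
		cp = s[0] & 0x07;
	} else {
		return -1;
	}
	if (avail < n)
		return -1;
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80) /* continuation must be 10xxxxxx */
			return -1;
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	*len = n;
	return (long)cp;
}

/* U+0300..U+036F is the Combining Diacritical Marks block. */
static int
is_combining_diacritic(long cp)
{
	return cp >= 0x300 && cp <= 0x36F;
}
```

Decoding the bytes { 0xcd, 0x9d } yields 0x35D with a length of 2, and is_combining_diacritic(0x35D) is true. So the report's "AB\xcd\x9dCDE" is six code points that occupy only five columns.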
From what I see, these kinds of characters are simply ignored by st. I don’t know whether this is by design or by omission, but in tputc(), wcwidth() returns 0 for such characters. The character is copied to the current cursor position, and the cursor then advances by wcwidth(), i.e. by 0 columns, so the character is overwritten by the next one.

I don’t know if this has already been discussed, but these characters really seem to be a hassle to support. Lines would vary in length in memory and could be more than twice as long as their displayed width (potentially arbitrarily long?), and they are a huge pain to draw correctly… So this seems to be a really good example of the kind of sucky feature we don’t want to add. But I agree with you: if we want true and accurate Unicode support, we should have this.

--
Ivan "Colona" Delalande
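A self-contained C sketch can model the overwrite described above. It uses a toy line of cells, where each cell holds one code point and the cursor advances by a width function. This is not st's code. Cell, cwidth(), put_naive(), put_attach(), feed(), COLS and MAXCOMB are hypothetical names for illustration. cwidth() is a crude stand-in for wcwidth() that treats only U+0300..U+036F as zero-width. put_attach() shows one possible alternative, attaching zero-width marks to the previous cell. It is not a proposal for st.

```c
#include <assert.h>
#include <string.h>

#define COLS    16 /* hypothetical line width for the sketch */
#define MAXCOMB 4  /* hypothetical cap on combining marks per cell */

typedef struct {
	unsigned int base;          /* spacing code point in this column */
	unsigned int comb[MAXCOMB]; /* zero-width marks attached to it */
	int ncomb;                  /* number of entries used in comb[] */
} Cell;

/* Stand-in for wcwidth(): U+0300..U+036F count as zero-width, everything
 * else as one column. A simplification, not a real width table. */
static int
cwidth(unsigned int cp)
{
	return (cp >= 0x300 && cp <= 0x36F) ? 0 : 1;
}

/* Behaviour described in the mail: store at the cursor, then advance by
 * the width. A zero-width mark leaves the cursor in place, so the next
 * character overwrites it. */
static void
put_naive(Cell *line, int *x, unsigned int cp)
{
	assert(*x >= 0 && *x <= COLS);
	if (*x == COLS)
		return; /* line full; this sketch does not wrap */
	line[*x].base = cp;
	line[*x].ncomb = 0;
	*x += cwidth(cp);
}

/* One possible alternative: attach a zero-width mark to the previous
 * cell instead of writing it into the current one. Marks beyond
 * MAXCOMB are dropped, which bounds memory per cell. */
static void
put_attach(Cell *line, int *x, unsigned int cp)
{
	assert(*x >= 0 && *x <= COLS);
	if (cwidth(cp) == 0) {
		if (*x > 0 && line[*x - 1].ncomb < MAXCOMB)
			line[*x - 1].comb[line[*x - 1].ncomb++] = cp;
		return; /* cursor does not move */
	}
	if (*x == COLS)
		return;
	line[*x].base = cp;
	line[*x].ncomb = 0;
	*x += 1;
}

/* Feed n code points into a cleared line with put() and return how many
 * code points the written cells still hold (bases plus attached marks). */
static int
feed(Cell *line, const unsigned int *cps, int n,
     void (*put)(Cell *, int *, unsigned int))
{
	int x = 0, i, total = 0;

	memset(line, 0, COLS * sizeof(*line));
	for (i = 0; i < n; i++)
		put(line, &x, cps[i]);
	for (i = 0; i < x; i++)
		total += 1 + line[i].ncomb;
	return total;
}
```

This example feeds the code points { 'A', 'B', 0x35D, 'C', 'D', 'E' } through each function. With put_naive, only 5 code points survive, and 'C' ends up in column 2 where the mark was written. This matches the missing cd9d in st's clipboard dump. With put_attach, all 6 survive and the mark sits on column 1. The MAXCOMB cap is one way to keep the "potentially arbitrarily long" case bounded, at the cost of silently dropping extra marks.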