On Mon, Jul 9, 2012 at 3:30 PM, Stefan Sperling <s...@apache.org> wrote:
> On Mon, Jul 09, 2012 at 02:47:25PM +0200, Bert Huijben wrote:
>> How do you check if the file you are merging is valid utf-8?
>
> See the merge_chunks() function.
>
> We convert data to UTF-8 from the native (locale) encoding.
> This cannot fail (every encoding can be represented in UTF-8)
> but the result might look funny in case the file uses some other encoding
> than the native one. But that's OK -- this conversion happens only for
> display purposes, data in the actual file is never changed, so you can
> still edit individual chunks in their original form.

I'm a bit confused (encoding issues always confuse me). If we only care
about the width of the string for display purposes, doesn't this (also)
depend on the encoding used by the console / terminal? How does that
actually work: if you have a UTF-8 encoded file, and you 'cat' it to a
terminal with LC_ALL=iso_8859_1 ... ?

--
Johan
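
To make the question concrete, here is a minimal Python sketch of the
scenario, not of what merge_chunks() does. It assumes the terminal really
decodes incoming bytes as ISO-8859-1; the helper name as_seen_by_terminal
and the sample string are made up for illustration. 'cat' writes the file's
bytes unchanged, so the number of characters the terminal shows depends on
how it interprets those bytes.

```python
# Hypothetical illustration: the helper name and sample text are assumptions,
# not anything taken from Subversion's merge_chunks().

def as_seen_by_terminal(data: bytes, terminal_encoding: str) -> str:
    """Decode raw file bytes the way a terminal using that encoding would."""
    return data.decode(terminal_encoding, errors="replace")

# "n" followed by e-acute, stored as UTF-8: three bytes for two characters.
raw = "n\u00e9".encode("utf-8")

utf8_view = as_seen_by_terminal(raw, "utf-8")         # two characters
latin1_view = as_seen_by_terminal(raw, "iso-8859-1")  # one character per byte

# Byte count, then character count under each interpretation.
print(len(raw), len(utf8_view), len(latin1_view))  # prints "3 2 3"
```

Counting code points is only a rough stand-in for display width (wide and
combining characters need something like wcwidth()), but it shows why the
width seems to depend on the terminal's encoding as well as the file's.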