On Tue, Aug 14, 2007 at 07:55:36PM +0100, Marcin Owsiany wrote:
> First, a short explanation of the use case:
>
>  1. User runs poedit (aka potooledit) on a partially translated po file.
>  2. Poedit retrieves only the untranslated messages from the file (by
>     filtering it through potool -fnt) and puts them into a temporary po
>     file
>  3. Poedit launches $EDITOR on that temporary po file
>  4. User does some translation, saves the file, exits the editor
>  5. Poedit merges the original and the temporary file back together
>
> Now, to reproduce the bug:
>
>  1. use an editor which can auto-detect the file encoding, e.g. vim
>     AND
>  2. run poedit on a file which is in encoding A, while your locale is set
>     to use encoding B. (where neither A nor B is a subset of the other. For
>     example UTF-8 and Latin2)

Uhm, Latin2 _is_ a subset of UTF-8 as far as the character repertoire
goes: every Latin2 character can be encoded in UTF-8, even though the
byte sequences differ.

> What happens in step 3 is that vim looks at an ascii-only file (since
> msgids are in POSIX locale) and when the user inputs the translation in
> her own language, the editor decides to use encoding B (since it's the
> locale default).

Any non-broken editor _has_ to use encoding B and only encoding B, unless
the user gives it a very explicit override. So it loads the file assuming
it uses encoding B (Latin2 in your case) and saves it the same way.

> Then in step 5 poedit merges the original (in encoding A) and the
> temporary (in encoding B) creating a broken and a difficult to fix file
> with different parts in differing encodings.
>
> Does anyone have any ideas on how to fix this properly, keeping in mind
> that poedit is editor-agnostic so it is hard to determine what encoding
> the editor has chosen to use for the temporary file.

Just do this:

    iconv -f utf-8 <po >tmp; $EDITOR tmp; iconv -t utf-8 <tmp

When -t (or -f) is omitted, iconv defaults to the locale's encoding, so
the editor will see the file in the local encoding.

> The only metadata available seems to be the Content-type field of the
> header in the original po file, but I can't see how to enforce it for
> the temporary file...

The only editor-agnostic way to enforce it would be to change the locale
and intercept all input and output of the editor. That is doable on a tty,
much harder in X, and a very bad idea generally. Few editors can handle
one encoding for on-disk files and another for their user interface, and
they are not really supposed to anyway.

As it's unwise to mess with the user's locale, you can change only the
on-disk encoding, and this is the way to go.

HTH.
-- 
1KB		// Microsoft corollary to Hanlon's razor:
		//	Never attribute to stupidity what can be
		//	adequately explained by malice.
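
The iconv round trip suggested above can be spelled out a little more.
This is only a sketch, not poedit's actual code. po_charset and
edit_in_locale are hypothetical helper names; the PO file is assumed to
declare its charset in the usual Content-Type header line; and iconv,
mktemp and `locale charmap` are assumed to be available. Instead of
hard-coding utf-8, it reads encoding A from the header.

```shell
# Print the charset declared in the PO header, e.g. "UTF-8" for the line
#   "Content-Type: text/plain; charset=UTF-8\n"
po_charset() {
    sed -n 's/^"Content-Type:.*charset=\([A-Za-z0-9_.:-]*\).*/\1/p' "$1" | head -n 1
}

# edit_in_locale SRC OUT [LOCALE_CHARSET]
# Convert SRC from its declared charset (encoding A) to the locale's
# charset (encoding B), run $EDITOR on the copy, then convert the result
# back to encoding A and write it to OUT.
edit_in_locale() {
    src=$1
    out=$2
    file_cs=$(po_charset "$src")
    # Optional third argument overrides the locale charset (an assumption
    # added here for testing; normally `locale charmap` decides).
    locale_cs=${3:-$(locale charmap)}
    [ -n "$file_cs" ] || return 1

    tmp=$(mktemp) || return 1
    # Encoding A -> encoding B, so the editor sees what the locale promises.
    iconv -f "$file_cs" -t "$locale_cs" <"$src" >"$tmp" || { rm -f "$tmp"; return 1; }
    # $EDITOR is left unquoted on purpose so it may carry arguments.
    ${EDITOR:-vi} "$tmp" || { rm -f "$tmp"; return 1; }
    # Encoding B -> encoding A, ready to be merged with the original.
    iconv -f "$locale_cs" -t "$file_cs" <"$tmp" >"$out"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage would be something like `edit_in_locale untranslated.po edited.po`
(both file names are placeholders). If the user types a character that
encoding A cannot represent, the second iconv fails instead of silently
producing a mixed-encoding file.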