On Mon, 2008-03-31 at 09:54 -0400, Robert DuToit wrote:
> I am trying to help my friend set up his rsync with iconv. Presently
> it works fine but re-copies every file with an umlaute in the
> filename. I saw a recent post about this and the fix but...
>
> he ran "locale" ( both source and dest are on same machine) and is
> running on a German-Swiss locale

Please be more specific about what your friend is trying to accomplish!
If he's trying to perform a nontrivial conversion, he definitely won't
accomplish it with "--iconv=.". That option tells rsync to convert from
the sending machine's default charset to the receiving machine's default
charset, but in his case, source and destination are on the same
machine, so these charsets are one and the same! To convert from a
charset A to a charset B on a local run, he should pass "--iconv=A,B".

> xserve-backup-02:/Volumes/Backup RAID 8TB teleclub$ locale
> LANG=
> LC_COLLATE="C"
> LC_CTYPE="C"
> LC_MESSAGES="C"
> LC_MONETARY="C"
> LC_NUMERIC="C"
> LC_TIME="C"
> LC_ALL="C"
>
> we tried adding option iconv=C and iconv=C,C but no luck - it still
> re-copies very file with an umlaute.
>
> I checked "iconv --list" on my Mac and see no "C" listed. I am not
> sure if "C" is correct either.

"C" is a "standard" locale whose associated charset is ASCII. Based on
the log output (below), your friend should probably be using
de_CH.UTF-8.

> Example:
>
> The correct file name would be "Action des Monats für vertonung.mov"
> and not "Action des Monats f\#303\#274r vertonung.mov"
>
> but the log shows it not translated:
>
> /Volumes/SAN_Video/Final Cut Pro Documents/Capture Scratch/Action des
> Monats/! Render/Action des Monats f\#303\#274r vertonung.mov
>        32768   0%  344.09kB/s    0:54:11
>     42205184   3%   40.25MB/s    0:00:26

All that's happening here is that the source filename is in UTF-8, but
rsync is escaping the two high bytes in its log output because they are
invalid in ASCII, the charset implied by the specified locale C. If your
friend switches to a *.UTF-8 locale, rsync will show him the umlaut
as-is.

> Though the actual file name gets copied correctly to dest, obviously
> the mapping (if that is what it is called) is different causing rsync
> to update the file every time.

The output escaping issue won't cause recopying, but from what you say,
I can guess what the real problem is.

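As an aside on the escaping before getting to that: the \#303 and \#274
in the log are just the octal values of the two bytes 0xc3 and 0xbc,
which together are the UTF-8 encoding of a composed umlaut-u. Here is a
rough Python sketch that undoes the escaping on the logged name. The
helper name unescape_rsync_octal is made up for illustration and is not
part of rsync, and the pattern assumes the escapes always take the
three-digit form seen in the log above.

```python
import re


def unescape_rsync_octal(name: str) -> bytes:
    """Turn \\#ooo escapes (as seen in the log) back into raw bytes.

    Hypothetical helper for illustration; assumes three octal digits
    per escaped byte, with a first digit of 0-3 so the value fits in
    one byte.
    """
    raw = name.encode("ascii")
    return re.sub(
        rb"\\#([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),
        raw,
    )


logged = r"Action des Monats f\#303\#274r vertonung.mov"
raw = unescape_rsync_octal(logged)

# The recovered bytes are valid UTF-8 and decode to the expected name.
print(raw.decode("utf-8"))
```

So the name on disk is fine; only the display under the C locale is
escaped.
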
I notice that the source filename is in composed UTF-8, and the Mac OS X
HFS+ filesystem has an annoying behavior of silently decomposing UTF-8
characters in filenames. Suppose the destination is on HFS+ and your
friend is using --delete. Rsync will copy the file, but the destination
filesystem will store its name with a decomposed umlaut-u (three bytes
0x75, 0xcc, 0x88). Rsync compares binary filenames without regard for
charset-specific conventions, so on the next run, it will fail to
recognize the decomposed destination file as corresponding to the source
file, delete the destination file, and transfer the file again.
Essentially, rsync tries and tries again to create a destination file
with the same (binary) name as the source file, but the filesystem keeps
foiling it.

You can avoid this problem by passing --iconv=UTF-8,UTF8-MAC. UTF8-MAC
is a pseudo-charset recognized by Mac OS X iconv in which all characters
are decomposed. This way, rsync will decompose the source filename and
recognize it as matching the destination filename.

Wayne, please consider adding this material to the "copies every file"
entry on http://rsync.samba.org/FAQ.html .

Matt
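
P.S. To see the composed/decomposed mismatch concretely, here is a small
Python sketch. It uses standard NFD from the unicodedata module as a
stand-in for what the filesystem does; HFS+ applies its own variant of
decomposition, which I am assuming agrees with NFD for an umlaut-u. The
variable names are only for illustration.

```python
import unicodedata

# Composed form: umlaut-u as the single code point U+00FC.
composed = "f\u00fcr"

# Decomposed form: plain 'u' followed by combining diaeresis U+0308.
decomposed = unicodedata.normalize("NFD", composed)

composed_bytes = composed.encode("utf-8")      # contains c3 bc
decomposed_bytes = decomposed.encode("utf-8")  # contains 75 cc 88

print(composed_bytes.hex(" "))
print(decomposed_bytes.hex(" "))

# A byte-for-byte comparison sees two different names ...
print(composed_bytes == decomposed_bytes)

# ... while normalizing both sides to the same form makes them match.
print(unicodedata.normalize("NFD", composed) == decomposed)
```

The byte-wise comparison is the one that fails across runs, and
converting the source name to the decomposed form before comparing is,
as far as I can tell, what --iconv=UTF-8,UTF8-MAC achieves.
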