On Thursday 29 June 2006 22:23, Martin Simmons wrote:
> >>>>> On Thu, 29 Jun 2006 18:33:12 +0200, Kern Sibbald said:
> >
> > On Thursday 29 June 2006 18:09, Martin Simmons wrote:
> > > >>>>> On Thu, 29 Jun 2006 18:26:44 +0300, Peteris Krisjanis said:
> > > >
> > > > I found a solution, or at least it could be classified as a working
> > > > workaround, so I post it here for the archives or for someone else
> > > > who has problems with it.
> > > >
> > > > So, I have the Bacula server/director/sd on Debian and the client/fd
> > > > on an OS X server. Bacula was installed through Fink (from the
> > > > unstable/CVS packages, compiled from tar.gz). I configured it as a
> > > > common client for Bacula and ensured it has the right permissions to
> > > > stream files to the director.
> > > >
> > > > My problem was that I wanted to exclude files with Unicode characters
> > > > in their names. So I simply wrote the Unicode symbols into the
> > > > bacula-dir.conf file with Gedit, restarted the director and tried to
> > > > launch my job. It failed to recognize the Unicode characters written
> > > > in the director's file and went on with the backup of these files,
> > > > instead of excluding them.
> > > >
> > > > First, I messed with various things like the configuration file, tried
> > > > the same situation with a Linux workstation (where this situation was
> > > > a non-issue), etc. and then googled (and at the same time got at least
> > > > an informative message from the mailing list, thanks everyone for
> > > > suggestions) and figured out that it is OS X's different handling of
> > > > Unicode on its HFS+ file systems. OS X uses a different way of
> > > > composing characters (the so-called decomposed canonical format), so
> > > > it simply didn't understand what I wanted from it.
> > > >
> > > > First of all I think Bacula should be fixed to support this, but as
> > > > that could take quite some time, and I love Bacula and would like to
> > > > have it as my backup solution, I searched for some workarounds. And
> > > > here is one.
> > > >
> > > > What is needed - a graphical terminal like Konsole or Terminal of GNOME
> > > > fame (or any other terminal with UTF-8 support). Open ssh connections
> > > > to Debian (server) and OS X (client). On both boxes the locale should
> > > > be UTF-8 (en_US.UTF-8 on Mac, en_US.utf8 on Linux). On the OS X box,
> > > > do ls -lah or simply ls to get the OS X "version" of the file name in
> > > > Unicode (unicode chars will mostly look like upper line). Simply do a
> > > > Ctrl+C or copy, and then go to the Debian box, open bacula-dir.conf,
> > > > go to the FileSet you need to get this file/directory name in and
> > > > paste it in. Save and restart bacula-dir and go on with your jobs.
> > >
> > > I'm glad you found a solution. It is probably the best one for now.
> > >
> > > The issue is rather a nightmare, because on Linux you can probably
> > > create two files that differ only in their canonicalization. :-(
> > >
> > > Possibly Bacula needs to have an option (per fileset?) that controls
> > > unicode canonicalization/comparison, but it potentially spreads across
> > > the FD, Director and any restore guis.
> >
> > Hello Martin,
> >
> > What the devil is unicode canonicalization?
> >
> > Do you mean ensuring that the UTF-8 is proper UTF-8 since it is possible
> > to write incorrect UTF-8, which more or less works (at least for
> > display), or are you talking about something like converting 16 bit
> > Unicode to UTF-8?
>
> No, it is not related to UTF-8 itself, but is a problem inherent in the 16
> (or more) bit Unicode codes.
>
> The issue is that a human reader generally just wants the visual appearance
> of text to be right, but Unicode has to represent this as integer codes for
> programmatic use and also has to deal with lots of legacy codes.
>
> The result is that there are multiple ways to represent things in Unicode.
>
> In this case, for accented letters, you can have a single (e.g. Latin-1)
> code for the accented letter or a code (e.g. ASCII) for the unaccented
> letter followed by some special codes for the accents. The conversion
> between these forms is called (de)composition. There are other things like
> this, which leads to the need for canonical forms (e.g. with maximum
> composition or maximum decomposition) to help programmers handle things
> like comparison of strings.
>
> The OP's problem was that different operating systems handle
> canonicalization differently. Linux (AFAIK) does no canonicalization in
> the kernel (applications are expected to do it) whereas the Mac OS X kernel
> converts all filenames to canonical decomposed format in the filesystem
> implementation.
>
> Mixing composed and decomposed strings within the same Director/Catalog
> leads to great confusion...
>
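
To make the composed/decomposed distinction above concrete, here is a
minimal Python sketch. It is only an illustration and not something Bacula
does: the helper names to_decomposed and same_name are made up for this
example, and the file name "café" is an arbitrary sample. It assumes the
standard unicodedata module, whose "NFD" form is the generic canonical
decomposition; HFS+ is documented as using its own variant of decomposition,
so treat NFD as an approximation of what OS X stores.

```python
import unicodedata


def to_decomposed(name: str) -> str:
    """Return the canonically decomposed (NFD) form of a file name.

    Hypothetical helper: approximates the form an OS X client reports.
    """
    return unicodedata.normalize("NFD", name)


def same_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to the composed (NFC) form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same visible name, written two ways.
composed = "caf\u00e9"      # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent (U+0301)

# A plain comparison sees two different strings ...
print(composed == decomposed)                  # False
# ... and the UTF-8 bytes that end up in a config file differ as well.
print(composed.encode("utf-8"))                # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))              # b'cafe\xcc\x81'

# Normalizing both sides first makes the comparison succeed.
print(same_name(composed, decomposed))         # True
print(to_decomposed(composed) == decomposed)   # True
```

This is why a name typed in composed form on Linux does not match the name
the OS X client sees, while a name pasted from the OS X terminal does: the
comparison is done on the code sequences, not on the visual appearance.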

Egads, thanks for the details. I had forgotten about that aspect of Unicode
since I rarely work with Windows (or Mac).

Hopefully I or someone else (Frank?) could summarize this for the manual.
For the moment, the manual lacks all mention of Unicode/UTF-8, so I will add
this to my todo so it does not get lost.

-- 
Best regards,

Kern

  (">
  /\
  V_V

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users