Hello Dan,

I really cannot comment on the use of bytea vs text, but I can say that
the Unicode (UTF-8) that Bacula uses consists of perfectly good 8-bit
values. There are no values of 0x10000 or anything near it: the 0xe86e65
in the error message is three separate bytes (0xe8, 0x6e, 0x65), not one
number. This is why UTF-8 for nearly all purposes is identical to
standard text -- i.e. 8-bit values terminated by a zero byte. To display
the correct character, it of course requires a bit of interpretation of
the bits. In addition, any program that looks at the bytes needs to take
care in interpreting them, since all 8 bits are used.
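To make the byte-level point concrete, here is a minimal Python sketch (not part of Bacula) that looks at the three bytes PostgreSQL reported in the error quoted further down. It assumes the file name was written in ISO-8859-1 (Latin-1), which is only a guess based on the name in the log; under that assumption the bytes spell part of an accented name and are simply not UTF-8.

```python
# The three bytes PostgreSQL complained about: 0xe86e65.
bad = bytes([0xE8, 0x6E, 0x65])

# ASSUMPTION: the file name was stored as ISO-8859-1 (Latin-1).
# Under that assumption the bytes are the text "ène".
latin1_text = bad.decode("latin-1")

# The same bytes are not valid UTF-8: 0xE8 opens a three-byte sequence
# and must be followed by two continuation bytes in 0x80-0xBF.
try:
    bad.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Re-encoding the text as UTF-8 yields a valid sequence of 8-bit values
# with no zero byte, so it survives zero-terminated string handling.
utf8_bytes = latin1_text.encode("utf-8")

print(latin1_text)        # ène
print(is_valid_utf8)      # False
print(utf8_bytes.hex())   # c3a86e65
print(0 in utf8_bytes)    # False
```

If the assumption holds, the name needs to be converted to UTF-8 before it reaches the database, or the database has to be told which encoding the client is really sending.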
A simple solution is that the database treat all "strings" as simply
binary data that is zero terminated. A more correct solution is to tell
the database that we are using UTF-8 in *all* strings -- including
SELECT, ... If the database can handle UTF-8, there is no need to change
anything anywhere except to get the database in the right mode. Proof of
this is that as of last report, the 1.37 Win32 FD, that now converts
from Microsoft's 16 bit Unicode to UTF-8, works fine with MySQL and any
European character set as well as Chinese characters.

Best regards,

Kern

On Wednesday 26 October 2005 01:13, Dan Langille wrote:
> On 25 Oct 2005 at 15:04, Kern Sibbald wrote:
> > On Tuesday 25 October 2005 14:40, Michael Galloway wrote:
> > > good day all ...
> > >
> > > i'm still having problems with unicode sequences in filenames. for
> > > example:
> > >
> > > 24-Oct 16:02 lance-dir: Start Backup JobId 16,
> > > Job=AspenHome.2005-10-24_16.02.48 24-Oct 16:02 lance-dir: Recycled
> > > volume "AspenHome-0001"
> > > 24-Oct 16:06 lance-sd: Recycled volume "AspenHome-0001" on device
> > > "AspenHomeFileStorage" (/backups/aspen/home), +all previous data lost.
> > > 24-Oct 17:31 lance-dir: AspenHome.2005-10-24_16.02.48 Fatal error:
> > > sql_create.c:826 sql_create.c:826 query SELECT +FilenameId FROM
> > > Filename WHERE Name='About EuG?ne.doc' failed:
> > > ERROR: invalid byte sequence for encoding "UNICODE": 0xe86e65
> > >
> > > 24-Oct 17:31 lance-dir: sql_create.c:826 SELECT FilenameId FROM
> > > Filename WHERE Name='About EuG?ne.doc' 24-Oct 17:31 lance-dir:
> > > AspenHome.2005-10-24_16.02.48 Fatal error: sql_create.c:851
> > > sql_create.c:851 insert INSERT+INTO Filename (Name) VALUES ('About
> > > EuG?ne.doc') failed: ERROR: invalid byte sequence for encoding
> > > "UNICODE": 0xe86e65
> > >
> > > 24-Oct 17:31 lance-dir: sql_create.c:851 INSERT INTO Filename (Name)
> > > VALUES ('About EuG?ne.doc') 24-Oct 17:31 lance-dir:
> > > AspenHome.2005-10-24_16.02.48 Fatal error: sql_create.c:853 Create db
> > > Filename record +INSERT INTO Filename (Name) VALUES ('About
> > > EuG?ne.doc') failed. ERR=ERROR: invalid byte sequence for encoding
> > > +"UNICODE": 0xe86e65
> > >
> > > 24-Oct 17:31 lance-dir: AspenHome.2005-10-24_16.02.48 Fatal error:
> > > catreq.c:427 Attribute create error. +sql_create.c:853 Create db
> > > Filename record INSERT INTO Filename (Name) VALUES ('About EuG?ne.doc')
> > > failed. +ERR=ERROR: invalid byte sequence for encoding "UNICODE":
> > > 0xe86e65
> > >
> > >
> > > i appreciate that bacula is not unicode compatible. but is there any way
> > > to configure bacula to drop/ignore/flag/do something with the unicode
> > > filenames that will let the backup complete? i really have little
> > > control over how users name files in their home directories and it
> > > would be nice if i could get a successful backup of my home directory
> > > filesystem.
> >
> > Hello,
> >
> > Bacula does work with Unicode names. It works in UTF-8 (Unicode), which
> > is what all Unix/Linux machines typically use as a default as well as
> > MySQL and SQLite.
> > I've seen some cases similar to yours on PostgreSQL,
> > where it is apparently using 16 bit Unicode. I suspect you are using
> > PostgreSQL, and if so you need to reconfigure it to use UTF-8. When you
> > figure out how to do so and get it to work, please let me know so that I
> > can add it to the document.
>
> I think I know the cause of the problem and the reason why this error
> occurs. It comes back to my choice of database types for
> filename.name within the PostgreSQL database schema.
>
> DarcyB suggested that the problem was not database encoding, but
> type. We are using text to store values that are outside the range
> of valid text values, e.g. 0xe86e65, which is over 0x10000. That's
> why we need to use bytea, AFAIK.
>
> I'm proposing that we use bytea instead of text. It seems that text
> does not allow the range of values that bytea will allow. We'll need
> to make code changes and database changes. I'll be working with
> Michael to test the code changes.
>
> The following is the database change for an existing database:
>
> create cast (text as bytea) without function;
>
> alter table filename add column name_new bytea;
> update filename set name_new = cast(name as bytea);
> alter table filename drop name;
> alter table filename rename name_new to name;
> create index filename_name_idx on filename(name);
>
> The table creation would be:
>
> create table filename
> (
>     filenameid serial not null,
>     name bytea not null,
>     primary key (filenameid)
> );
>
>
> DarcyB helped out with a work in progress for the code changes:
>
> http://www.dbitech.ca/bacula-bytea.patch
>
> The errors we are still getting are:
>
> 25-Oct 17:12 lance-dir: *Console*.2005-10-25_16.42.18 Fatal error:
> sql_list.c:350 sql_list.c:350 query SELECT Path.Path||Filename.Name
> AS Filename FROM File,Filename,Path WHERE File.JobId=3 AND
> Filename.FilenameId=File.FilenameId AND Path.PathId=File.PathId
> failed: ERROR: operator does not exist: text || bytea
>
> The above requires a change
> to the SQL.
>
> The below can be fixed with code changes:
>
> 25-Oct 18:18 lance-dir: Client1.2005-10-25_18.18.37 Fatal error:
> sql_create.c:826 sql_create.c:826 query SELECT +FilenameId FROM
> Filename WHERE Name='1edited\\6.19.01.22TNK.l&m.csv' failed:
> ERROR: invalid input syntax for type bytea
>
> 25-Oct 18:18 lance-dir: Client1.2005-10-25_18.18.37 Fatal error:
> sql_create.c:851 sql_create.c:851 insert INSERT +INTO Filename (Name)
> VALUES ('1edited\\6.19.01.22TNK.l&m.csv') failed:
> ERROR: invalid input syntax for type bytea
>
> Those are the only two places that the code references Filename.name.
>
> I don't know how this will affect other Bacula projects but if we do
> proceed, I think we need to give them a heads up.
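The "invalid input syntax for type bytea" errors quoted above are consistent with a backslash reaching the bytea parser without being escaped for it, although the log alone does not prove that. As a rough illustration only, the Python sketch below renders raw bytes in PostgreSQL's bytea "escape" text format. The function name bytea_escape and the sample names are hypothetical, this is not what the patch linked above does, and C code would more likely call libpq's PQescapeBytea, which also handles the SQL string-literal layer that this sketch leaves out.

```python
def bytea_escape(data: bytes) -> str:
    """Render raw bytes in PostgreSQL's bytea 'escape' text format.

    Hypothetical helper for illustration. It does NOT add the extra
    quoting needed to embed the result in a SQL string literal.
    """
    out = []
    for b in data:
        if b == 0x5C:
            # A backslash is written as two backslashes.
            out.append("\\\\")
        elif b < 0x20 or b > 0x7E:
            # Non-printable octets are written as backslash + 3 octal digits.
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return "".join(out)


# Hypothetical sample names: one with a backslash, one with a 0xe8 byte.
print(bytea_escape(b"1edited\\6.19.csv"))
print(bytea_escape(b"Eug\xe8ne.doc"))
```

Passing the name as a bound parameter, where the client library supports it, sidesteps the literal-quoting layer entirely.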