On Sunday 25 September 2005 9:59 am, Didier Vidal wrote:
> So, here is what I understand of the situation with encoding:
>
> * internally, gnucash-g2 data are in utf-8, whatever the locale used
> to launch gnucash.

True.

> What lead me to believe this is a trace I added in
> xaccAccountSetName, in src/engine/Account.c

It's the libxml2 documentation that should have told you about UTF-8.

"One of the core decisions was to force all documents to be converted to a
default internal encoding, and that encoding to be UTF-8, "
http://xmlsoft.org/encoding.html

"If there is no encoding declaration, then the input has to be in either
UTF-8 or UTF-16, if it is not then at some point when processing the input,
the converter/checker of UTF-8 form will raise an encoding error. You may
end-up with a garbled document, or no document at all !

When saving: if no encoding is given, libxml2 will look for an encoding
value associated to the document and if it exists will try to save to that
encoding, otherwise everything is written in the internal form, i.e. UTF-8"

G2 does not associate an encoding with the xmlDocPtr and does not use
xmlDocPtr to write out a file using the Gnucash XML file backend so libxml2
has no chance to alter the encoding.

QSF does use xmlDocPtr and I can therefore set it to use the local encoding
using:

    char* locale = setlocale(LC_CTYPE, NULL);

This will be in my next commit (provided it tests successfully). It is only
for human readability, libxml2 is quite happy with everything in UTF-8.

> --------
> void
> xaccAccountSetName (Account *acc, const char *str)
> {
> char * tmp;
>
>
> printf("xaccAccountSetName: %s\n", str);

?

The actual source code doesn't use printf:

    void
    xaccAccountSetName (Account *acc, const char *str)
    {
       char * tmp;
       if ((!acc) || (!str)) return;

       xaccAccountBeginEdit(acc);
       {
          /* make strdup before freeing (just in case str==accountName !!) */
          tmp = g_strdup (str);
          g_free (acc->accountName);
          acc->accountName = tmp;
          mark_account (acc);
       }
       acc->inst.dirty = TRUE;
       xaccAccountCommitEdit(acc);
    }

You might have picked up the debug message routine.
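
As a rough illustration of the "converter/checker of UTF-8 form" idea in the libxml2 text quoted above, here is a minimal sketch of a UTF-8 validity check in plain C. It is not libxml2's code: is_valid_utf8 is a hypothetical helper written only for this example. Something along these lines could be pointed at a saved file's bytes to see whether they really are UTF-8 or a single-byte charset such as ISO-8859-1.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml2 or GnuCash.
 * Returns 1 if the len bytes at s form well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* continuation bytes expected after c */
        unsigned long cp;  /* code point decoded so far */

        if (c < 0x80) { i++; continue; }          /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; }
        else return 0;                            /* invalid lead byte */

        if (len - i <= n) return 0;               /* sequence is truncated */

        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;  /* not 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates and out-of-range values. */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
        if (cp > 0x10FFFF) return 0;

        i += n + 1;
    }
    return 1;
}
```

An ISO-8859-1 "é" is the single byte 0xE9, which this check rejects because 0xE9 looks like the lead byte of a three-byte sequence with no valid continuation bytes after it. The UTF-8 form, 0xC3 0xA9, passes.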

> * the encoding conversion when reading a file seems to be handled
> correctly by libxml2, even if the file doesn't respect the XML spec and
> doesn't specify its encoding.

Correct.

> * gnucash-g2 seems to write the xml files in ISO-8859-1, whatever the
> locale used to launch gnucash (at least on my machine). I don't yet
> understand why.

I've just tried that on my system and I can find no char set conversion -
it's output as UTF-8.

> * The code pointed by Neil
> (fprintf(out, "<?xml version=\"1.0\"?>\n"); in io-gncxml-v2.c)
> is not called when you save a gnucash file.

???

Umm, it is:

Click on save calls qof_session_save.

qof_session_save calls QofBackend->sync_all which in the case of the GnuCash
XML File backend is file_sync_all in gnc_backend_file.c - maybe you missed
the indirection there, it's a generic pointer to the specific function in
the backend. Each backend provides its own sync_all routine so perhaps your
output mentioned sync_all and not the actual file_sync_all.

file_sync_all calls gnc_file_be_write_to_file which calls
gnc_book_write_to_xml_file_v2 in io-gncxml-v2.c

gnc_book_write_to_xml_file_v2 calls gnc_book_write_to_xml_filehandle_v2
which calls write_v2_header whose first line is

    fprintf(out, "<?xml version=\"1.0\"?>\n");

That's how the namespace lines now appear in G2 that didn't in 1.8,
write_v2_header was patched a few weeks ago to call a series of
gnc_xml2_write_namespace_decl which add the
xmlns:bt-days="http://www.gnucash.org/XML/bt-days"
type lines at the head of all subsequent G2 files.

As those lines ARE present in the file you attached previously, it is clear
that write_v2_header IS being called to create the first dozen or so lines
of the file, including the first one.

> Anyway, it seems dangerous
> to me to write an encoding specification in this part if we don't know
> the actual encoding that will be used to write the rest of the file.

The rest of the file follows what libxml2 does - it knows nothing about the
original encoding and therefore uses the internal UTF-8 encoding of the
libxml2 data.

It would be dangerous to set write_v2_header to use a different encoding
string, yes, because each call to xmlElemDump elsewhere in the gnc-backend
would have to know the intended charset and use conversion routines in
libxml2 before returning the characters to write.

However, the default is to write UTF-8. If we have to specify the encoding
at all, it can only be UTF-8 that can be used.

Changing gnc-backend to use the locale charset is (IMHO) pointless and
wasteful as it is already slated for replacement. libxml2 shows no signs of
changing their default - certainly not within the timeframe that would see
gnc-backend-file itself being replaced.

> I
> agree with David that utf-8 is a good target.

Then adding UTF-8 to write_v2_header is the correct way of implementing the
expression of the encoding that has always been implicit.

QSF uses libxml2 to write out the XML header and that will use libxml2 to
add an expression of the encoding too.

However, none of this is a problem within GnuCash - libxml2 enforces the
internal UTF-8 and the only omission was not to state that UTF-8 is what is
written out.

-- 
Neil Williams
=============
http://www.data-freedom.org/
http://www.nosoftwarepatents.com/
http://www.linux.codehelp.co.uk/
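
For reference, the change proposed above (stating UTF-8 in the first line that write_v2_header emits) might look roughly like the sketch below. This is an assumption about the shape of the fix, not the committed code: write_header_utf8 is a hypothetical stand-in name, and the namespace declarations that the real write_v2_header goes on to write are left out.

```c
#include <stdio.h>

/* Hypothetical stand-in for the first step of write_v2_header.
 * Writes an XML declaration that names the encoding already in use.
 * Returns 0 on success, -1 if out is NULL or the write fails. */
int write_header_utf8(FILE *out)
{
    if (!out) return -1;

    /* Only UTF-8 is declared: the element data written afterwards is
     * libxml2's internal UTF-8 and is not converted on the way out. */
    if (fprintf(out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n") < 0)
        return -1;

    return 0;
}
```

The encoding name is hard-coded on purpose. Taking it as a parameter would invite a declaration that disagrees with the bytes that follow, which is exactly the danger raised in the quoted text.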