Chris,

I imagine that with most languages, sorting according to Unicode codepoint order works, but for Vietnamese it doesn't, probably because the majority of letters are standard Latin characters, but then some are less usual ("đ" being a good example).

This is probably very low on the priority list and I'm not sure how much work this would involve, but I would suggest at some point adding an option to the command line syntax for imp2ld either to

1. sort the order of keys according to Unicode (default) or
2. retain the order of the IMP file (not sort at all).

That way languages that do not alphabetize well according to the codepoint order in Unicode can remain in alphabetical order (assuming the module creator sorted correctly).

Daniel

Chris Little wrote:

Daniel,

The order of keys in an LD module is according to the codepoint order in Unicode. The keys are kept in this order in order to permit binary searching. There is currently no way to perform localized collation.

The platform and locale shouldn't play a role in this. If they do, it's a bug.

--Chris

Daniel Owens wrote:

I am working on creating dictionary modules based on the Free Vietnamese Dictionary Project. The Vietnamese-English dictionary is working, but some words are not in alphabetical order, and I am trying to find out how to maintain the original alphabetization. I noticed this when all of the words beginning with a vowel having diacritics/tones or beginning with a "đ" were sorted to the end of the dictionary. The DAT file maintains the original order, which is more accurate. It must be that the IDX file generated by imp2ld creates its own index and alphabetizes according to its own scheme. The entries of each word are tagged as ThML. Here is a slightly random entry:

$$$ác bá
<entry key="ác bá" type="main" id="n20"><b>ác bá</b><br />[noun]<br />- Cruel landlord, village tyrant<br /></entry>

Is there a way to keep imp2ld from changing the order of the index?
I am happy to send someone the IMP file if that helps. I pasted the CONF file at the bottom of this message.

Daniel

CONF File:

[VietAnh]
DataPath=./modules/lexdict/rawld4/vietanh/vietanh
ModDrv=RawLD4
Encoding=UTF-8
SourceType=THML
SwordVersionDate=2007-10-27
Version=1.0
Lang=vi
Description=FVDP Vietnamese-English Dictionary
About=- This is the Vietnamese-English dictionary database of the Free Vietnamese Dictionary Project. It contains more than 23.400 entries with definitions and illustrative examples.\par\par- This database was compiled by Ho Ngoc Duc and other members of the Free Vietnamese Dictionary Project (http://www.informatik.uni-leipzig.de/~duc/Dict/)\par\par- Copyright (C) 1997-2003 The Free Vietnamese Dictionary Project\par\par- This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
TextSource=http://www.informatik.uni-leipzig.de/~duc/Dict/

_______________________________________________
sword-devel mailing list: sword-devel@crosswire.org
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
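For anyone reading this thread later, here is a small Python sketch of the behaviour discussed above. It is not how imp2ld or the LD drivers are implemented; it only illustrates why keys sorted by Unicode codepoint push "đ" and toned vowels to the end, why that order still suits binary searching, and what a pre-sorted IMP file in Vietnamese alphabetical order might look like. Only "ác bá" comes from the thread; the other keys, the helper name vietnamese_key, and the tone ranking used as a tie-breaker are assumptions made for illustration.

```python
import bisect
import unicodedata

# Sample keys. Only "ác bá" is from the thread; the rest are illustrative.
keys = [unicodedata.normalize("NFC", k)
        for k in ["ba", "ác bá", "đa", "da", "an", "em"]]

# Python compares str values by code point, which matches the order
# the thread says LD module keys are kept in.
codepoint_order = sorted(keys)

# A sorted list is what makes binary search possible.
target = unicodedata.normalize("NFC", "đa")
pos = bisect.bisect_left(codepoint_order, target)
found = pos < len(codepoint_order) and codepoint_order[pos] == target

# Simplified Vietnamese alphabetical key (an assumption, not a full
# collation): letters are ranked by the Vietnamese alphabet, and tone
# marks only break ties between otherwise identical letter sequences.
ALPHABET = unicodedata.normalize("NFC", "aăâbcdđeêghiklmnoôơpqrstuưvxy")
RANK = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
RANK[" "] = 0  # a space sorts before any letter
# Combining grave, hook above, tilde, acute, dot below (assumed tone order).
TONES = {"\u0300": 1, "\u0309": 2, "\u0303": 3, "\u0301": 4, "\u0323": 5}

def vietnamese_key(word):
    letters, tones = [], []
    for ch in unicodedata.normalize("NFC", word.lower()):
        tone = 0
        kept = []
        # Split each character into its base letter and its tone mark.
        for c in unicodedata.normalize("NFD", ch):
            if c in TONES:
                tone = TONES[c]
            else:
                kept.append(c)
        base = unicodedata.normalize("NFC", "".join(kept))
        if not base:
            continue  # stray combining mark with no base letter
        letters.append(RANK.get(base, 1000))  # unknown characters sort last
        tones.append(tone)
    return (letters, tones)

alphabetical_order = sorted(keys, key=vietnamese_key)

print(codepoint_order)
print(alphabetical_order)
print(found)
```

With these sample keys, the codepoint sort places "ác bá" and "đa" after "em", while the alphabet-based key places "ác bá" first and "đa" directly after "da". A list in the second order is not sorted by codepoint, so a plain codepoint binary search over it would no longer be reliable, which is the trade-off an "unsorted" option would have to deal with.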