Troy A. Griffitts wrote:
> We probably need to do a few things here besides toupper (to assure
> entry matches), as we've learned and done in our search code. We
> probably should at least normalize the utf8. This is not a big hit
> because it is only done on module creation for every key, and then once
> for the input word before the binary search starts.
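
To make the step quoted above concrete, here is a minimal sketch in Python (not SWORD's actual C++ code). It assumes NFC is the normalization form wanted, and the names `normalize_key`, `build_index` and `lookup` are hypothetical. Every key is uppercased and normalized once when the index is built, and the input word is normalized once before the binary search.

```python
import bisect
import unicodedata


def normalize_key(key: str) -> str:
    # Assumption: uppercase first, then NFC, so the stored form is always
    # normalized regardless of how the key was typed or encoded.
    return unicodedata.normalize("NFC", key.upper())


def build_index(keys):
    # Done once at module creation: normalize every key, then sort.
    # Python sorts str by code point, which matches UTF-8 byte order.
    return sorted(normalize_key(k) for k in keys)


def lookup(index, word):
    # Normalize the input word once, then binary search the sorted index.
    target = normalize_key(word)
    i = bisect.bisect_left(index, target)
    return i if i < len(index) and index[i] == target else -1


index = build_index(["apple", "café", "zebra"])
# Precomposed and decomposed spellings of "café" find the same entry.
print(lookup(index, "café"), lookup(index, "cafe\u0301"))
```

Without the normalization step the decomposed spelling would miss the entry, since the two byte sequences differ even though they render identically.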

I wish we could display keys in non-touppered form. Capitals are so ugly, especially outside of basic modern western European languages.

> We could change the actual order to use a utf8 strcmp method, but this
> would likely come with a relatively significant performance hit (though
> maybe not-- the binary search algorithm will significantly limit the number
> of actual utf8 strcmp operations we would need to perform). This change
> would require remaking any modules which use multibyte utf8 keys.

Collation is tricky. For one, it is always language-dependent. We have all the necessary data (at least for modern languages) in ICU, but using that means requiring ICU, which I'm quite fine with for desktop/server frontends, but which isn't as practical for handhelds.

Independent of basic, language-wide collation standards, some dictionary editors pick different sort orders. The only way to cater to that is to store the records in their own module-specific order (e.g. using a GenBook-based system for the whole thing, or somehow throwing away the binary search system).

Given that most frontends list the complete contents of the LD modules, which negates the utility of the binary search, it might not be a bad idea to scrap the current system and make key entry operate as a pattern-matching search (maybe regex?).

--Chris
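
P.S. A rough sketch of the pattern-matching idea, in Python rather than anything from the SWORD API. The function name `search_keys` and the sample keys are made up for illustration. The keys stay in whatever order the module's editor chose and keep their original case. Lookup is a linear, case-insensitive regex scan instead of a binary search.

```python
import re


def search_keys(keys, pattern):
    # Hypothetical helper: return every key matching the regex, preserving
    # the module's stored order and the keys' original capitalization.
    rx = re.compile(pattern, re.IGNORECASE)
    return [k for k in keys if rx.search(k)]


# Keys in an editor-chosen order that is neither byte order nor alphabetical.
keys = ["Logos", "agape", "Pneuma", "logia"]
print(search_keys(keys, "^log"))
```

The scan is O(n) per query, which should be acceptable if the frontend is already loading the complete key list for display.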