I have entered this into our "bugs" database:
http://www.crosswire.org/bugs/browse/API-91
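
As a rough illustration of the ordering problem discussed in the thread
below, here is a minimal Python sketch. It is not SWORD or JSword code:
the function names and the sample keys are made up for the example, and
crude_collation_key is only a stand-in for a real collator such as the
one ICU provides. It compares raw UTF-8 byte order with a display order,
and builds a small secondary index of positions into the byte-ordered
list, roughly along the lines of the secondary-index idea.

```python
import unicodedata


def byte_order_key(term: str) -> bytes:
    """Mimic a raw byte-level comparison of UTF-8 encoded keys."""
    return term.encode("utf-8")


def crude_collation_key(term: str):
    """Hypothetical stand-in for a real collator (e.g. ICU).

    Compares base letters first, ignoring case and combining marks,
    then falls back to the original string as a tie-breaker.
    """
    decomposed = unicodedata.normalize("NFD", term)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), term)


# Sample keys; escapes are used so the accented letters are precomposed.
keys = [
    "zebra",
    "Zion",
    "apple",
    "\u00c9tat",   # "État"
    "eagle",
    "\u00e1baco",  # "ábaco"
]

# Primary order: what a byte-sorted index yields when walked with next().
by_bytes = sorted(keys, key=byte_order_key)

# Display order: what a user would more likely expect to see.
by_collation = sorted(keys, key=crude_collation_key)

# Secondary index: positions into the byte-ordered list, in display order.
display_order = sorted(
    range(len(by_bytes)),
    key=lambda i: crude_collation_key(by_bytes[i]),
)

print(by_bytes)
print(by_collation)
print(display_order)
```

In the byte-ordered list the upper-case key comes first and the accented
keys land after "zebra", which is the behaviour described below. The
primary list stays untouched for fast lookups; only the secondary list of
positions carries the display order.
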
Serving Him Together,
DM

On Oct 29, 2007, at 1:00 AM, Troy A. Griffitts wrote:

> Yes, everyone is correct that the .next() method on a Lexicon/Dictionary
> module will show the next value in the index-- not necessarily the next
> value alphabetized in any humanly useful order.
>
> The purpose for the index is fast lookups.
>
> We have a few issues to solve here and DM and others have given good
> suggestions.
>
> One solution to one small part of the problem, at least for frontends
> that load the entire index: sort the keys however you want using ICU or
> whatever Unicode/localization tools your toolkit provides.
>
> Retaining the import order isn't necessarily straightforward. The
> SWORD API exposes a dynamic modification interface which allows deletes
> and inserts at any time. Technically, entries can be added, removed,
> modified, etc. to update and maintain any module. Practically, this
> doesn't usually happen (we just import with a tool once and never modify
> again), but with some of the new community editing projects and tools in
> the works, this may become a more common event.
>
> Generating a secondary index on a lexdict which preserves some other
> order and alternate key is a great idea and an easy addition to the
> current code.
>
> I am not in favor of using Lucene for any core functionality as it would
> mandate a requirement, which is not practical on all platforms. We can
> easily implement the same thing with our rawstr index without incurring
> this penalty.
>
> This is a good item to consider for 1.5.11
>
> -Troy.
>
>
> DM Smith wrote:
>> I'm not sure if I am reading the Sword code correctly, but it appears
>> that it is sorting at a byte level and not a character level. That
>> isn't by code points.
>>
>> I think that we discussed this a little bit ago and concluded that
>> some work needs to be done in the engine.
>>
>> Here is my thought on the matter, for what it is worth. Today the sort
>> serves two purposes: order and search.
>> But it is search that constrains the order to be as it is. I think
>> that if we could search independently of the order of keys in the
>> module that would be ideal.
>>
>> One simple way for any application to provide this is to create a
>> Lucene index for the dictionary (similar to what we do for a Bible)
>> that consists of the term (stored and indexed), the offset (stored)
>> in the module (so it can be retrieved and previous and next indexes
>> can be found), and the entry for the term (indexed, but not stored).
>> The application can then create any kind of collation of the keys
>> (using the excellent facilities of ICU) that suits the user's needs.
>> Then, using this double handle, it can present the keys in part (as
>> in BibleCS) or whole (as in BibleDesktop, MacSword, ...) in the order
>> that the user expects.
>>
>> There are some related problems to this:
>> A user may expect to be able to find a Hebrew word in a Hebrew
>> dictionary independent of the pointing of the word in the dictionary.
>> (i.e. a user may wish to search without specifying accents)
>> A user may expect to find a word by stem not just by prefix.
>> A user may expect to be able to type "photos" (a transliteration) and
>> find the real Greek word in a Greek dictionary.
>>
>> I'm cross-posting to J-Sword because this will be of interest there
>> as well.
>>
>> In His Service,
>> DM Smith
>>
>>
>> On Oct 28, 2007, at 9:13 PM, Frank wrote:
>>
>>> peter wrote:
>>>> Is this really only a Vietnamese problem, or will not all Latinate
>>>> scripts with extra signs have exactly the same problem?
>>>>
>>>> Or actually all scripts which are treated as derived scripts -
>>>> Farsi, Urdu and Malay from Arabic; Tajik, Uzbek, Azeri from Russian
>>>> etc. - the code points are initially for the "main" characters and
>>>> then there is always a bunch of extra characters which are used
>>>> only in one or other language.
>>>>
>>>> But maybe I am just showing my ignorance here. I need to look at
>>>> some dictionaries - never had any installed.
>>> Any language that uses letters outside the ASCII range will be
>>> affected unless they collate those letters after "z"... and if it's
>>> strictly in Unicode code point order, then all upper case will
>>> collate before lower case...
>>>
>>> --
>>> Blessings
>>>
>>> Frank
>>>
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel@crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page