No, ARM would probably be slow as well; I'm running Sword on an ARM-based Linux handheld. I definitely would ~not~ want to slow down the search; it's slow enough as it is. Memory matters just as much, so a large search index would not be useful to me.
-- jordan

On Wed, 11 Sep 2002 [EMAIL PROTECTED] wrote:

> > On September 9, 2002 07:12, [EMAIL PROTECTED] wrote:
> > > Bible is 31102 (if I counted correctly) verses. It is ~3.8Kbytes if a bit
> > > for every verse.
> >
> > You counted all the verses in the Bible?! (grin)
> >
> > > Searching for "Christ & (God | Father)" we can construct 3 such bit vectors
> > > (~10.6Kbytes) and then make logical operations over these.
> >
> > Bit vectors have some nice properties such as the ability to do very fast
> > logical operations. However, they have some significant downsides as well:
> >
> > 1. They are very large to store for the Bible. I did a quick calculation and I
> > figured the indexes I've built would increase approx 10 x if I stored them as
> > bit vectors. The reason for this is that the average word occurs only 100
> > times, at least in the KJV (I assume other word based languages are in the
> > same order of magnitude). This means that 4K bit vectors are very sparse.
>
> I don't suggest to store so for anything, but only for the most often
> encountered words (like "the").
>
> > 2. Conversion to and from them can be costly computationally (especially
> > converting from them). Since storing bit vectors and returning bit vectors to
> > the frontends aren't options this would have to be considered.
>
> If my memory is right, 80386 has a special command for searching ones in bit
> vectors. In any case searching non-zero bytes is fast.
>
> > 3. Perhaps most significantly, bit vectors are only really a big improvement
> > for logical operators. Verse and word proximity (i.e. within x verses, or
> > within y words) are better done other ways. This could easily lead to
> > multiple conversions to and from bit vectors just to complete one search
> > expression.
>
> I'm not about verse proximity, but namely about paragraphs with specified
> borders!
>
> > > I can (as will have time) even write necessary algorithms. If it will be
> > > too slow for 80386, I can remember its assembler!
> >
> > Since Sword is a cross platform library, assembler isn't really an option (I
> > know it is already compiled on at least 3 different CPU architectures). Plus,
> > do you really think hand coded assembly would be much faster than what a good
> > compiler could produce for a series of bitwise logical operations on arrays?
>
> Isn't only 80386 slow?
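
For anyone following the thread, here is a rough sketch of the bit-vector idea being debated, in Python rather than anything Sword actually ships. It assumes one bit per verse, uses a plain Python int as the vector, and the verse numbers and helper names (to_bitvector, from_bitvector) are made up for illustration; none of this is Sword's API.

```python
NUM_VERSES = 31102  # verse count quoted in the thread


def to_bitvector(verse_ids):
    """Pack 0-based verse indices into one int, one bit per verse."""
    bits = 0
    for v in verse_ids:
        bits |= 1 << v
    return bits


def from_bitvector(bits):
    """Unpack a bit vector into a sorted list of 0-based verse indices."""
    verses = []
    while bits:
        low = bits & -bits                  # isolate the lowest set bit
        verses.append(low.bit_length() - 1)  # its position is the verse index
        bits ^= low                          # clear that bit and continue
    return verses


# Hypothetical hit lists for three words (not real concordance data).
christ = to_bitvector([3, 10, 42, 500])
god = to_bitvector([3, 7, 42, 900])
father = to_bitvector([10, 11, 900])

# The query from the thread: Christ & (God | Father)
hits = from_bitvector(christ & (god | father))

# Storage for one full-length vector, rounded up to whole bytes.
bytes_per_vector = (NUM_VERSES + 7) // 8

print(hits, bytes_per_vector)
```

The logical step is a single expression; the loop in from_bitvector is the "converting from" cost raised in point 2, and it runs once per hit rather than once per verse.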