Re: [sword-devel] Searching and Lucene thoughts

2005-03-03 Chris Little
Stephen, I didn't actually say anything about Boyer-Moore looking at the ends of words. The issue with suffixing languages isn't caused by the algorithm's interaction with the document being searched. It is caused by the search patterns themselves. In other words--it's not because the algorithm

RE: [sword-devel] Searching and Lucene thoughts

2005-03-03 Stephen Denne
DM Smith wrote:
>
> Will Thimbleby wrote:
>
> > How do you use your BitSet? I like it at the moment where I don't access the document information at all until it is displayed. This means I can do live-searching (as the user types) for even large searches like "and".
> >
> The verse ref
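A minimal sketch of the bitset idea under discussion, assuming each verse has a fixed ordinal (0, 1, 2, ... in canonical order). Hits are kept as bits of a Python integer, so counting and combining results (AND for "both words", OR for "either word") never reads verse text; references are looked up only when results are displayed. The class, its methods, and the toy verse table are hypothetical and are not taken from MacSword, JSword, or Lucene.

```python
class HitSet:
    """Search hits as bits of an int: bit i is set when verse ordinal i matched."""

    def __init__(self, bits=0):
        self.bits = bits

    def add(self, ordinal):
        self.bits |= 1 << ordinal

    def __and__(self, other):
        # Verses matched by both searches.
        return HitSet(self.bits & other.bits)

    def __or__(self, other):
        # Verses matched by either search.
        return HitSet(self.bits | other.bits)

    def count(self):
        # Hit count straight from the bits; no verse data is read.
        return bin(self.bits).count("1")

    def ordinals(self):
        # Yield set bit positions in ascending order.
        bits = self.bits
        while bits:
            low = bits & -bits          # isolate the lowest set bit
            yield low.bit_length() - 1
            bits ^= low                 # clear it and continue

    def refs(self, verse_names):
        # Only here, at display time, are ordinals turned into references.
        return [verse_names[i] for i in self.ordinals()]


verses = ["Gen 1:1", "Gen 1:2", "Gen 1:3", "Gen 1:4"]  # toy ordinal -> reference table
a = HitSet()
a.add(0)
a.add(2)
b = HitSet()
b.add(2)
b.add(3)
print((a & b).refs(verses))  # ['Gen 1:3']
print((a | b).count())       # 3
```

Because nothing but integers is touched until refs() is called, re-running the search on every keystroke stays cheap even for very common words.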

RE: [sword-devel] Searching and Lucene thoughts

2005-03-03 Stephen Denne
> [EMAIL PROTECTED] On Behalf Of Chris Little
> Sent: Thursday, March 03, 2005 8:54 AM
> To: SWORD Developers' Collaboration Forum
> Subject: Re: [sword-devel] Searching and Lucene thoughts
>
> No. Standard Sword searches just start at the beginning and search to the end, byte by byte.

Re: [sword-devel] Searching and Lucene thoughts

2005-03-02 DM Smith
Will Thimbleby wrote:
On 2 Mar 2005, at 12:45 am, DM Smith wrote:
Restricting of searches: Again, another area that is essential for speed to do in Lucene. I haven't figured this one out yet, but I'm thinking I will write a custom Lucene filter, which would be much faster if I stored the verse
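If verses are indexed in canonical order, a restriction such as "only this book" is a contiguous range of verse ordinals. A minimal sketch, assuming hits are already held as a bitset (a Python int here) keyed by verse ordinal: restricting the search is then a single AND against a range mask, which is roughly the job a custom filter would do. The function names and the ordinals are hypothetical.

```python
def range_mask(start, end):
    """Bitmask with bits start..end-1 set (a contiguous range of verse ordinals)."""
    if end <= start:
        return 0
    return ((1 << (end - start)) - 1) << start


def restrict(hit_bits, start, end):
    """Keep only the hits whose ordinal falls in [start, end)."""
    return hit_bits & range_mask(start, end)


def to_ordinals(bits):
    """List the set bit positions in ascending order."""
    out = []
    while bits:
        low = bits & -bits              # isolate the lowest set bit
        out.append(low.bit_length() - 1)
        bits ^= low                     # clear it
    return out


# Hypothetical hits at verse ordinals 3, 10, 42 and 99, restricted to ordinals 5..49.
hits = (1 << 3) | (1 << 10) | (1 << 42) | (1 << 99)
print(to_ordinals(restrict(hits, 5, 50)))  # [10, 42]
```

The mask costs nothing per verse to build, so no verse reference has to be stored or decoded to apply the restriction.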

RE: [sword-devel] Searching and Lucene thoughts

2005-03-02 Vonnahme, Nathan
> Here are some things Accordance does -- it just seems overcomplicated to me (I can't see how some of the features would ever be used other than for tedious academic research)
>
> It can search within: verse, chapter, clause, sentence, paragraph, book
> You can specify tags for: stem,
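Searching "within" a verse, chapter, or book can be modelled as one query with a different grouping key: a scope matches when every word appears somewhere inside it, not necessarily in the same verse. A rough sketch, assuming verses arrive as (book, chapter, verse, text) tuples; the function name and the toy data are invented for illustration and say nothing about how Accordance actually does it.

```python
import re
from collections import defaultdict


def search_within(verses, words, scope):
    """Return sorted scope keys whose verses, taken together, contain every word.

    verses: iterable of (book, chapter, verse, text) tuples
    scope:  function mapping (book, chapter, verse) to a grouping key
    """
    wanted = {w.lower() for w in words}
    seen = defaultdict(set)
    for book, chapter, verse, text in verses:
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        seen[scope(book, chapter, verse)] |= wanted & tokens
    return sorted(key for key, found in seen.items() if found == wanted)


# Toy data, not real verse text.
verses = [
    ("A", 1, 1, "text naming Aaron"),
    ("A", 1, 2, "text naming Moses"),
    ("A", 2, 1, "text naming Moses"),
    ("A", 3, 1, "text naming Aaron and Moses"),
]
print(search_within(verses, ["Moses", "Aaron"], lambda b, c, v: (b, c, v)))
# [('A', 3, 1)]
print(search_within(verses, ["Moses", "Aaron"], lambda b, c, v: (b, c)))
# [('A', 1), ('A', 3)]
```

Clause, sentence, or paragraph scopes would need that structure in the data; the grouping function is the only part that changes.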

Re: [sword-devel] Searching and Lucene thoughts

2005-03-02 Will Thimbleby
On 2 Mar 2005, at 7:53 pm, Chris Little wrote:
> The standard linear search is the most general-purpose search algorithm, and I think general purpose is what we need to maintain. For people who want faster searches, there is indexed searching available.
While this is in some sense true, the method

Re: [sword-devel] Searching and Lucene thoughts

2005-03-02 Will Thimbleby
On 2 Mar 2005, at 12:45 am, DM Smith wrote:
Can we enumerate what Lucene does not support that we want for Biblical searching? The only thing I saw was that it did not find adjacent documents. For example, find all verses containing Moses within 5 verses of Aaron. As long as we build the index
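Assuming verses are indexed in canonical order so that document ids double as verse ordinals (as DM Smith describes later in this thread), "Moses within 5 verses of Aaron" reduces to comparing two sorted lists of ids. A minimal sketch using binary search from Python's standard bisect module; the function name and the ids are hypothetical, and plain ordinal distance will cross chapter and book boundaries unless the caller also checks those.

```python
from bisect import bisect_left


def within_n_verses(hits_a, hits_b, n):
    """Ordinals in hits_a that have some ordinal in hits_b at distance <= n.

    Both lists must be sorted ascending (e.g. ids of verses indexed in order).
    """
    result = []
    for a in hits_a:
        i = bisect_left(hits_b, a - n)      # first b with b >= a - n
        if i < len(hits_b) and hits_b[i] <= a + n:
            result.append(a)
    return result


moses = [100, 200, 300]   # hypothetical verse ordinals containing "Moses"
aaron = [104, 250, 306]   # hypothetical verse ordinals containing "Aaron"
print(within_n_verses(moses, aaron, 5))  # [100]
```

Each lookup is logarithmic in the size of the second list, so the proximity test stays cheap even for frequent words.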

Re: [sword-devel] Searching and Lucene thoughts

2005-03-02 [EMAIL PROTECTED]
I have implemented Boyer-Moore before. I think it is a bit biased toward common prefix languages, as you stated, but it would work on any. The problem I encountered is that it is easy to write for 7-bit ASCII and later for 8-bit Latin-1, but the compiled FSA becomes difficult for a
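On the table-size point: the Horspool simplification of Boyer-Moore (bad-character rule only) needs a shift entry only for characters that actually occur in the pattern. A dictionary can therefore replace a fixed 128- or 256-entry array, and the same code then works on any Unicode text. This is a comparison sketch, not the implementation described above, and it does not build a compiled FSA.

```python
def horspool_find(text, pattern):
    """Return the start index of every occurrence of pattern in text (Boyer-Moore-Horspool).

    The shift table is a dict keyed by character, so its size depends on the
    pattern, not on the alphabet (ASCII, Latin-1 or all of Unicode).
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Distance from each character's last position in pattern[:-1] to the end.
    # Characters absent from the table allow a full shift of m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        # Compare the window right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(pos)
        # Shift according to the text character under the window's last slot.
        pos += shift.get(text[pos + m - 1], m)
    return hits


print(horspool_find("in the beginning", "in"))  # [0, 10, 13]
```

The same function accepts Greek or Hebrew strings unchanged, since nothing in it depends on the alphabet size.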

Re: [sword-devel] Searching and Lucene thoughts

2005-03-02 Chris Little
No. Standard Sword searches just start at the beginning and search to the end, byte by byte. Just on the basis of the abstract you link to, I don't see how this would be of any benefit. The Boyer-Moore algorithm is very language-specific. It benefits from the fact that English is a predominant
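For comparison, here is a plain left-to-right scan of the kind described, written over raw bytes: every start offset is tried and bytes are compared one at a time. It makes no assumption about language or alphabet, which is the general-purpose property argued for in this thread, at the cost of touching every byte. The function name is hypothetical and this is not the SWORD source.

```python
def linear_find(data: bytes, pattern: bytes):
    """Naive scan: try every start offset and compare byte by byte."""
    hits = []
    m = len(pattern)
    if m == 0:
        return hits
    for start in range(len(data) - m + 1):
        k = 0
        while k < m and data[start + k] == pattern[k]:
            k += 1
        if k == m:
            hits.append(start)
    return hits


text = "Moses und Aaron".encode("utf-8")
print(linear_find(text, b"Aaron"))  # [10]
```

Because it works on encoded bytes, a UTF-8 Greek or Hebrew pattern is searched exactly like an ASCII one.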

Re: [sword-devel] Searching and Lucene thoughts

2005-03-02 Lynn Allan
Just curious ... does non-indexed sword-api searching use c.s. algorithms like Boyer-Moore searching? http://portal.acm.org/citation.cfm?id=359859&coll=ACM&dl=ACM&CFID=13545783&CFTOKEN=93236524 Something I tried to read once (and it was waay over my head) concerned very smart "state machine"

Re: [sword-devel] Searching and Lucene thoughts

2005-03-01 DM Smith
When the index is built, Lucene sees each verse as a separate document. When a document is added to the index, Lucene gives it a number one higher than the last document added. When a hit is returned from Lucene, one of the fields of that hit is that number (called the document id). As long as
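A toy model of the numbering just described, assuming verses are added once, in canonical order, and never deleted. The id handed out for each document is simply its position, so a hit's id maps straight back to a verse reference. The class and method names are hypothetical; this only mimics the numbering and is not Lucene's API.

```python
class VerseIndex:
    """Toy index that numbers documents 0, 1, 2, ... in the order they are added."""

    def __init__(self):
        self.refs = []       # doc id -> verse reference
        self.postings = {}   # word -> ascending list of doc ids

    def add_verse(self, ref, text):
        doc_id = len(self.refs)          # one higher than the last document added
        self.refs.append(ref)
        for word in set(text.lower().split()):
            self.postings.setdefault(word, []).append(doc_id)
        return doc_id

    def search(self, word):
        # Hits come back as doc ids; the verse is recovered from the id alone.
        return [self.refs[d] for d in self.postings.get(word.lower(), [])]


idx = VerseIndex()
idx.add_verse("Gen 1:1", "In the beginning God created")
idx.add_verse("Gen 1:2", "And the earth was without form")
print(idx.search("the"))  # ['Gen 1:1', 'Gen 1:2']
```

The same id is what lets a hit set or a range restriction be expressed as plain integers.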

Re: [sword-devel] Searching and Lucene thoughts

2005-03-01 DM Smith
Will Thimbleby wrote:
I apologize for my ramblings, but here are some searching thoughts that I've collected as I implemented Lucene searching in MacSword:
Searching
It is more complicated than I thought, and Lucene doesn't quite do everything. Certainly, doing a document range is something that

Re: [sword-devel] Searching and Lucene thoughts

2005-03-01 Chris Little
Any chance we can get verse-crossing search hits with Lucene? What I mean is, suppose a person is searching for something and knows a few different words in the phrase, but the phrase itself crosses a verse boundary. Can we make this work at all, where the search returns a range of verses that
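One brute-force way to get the behaviour asked about, sketched without Lucene: scan each verse together with the one that follows it, so a phrase may start in one verse and end in the next, and report the range it covers. Assumptions: verses are given as (reference, text) pairs in canonical order, matching is on lowercased words, and a phrase spanning more than two verses is not found. The names and data are hypothetical.

```python
import re


def words(text):
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def find_phrase(verses, phrase):
    """Return (first_ref, last_ref) for each occurrence of phrase, allowing it
    to run from one verse into the next.

    verses: list of (ref, text) pairs in canonical order
    """
    want = words(phrase)
    m = len(want)
    if m == 0:
        return []
    results = []
    for i, (ref, text) in enumerate(verses):
        cur = words(text)
        nxt = words(verses[i + 1][1]) if i + 1 < len(verses) else []
        joined = cur + nxt
        # Start positions are limited to the current verse so each hit is
        # reported once, from the verse where it begins.
        for s in range(len(cur)):
            if joined[s:s + m] == want:
                last = ref if s + m <= len(cur) else verses[i + 1][0]
                results.append((ref, last))
    return results


verses = [("A 1:1", "come and see"), ("A 1:2", "Jesus wept")]
print(find_phrase(verses, "see jesus"))  # [('A 1:1', 'A 1:2')]
```

In an index, a similar effect might be had by also indexing overlapping two-verse windows as extra documents; that is only a suggestion, not something settled here.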

[sword-devel] Searching and Lucene thoughts

2005-03-01 Will Thimbleby
I apologize for my ramblings, but here are some searching thoughts that I've collected as I implemented Lucene searching in MacSword:
Searching
It is more complicated than I thought, and Lucene doesn't quite do everything. Certainly, doing a document range is something that needs to be bolted on