Stephen,
I didn't actually say anything about Boyer-Moore looking at the ends of
words. The issue with suffixing languages isn't caused by the
algorithm's interaction with the document being searched. It is caused
by the search patterns themselves. In other words--it's not because the
algorithm
DM Smith wrote:
>
> Will Thimbleby wrote:
>
> > How do you use your BitSet? I like it at the moment where I don't
> > access the document information at all until it is displayed. This
> > means I can do live-searching (as the user types) for even large
> > searches like "and".
> >
> The verse ref
TED] Behalf Of Chris Little
> Sent: Thursday, March 03, 2005 8:54 AM
> To: SWORD Developers' Collaboration Forum
> Subject: Re: [sword-devel] Searching and Lucene thoughts
>
>
> No. Standard Sword searches just start at the beginning and search to
> the end, byte by byte.
Will Thimbleby wrote:
On 2 Mar 2005, at 12:45 am, DM Smith wrote:
Restricting of searches:
Again, this is another area where doing the work in Lucene is essential
for speed. I haven't figured this one out yet, but I'm thinking I will
write a custom Lucene filter, which would be much faster if I stored the
verse
> Here are some things Accordance does: -- it just seems over
> complicated
> to me (I can't see how some of the features would ever be used other
> than tedious academic research)
>
> It can search within: verse, chapter, clause, sentence,
> paragraph, book
> You can specify tags for: stem,
On 2 Mar 2005, at 7:53 pm, Chris Little wrote:
The standard linear search is the most general purpose search
algorithm, and I think general purpose is what we need to maintain.
For people who want faster searches, there is indexed searching
available.
While this in some sense is true. The method
On 2 Mar 2005, at 12:45 am, DM Smith wrote:
Can we enumerate what Lucene does not support that we want for
Biblical searching?
The only thing I saw was that it did not find adjacent documents. For
example, find all verses containing Moses within 5 verses of Aaron.
As long as we build the index
I have implemented the Boyer-Moore before. I think that it is a bit
biased toward common prefix languages as you stated. But it would work
on any. The problem that I encountered is that it is easy to write for
7-bit ASCII and later for 8-bit Latin-1, but the compiled FSA becomes
difficult for a
No. Standard Sword searches just start at the beginning and search to
the end, byte by byte.
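For illustration only, a byte-by-byte linear scan of the kind described above might look like the sketch below. This is not the Sword code; the function name `linear_search` and the sample text are assumptions made for the example.

```python
def linear_search(haystack: bytes, needle: bytes) -> list[int]:
    """Return every byte offset at which needle occurs in haystack.

    Hypothetical helper: starts at the beginning and checks each
    position in turn, with no index and no skipping.
    """
    hits = []
    if not needle:
        return hits
    last_start = len(haystack) - len(needle)
    for start in range(last_start + 1):
        # Compare the window beginning at this byte against the pattern.
        if haystack[start:start + len(needle)] == needle:
            hits.append(start)
    return hits


# Sample text chosen for the example; any encoded module text would do.
text = "In the beginning God created the heaven and the earth.".encode("utf-8")
offsets = linear_search(text, b"the")
```

The cost grows with the size of the text on every query, which is the trade-off against an indexed search.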
Just on the basis of the abstract you link to, I don't see how this
would be of any benefit. The Boyer-Moore algorithm is very
language-specific. It benefits from the fact that English is a
predominant
Just curious ... does non-indexed sword-api searching use c.s.
algorithms like Boyer-Moore searching?
http://portal.acm.org/citation.cfm?id=359859&coll=ACM&dl=ACM&CFID=13545783&CFTOKEN=93236524
Something I tried to read once (and it was waay over my head)
concerned very smart "state machine"
When the index is built, lucene sees each verse as a separate document.
When a document is added to the index, lucene gives it a number one
higher than the last document added. When a hit is returned from lucene,
one of the fields of that hit is that number (called the document id).
As long as
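A rough sketch of how sequential document ids could support a query such as "Moses within 5 verses of Aaron". It assumes the verses were added to the index in canonical order, so that the difference between two document ids equals the distance in verses. It does not call Lucene at all; the hit lists and the name `within_distance` are made up for the example.

```python
from bisect import bisect_left, bisect_right


def within_distance(hits_a, hits_b, max_gap):
    """Return the ids in hits_a that have some id in hits_b within max_gap.

    Hypothetical helper: both arguments are lists of document ids, as
    they might be collected from two separate single-word searches.
    """
    sorted_b = sorted(hits_b)
    result = []
    for doc_id in sorted(hits_a):
        # Find the slice of sorted_b inside [doc_id - max_gap, doc_id + max_gap].
        lo = bisect_left(sorted_b, doc_id - max_gap)
        hi = bisect_right(sorted_b, doc_id + max_gap)
        if hi > lo:
            result.append(doc_id)
    return result


# Made-up document ids standing in for real search hits.
moses_hits = [10, 40, 200]
aaron_hits = [14, 46, 500]
near = within_distance(moses_hits, aaron_hits, 5)
```

The same idea only holds if nothing else is interleaved between verses when the index is built.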
Will Thimbleby wrote:
I apologise for my ramblings, but here are some searching thoughts
that I've collected as I implemented lucene searching in MacSword:
Searching
It is more complicated than I thought, and lucene doesn't quite do
everything. Certainly to do a document range is something that
Any chance we can get verse-crossing search hits with Lucene? What I
mean is, suppose a person is searching for something and knows a few
different words in the phrase, but the phrase itself crosses a verse
boundary. Can we make this work at all, where the search returns a range
of verses that
I apologise for my ramblings, but here are some searching thoughts that
I've collected as I implemented lucene searching in MacSword:
Searching
It is more complicated than I thought, and lucene doesn't quite do
everything. Certainly to do a document range is something that needs to
be bolted on
14 matches