Lucene - Search breadth approach

Robert Corbett Wed, 22 Jul 2009 06:09:55 -0700

Hello,

I would like to use a stemming analyser similar to KStem or PorterStem to
provide access to a wider search scope for our users. However, at the same
time I also want to provide the ability for the users to throw out the stems
if they want to search more accurately. I have a number of ideas as to the
best way to implement this.


I can control the breadth of the search scope with a checkbox on the ui.
When the scope is wide, I will use the stems, when its narrow (or exact)
I'll avoid using the stems.

The approach I envisage is to index the fields twice. Once using the
StandardAnalyser and a second time using the Stemmer. I'll attach a suffix
to the name of the stemmed set in the index. So for example, TITLE (contains
only StandardAnalyser output) and TITLE_STEM (contains the StemAnalyser
output). When I come to generate the query object, I will first check the
search breath on the UI. If its wide, I'll use the TITLE_STEM column parsing
the query with the StemAnalyser, otherwise I'll use the TITLE column with
the query being parsed with the StandardAnalyser.

Although I appreciate it will result in a much larger index and longer
indexing time, this approach will allow me to implement the required
functionality.

I just wanted to check with you guys that there is no better, perhaps more
efficient way of achieving my goals before taking the above approach. All
feedback / advice will be warmly received.

Thanks guys!

Lucene - Search breadth approach

Reply via email to