Hello, I would like to use a stemming analyser such as KStem or PorterStem to give our users a wider search scope. At the same time, I also want to let users bypass the stems when they want to search more precisely. I have an idea for how to implement this.
I can control the breadth of the search scope with a checkbox on the UI. When the scope is wide, I will use the stems; when it's narrow (or exact), I'll avoid using the stems.
The approach I envisage is to index the fields twice: once using the StandardAnalyser and a second time using the Stemmer. I'll attach a suffix to the name of the stemmed field in the index. So, for example, TITLE (contains only StandardAnalyser output) and TITLE_STEM (contains the StemAnalyser output).
When I come to generate the query object, I will first check the search breadth on the UI. If it's wide, I'll search the TITLE_STEM field and parse the query with the StemAnalyser; otherwise I'll search the TITLE field and parse the query with the StandardAnalyser.
Although I appreciate this will result in a much larger index and a longer indexing time, the approach will allow me to implement the required functionality. I just wanted to check with you guys that there is no better, perhaps more efficient, way of achieving my goals before taking the above approach. All feedback / advice will be warmly received. Thanks guys!
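P.S. In case it helps to make the plan concrete, here is a rough, language-agnostic sketch of the dual-field idea in plain Python. It is not Lucene code. The `TinyIndex` class, the `standard_analyze` / `stem_analyze` helpers and the `toy_stem` suffix stripper are all made-up stand-ins for illustration (the toy stemmer is not KStem or Porter). The only point is to show each field being indexed twice and the query-time switch between TITLE and TITLE_STEM, with the matching analyser applied to the query.

```python
import re
from collections import defaultdict

STEM_SUFFIX = "_STEM"  # suffix attached to the stemmed copy of each field


def standard_analyze(text):
    """Stand-in for the StandardAnalyser: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def toy_stem(token):
    """Toy suffix stripper (an assumption for illustration, NOT KStem/Porter)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stem_analyze(text):
    """Stand-in for the StemAnalyser: standard tokens, then stemmed."""
    return [toy_stem(t) for t in standard_analyze(text)]


class TinyIndex:
    """Hypothetical in-memory inverted index keyed by (field, term)."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> set of doc ids

    def add(self, doc_id, field, text):
        # Index the same text twice: unstemmed under FIELD, stemmed under FIELD_STEM.
        for term in standard_analyze(text):
            self.postings[(field, term)].add(doc_id)
        for term in stem_analyze(text):
            self.postings[(field + STEM_SUFFIX, term)].add(doc_id)

    def search(self, field, query, wide):
        # The UI checkbox decides both the target field and the query analyser.
        target = field + STEM_SUFFIX if wide else field
        analyze = stem_analyze if wide else standard_analyze
        terms = analyze(query)
        if not terms:
            return set()
        # AND all query terms together.
        result = set(self.postings.get((target, terms[0]), set()))
        for term in terms[1:]:
            result &= self.postings.get((target, term), set())
        return result


index = TinyIndex()
index.add(1, "TITLE", "Searching large indexes")
index.add(2, "TITLE", "Search tips")

exact_hits = index.search("TITLE", "search", wide=False)
wide_hits = index.search("TITLE", "search", wide=True)
```

With the checkbox off, "search" only matches the document that literally contains that token; with it on, the stemmed field also matches "Searching". The key detail is that the query must always be analysed the same way as the field it is run against.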