On Jun 21, 2005, at 2:59 PM, [EMAIL PROTECTED] wrote:
I found a discrepancy in the results of an identical search ("processing")
run with Lucene and with MySQL. Lucene does not seem to return results
where the search word is joined to other text by "-" (hyphen) or "."
(period). For example, it did not return results for text containing
"processing-7-bit" and "straighforwerd.processing", but MySQL did. Is this
a settings issue, or is it unavoidable?

Thanks
Tareque
ControlDOCS

PS: In contrast, I previously found Lucene returning some results that
MySQL did not, for example where the search phrase is joined to "'"
(apostrophe) or "_" (underscore). I am not complaining about this; I
actually find it preferable for my purpose.

These differences all boil down to your choice of analyzer. Which analyzer
are you using?

As you can see below, "processing-7-bit" is tokenized quite differently
depending on the analyzer:

$ ant AnalyzerDemo
Buildfile: build.xml
    [input] String to analyze: [This string will be analyzed.]
processing-7-bit
     [echo] Running lia.analysis.AnalyzerDemo...
     [java] Analyzing "processing-7-bit"
     [java]   WhitespaceAnalyzer:
     [java]     [processing-7-bit]
     [java]   SimpleAnalyzer:
     [java]     [processing] [bit]
     [java]   StopAnalyzer:
     [java]     [processing] [bit]
     [java]   StandardAnalyzer:
     [java]     [processing-7-bit]

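If you're using the StandardAnalyzer, you are not indexing the word
"processing" at all for that text: the whole string is indexed as the
single term "processing-7-bit", so a search for "processing" cannot match
it. Grab the source code from Lucene in Action at lucenebook.com and type
"ant AnalyzerDemo" to try out the basic analyzers.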
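
To make the matching consequence concrete, here is a minimal plain-Java
sketch. It does not use Lucene; the class and method names are hypothetical,
and the two tokenizers only approximate what WhitespaceAnalyzer (split on
whitespace) and SimpleAnalyzer (keep runs of letters, lowercased) do. The
point it illustrates is that a single-term search can only hit a term that
was actually produced at index time.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for two tokenization strategies. This is NOT Lucene
// code; all names here are hypothetical and exist only for illustration.
class TokenizerSketch {

    // Approximates whitespace-only tokenization: split on runs of whitespace.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Approximates letter-based tokenization: keep runs of letters,
    // lowercase them, and treat every other character as a separator.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // A single-term search matches only if that exact term was indexed.
    static boolean wouldMatch(List<String> indexedTokens, String queryTerm) {
        return indexedTokens.contains(queryTerm);
    }

    public static void main(String[] args) {
        String text = "processing-7-bit";
        List<String> ws = whitespaceTokens(text);
        List<String> letters = letterTokens(text);
        // prints: [processing-7-bit] -> false
        System.out.println(ws + " -> " + wouldMatch(ws, "processing"));
        // prints: [processing, bit] -> true
        System.out.println(letters + " -> " + wouldMatch(letters, "processing"));
    }
}
```

With the letter-based sketch, "straighforwerd.processing" also splits into
[straighforwerd] [processing], so a search for "processing" would find it.
Whatever analyzer you settle on, use the same one for indexing and for
parsing queries so both sides produce the same terms.
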
Erik