On Jun 21, 2005, at 2:59 PM, [EMAIL PROTECTED] wrote:

I found a discrepancy between the results of an identical search ("processing")
run with Lucene and with MySQL. Lucene seems not to return results
where the search word is joined to other text by a "-" (hyphen) or a "." (period). For
example, it didn't return a result for text that contained
"processing-7-bit" or "straighforwerd.processing", but MySQL did. Is this a
settings issue, or is it unavoidable?

Thanks
Tareque
ControlDOCS

PS: In contrast, I previously found Lucene returning some results that MySQL didn't, for example when the search phrase was joined to other text by a "'" (apostrophe) or a "_" (underscore). I am not complaining about this; I actually
find it preferable for my purposes.

These all boil down to your choice of analyzer. What analyzer are you using?

As you can see below, "processing-7-bit" is tokenized quite differently depending on the analyzer:

$ ant AnalyzerDemo
Buildfile: build.xml

    [input] String to analyze: [This string will be analyzed.]
processing-7-bit
     [echo] Running lia.analysis.AnalyzerDemo...
     [java] Analyzing "processing-7-bit"
     [java]   WhitespaceAnalyzer:
     [java]     [processing-7-bit]

     [java]   SimpleAnalyzer:
     [java]     [processing] [bit]

     [java]   StopAnalyzer:
     [java]     [processing] [bit]

     [java]   StandardAnalyzer:
     [java]     [processing-7-bit]

If you're using the StandardAnalyzer, you are not indexing the word "processing" at all. Grab the source code from Lucene in Action at lucenebook.com and type "ant AnalyzerDemo" to try out the basic analyzers.
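To make the difference concrete without pulling in Lucene, here is a minimal stdlib-only sketch. The class `TokenizerSketch` and its methods are hypothetical stand-ins, not Lucene classes: `whitespaceTokens` approximates WhitespaceAnalyzer (split on whitespace only), and `simpleTokens` approximates SimpleAnalyzer (keep runs of letters, lowercased). StandardAnalyzer's grammar is more involved and is deliberately not imitated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins that approximate two Lucene analyzers; not the real Lucene API.
public class TokenizerSketch {

    // Roughly what WhitespaceAnalyzer does: a token is any run of non-whitespace characters.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Roughly what SimpleAnalyzer does: a token is a run of letters, lowercased.
    // Digits, hyphens, periods, etc. all act as separators and are discarded.
    static List<String> simpleTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String text : Arrays.asList("processing-7-bit", "straighforwerd.processing")) {
            List<String> simple = simpleTokens(text);
            System.out.println(text);
            System.out.println("  whitespace: " + whitespaceTokens(text));
            System.out.println("  simple:     " + simple
                    + " (contains \"processing\": " + simple.contains("processing") + ")");
        }
    }
}
```

A query for "processing" can only match a document if the analyzer used at indexing time emitted "processing" as a separate token; with the letter-based approach both sample strings yield it, while the whitespace-only approach keeps each string as one token.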

    Erik

