When you say "we've tried the whitespace analyzer", do you mean for BOTH indexing and searching? If you only use it for one of those, you'd see results like this.
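To show what I mean, here's a rough sketch in plain Java. It does not use Lucene at all. The two tokenizer methods are hypothetical stand-ins I made up for illustration: one splits on whitespace only and keeps case, and the other lowercases and splits on anything that isn't a letter or digit. No real Lucene analyzer is guaranteed to behave exactly like either one. The point is only that a prefix is compared against whatever terms actually ended up in the index, so the two sides have to agree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only: these are NOT Lucene classes or Lucene behavior.
class AnalyzerMismatchDemo {

    // Whitespace-style tokenizing: split on whitespace, keep case and punctuation.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Hypothetical "lowercase and split on non-alphanumerics" tokenizing.
    static List<String> splittingLowercaseTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // A prefix search matches only if some indexed term starts with the
    // prefix exactly as given (case-sensitive, no further analysis).
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> ws = whitespaceTokens("PFC_0234");            // [PFC_0234]
        List<String> split = splittingLowercaseTokens("PFC_0234"); // [pfc, 0234]

        System.out.println(prefixMatches(ws, "PFC_0"));    // true: same case, one term
        System.out.println(prefixMatches(ws, "pfc_0"));    // false: prefix was lowercased
        System.out.println(prefixMatches(split, "PFC_0")); // false: term was split and lowercased
        System.out.println(prefixMatches(split, "pfc"));   // true
    }
}
```

If your index looks like the second case, or if anything on the search side lowercases the prefix before it is compared, then PFC* and PFC_0* would come back empty even though the document is there.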
And do you use Luke? It'll let you examine your index and see what's *actually* in it. It's the first place I go when I don't get the results I expect. See: http://www.getopt.org/luke/

What about capitalization? Lucene is case-sensitive. Some of the analyzers automatically lower-case and some don't.

If you're using the whitespace analyzer, I don't think you need to bother transforming the hyphen into an underscore.

Hope this helps. Without more context I'm not sure what else to suggest.

Erick

On 8/7/06, Yiqun Eddie Cao <[EMAIL PROTECTED]> wrote:
Hi,

We are using Lucene in a chemistry database, and we are dealing with special words that contain both digits and letters of the English alphabet, such as PFC-0234. To prevent Lucene from cutting the word in two, we have replaced all dashes with underscores, so PFC-0234 is stored and indexed as PFC_0234 in the Lucene index.

However, searches containing wildcard characters do not work. For example, none of the following works: PFC_*, PFC*, PF*, PFC_0*, PFC_02*, but PFC-0234 works. Can anyone tell me what is wrong here? We have tried WhitespaceAnalyzer, but it's not working either.

Thanks,
Eddie