In fact I see you are ignoring all spaces between words. Maybe that's deliberate. Break it down into the smallest possible complete code sample that shows the problem and post that.
--
Ian.

On Mon, Jan 14, 2013 at 11:02 AM, Ian Lea <ian....@gmail.com> wrote:
> It won't be IndexWriter or IndexWriterConfig. What exactly does your
> analyzer do - what is the full chain of tokenization? Are you saying
> that ':)a' and ')an' are not indexed? Surely that is correct given
> your input with a space after the :). And before as well, so 's:)' is
> also suspect.
>
> --
> Ian.
>
>
> On Mon, Jan 14, 2013 at 7:42 AM, Hankyu Kim <gksr...@gmail.com> wrote:
>> I'm working with Lucene 4.0 and I didn't use Lucene's QueryParser, so
>> setAllowLeadingWildcard() is irrelevant.
>> I also realised the issue wasn't with querying; it was indexing which
>> left out the terms with a leading special character.
>>
>> My goal was to do a fuzzy match by creating a trigram index. The idea is to
>> tokenize the documents into trigrams, not into words, during indexing and
>> searching so Lucene can search for part of a word or phrase.
>>
>> Say the original text in the document said: "Sample text with special
>> characters :) and such"
>> It's tokenized into
>> 'sam', 'amp', 'mpl', 'ple', 'let', 'ete', 'tex', 'ext', 'xtw', 'twi',
>> 'wit', 'ith', 'ths', 'hsp', 'spe', 'pec', 'eci', 'cia', 'ial', 'alc',
>> 'lch', 'cha', 'har', 'ara', 'rac', 'act', 'cte', 'ter', 'ers', 'rs:',
>> 's:)', ':)a', ')an', 'and', 'nds', 'dsu', 'suc', 'uch'.
>> The above is output from my tokenizer, so there's nothing wrong with
>> creating trigrams. However, when I check the index with lukeall, all the
>> other trigrams are indexed correctly except for the terms ':)a' and ')an'.
>> Since the missing terms are related to Lucene's special characters, I
>> don't think it's got to do with my custom code.
>>
>> I only changed the analyser in IndexFiles.java from the demo to index the file.
>> Honestly, I can't even locate the exact class in which the problem is
>> caused. I'm only guessing that IndexWriterConfig or IndexWriter is discarding
>> the terms with leading special characters.
>>
>> I hope the above information helps.
>>
>> 2013/1/11 Ian Lea <ian....@gmail.com>
>>
>>> QueryParser has a setAllowLeadingWildcard() method. Could that be
>>> relevant?
>>>
>>> What version of Lucene? Can you post some simple examples of what
>>> does/doesn't work? Post the smallest possible, but complete, code that
>>> demonstrates the problem?
>>>
>>>
>>> With any question that mentions a custom version of something, that
>>> custom version has to be the prime suspect for any problems.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Thu, Jan 10, 2013 at 12:08 PM, Hankyu Kim <gksr...@gmail.com> wrote:
>>> > Hi.
>>> >
>>> > I've created a custom analyzer that treats special characters just like
>>> > any other. The index works fine all the time, even when the query includes
>>> > special characters, except when the special characters come at the
>>> > beginning of the query.
>>> >
>>> > I'm using SpanTermQuery and WildcardQuery, and they both seem to suffer
>>> > from the same issue with queries beginning with special characters. Is it a
>>> > limitation of Lucene or am I missing something?
>>> >
>>> > Thanks
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
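
For reference, the trigram scheme described in the thread (lowercase the text, drop all whitespace, emit every three-character window) can be sketched in plain Java with no Lucene dependency. This is only an illustration under those assumptions: the class name TrigramSketch and the method trigrams are hypothetical, and this is not the poster's actual analyzer. Running something like it on the input first makes it easier to tell which terms are expected before checking the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone sketch; not the analyzer used in the thread.
public class TrigramSketch {

    /**
     * Lowercases the text, removes all whitespace, and returns every
     * three-character window of the remaining characters, in order.
     * Inputs shorter than three characters yield an empty list.
     */
    public static List<String> trigrams(String text) {
        // Assumption: whitespace is dropped before windowing, as the
        // trigram list quoted in the thread suggests.
        String squeezed = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + 3 <= squeezed.length(); i++) {
            grams.add(squeezed.substring(i, i + 3));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> grams =
            trigrams("Sample text with special characters :) and such");
        System.out.println(grams.size() + " trigrams: " + grams);
        System.out.println("contains ':)a': " + grams.contains(":)a"));
        System.out.println("contains ')an': " + grams.contains(")an"));
    }
}
```

On the sample sentence this yields the 38 trigrams listed in the thread, including 's:)', ':)a' and ')an'. Those three exist only because whitespace is removed before the windows are taken, which is the behaviour Ian questions above.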