: This is a text document written by someone. Read this and post your comments
:
: words that must be indexed:
: text
: document ...
: text document
: document written

typically when people talk about indexing n-grams -- they mean character wise (so they can find words with simple spelling mistakes). it's not really clear why you would need word wise n-grams, why not just search for phrases with no slop?

: So, I made changes to StandardAnalyzer.java and StandardTokenizer.jj to try
: and achieve this.

if you really need/want an analyzer that produces those tokens, i would suggest you do it with a TokenFilter -- no reason to complicate the tokenizing process when you can leave that alone and combine the tokens instead.

-Hoss
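
A rough sketch of the "combine the tokens" idea, in plain Java. It deliberately does not use the Lucene TokenFilter API. The class name WordShingles and the method shingles are hypothetical, and it assumes the tokenizer has already produced a list of word tokens. A real filter would do the same combining while wrapping the upstream token stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: combines already-tokenized words
// into word-wise n-grams of every size from 1 up to maxSize.
final class WordShingles {

    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        // For each token position, emit every n-gram that ends at that position.
        for (int end = 0; end < tokens.size(); end++) {
            for (int size = 1; size <= maxSize; size++) {
                int start = end - size + 1;
                if (start < 0) {
                    break; // not enough preceding tokens for this size
                }
                out.add(String.join(" ", tokens.subList(start, end + 1)));
            }
        }
        return out;
    }
}
```

Calling shingles on the tokens "text", "document", "written" with maxSize 2 yields text, document, "text document", written, "document written". The tokenizer itself is untouched; only its output is recombined.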