I need to index bigrams and trigrams in a document. Here is an example:

Text: This is a text document written by someone. Read this and post your comments.

Words that must be indexed:

Unigrams: text, document, written, someone, read, post, your, comments
Bigrams: text document, document written, post your, your comments
Trigrams: text document written, post your comments

So, I made changes to StandardAnalyzer.java and StandardTokenizer.jj to try and achieve this. I increased the LOOKAHEAD option value to 4:

    options {
      LOOKAHEAD = 4;
      FORCE_LA_CHECK = true;
      .
      .
    }

I made the following changes to StandardTokenizer.jj:

    org.apache.lucene.analysis.Token next() throws IOException :
    :
    :
    {
      if (token.kind == EOF) {
        return null;
      } else if (token.kind == ALPHANUM) {
        Token nextToken = token.next;
        if (token.next.kind == ALPHANUM) {
          return new org.apache.lucene.analysis.Token(token.image + " " + nextToken.image,
                                                      token.beginColumn, nextToken.endColumn,
                                                      tokenImage[token.kind]);
        }
      } else {
        return new org.apache.lucene.analysis.Token(token.image,
                                                    token.beginColumn, token.endColumn,
                                                    tokenImage[token.kind]);
      }
    }

That is, I am using token.next to get info about the next token. But it is returning null. What is the reason, and is there a better way of doing this?
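
To make the target output concrete, here is a minimal standalone sketch in plain Java (no Lucene or JavaCC involved) of the n-grams I am after. It is only an illustration, not a fix for the tokenizer. The class name NGramSketch, the five-word stop list, and the rule that a stop word or a sentence end breaks an n-gram are all assumptions inferred from the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class NGramSketch {

    // Assumed stop list: only the words that are dropped in the example.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("this", "is", "a", "by", "and"));

    // Split the text into runs of adjacent non-stop words.
    // A stop word or a sentence end closes the current run.
    static List<List<String>> runs(String text) {
        List<List<String>> runs = new ArrayList<List<String>>();
        for (String sentence : text.split("[.!?]+")) {
            List<String> run = new ArrayList<String>();
            String lower = sentence.toLowerCase(Locale.ROOT);
            for (String word : lower.split("[^a-z0-9]+")) {
                if (word.isEmpty()) {
                    continue; // leading delimiter produces an empty piece
                }
                if (STOP_WORDS.contains(word)) {
                    if (!run.isEmpty()) {
                        runs.add(run);
                        run = new ArrayList<String>();
                    }
                } else {
                    run.add(word);
                }
            }
            if (!run.isEmpty()) {
                runs.add(run);
            }
        }
        return runs;
    }

    // Emit all 1-grams, then all 2-grams, and so on up to maxN,
    // never crossing a run boundary.
    static List<String> ngrams(String text, int maxN) {
        List<List<String>> runs = runs(text);
        List<String> out = new ArrayList<String>();
        for (int n = 1; n <= maxN; n++) {
            for (List<String> run : runs) {
                for (int i = 0; i + n <= run.size(); i++) {
                    out.add(String.join(" ", run.subList(i, i + n)));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "This is a text document written by someone. "
                + "Read this and post your comments";
        for (String gram : ngrams(text, 3)) {
            System.out.println(gram);
        }
    }
}
```

Calling NGramSketch.ngrams(text, 3) on the example text should give the same unigrams, bigrams and trigrams as listed above, in that order. In a real analyzer the same windowing would presumably have to happen over the token stream rather than over a string.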