Re: Basic Named Entity Indexing

2008-01-09 Thread chris.b
Solved it... I was using token.toString() instead of token.termText(); thanks for the help :) -- View this message in context: http://www.nabble.com/Basic-Named-Entity-Indexing-tp14291880p14715727.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: Basic Named Entity Indexing

2008-01-09 Thread chris.b
Taking your example (text by John Bear, old.), the NGramAnalyzerWrapper creates the following tokens: "text", "text by", "by", "by John", "John", "John Bear,", "Bear,", "Bear, old." I have managed to get rid of the error, but now it just doesn't add anything to the index :s I'm attaching the NGramAnalyzerWrapper and NG
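
For readers following along, the kind of token list described above (each word followed by the word n-grams that start at it) can be reproduced with a small sketch. This is plain Java with no Lucene dependency; `ShingleSketch` and `shingles()` are hypothetical names chosen for illustration and are not the NGramAnalyzerWrapper from the attachment, whose code is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch, no Lucene dependency. ShingleSketch and shingles() are
// hypothetical names, not the poster's NGramAnalyzerWrapper.
class ShingleSketch {

    // Splits on whitespace and emits, for each position, the single word
    // followed by every word n-gram of size 2..maxSize starting there.
    static List<String> shingles(String text, int maxSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out; // nothing to tokenize
        }
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder gram = new StringBuilder();
            for (int size = 1; size <= maxSize && start + size <= words.length; size++) {
                if (size > 1) {
                    gram.append(' '); // keep a single space between words
                }
                gram.append(words[start + size - 1]);
                out.add(gram.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Brackets make the token boundaries visible when printed.
        for (String token : shingles("text by John Bear, old.", 2)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Printing each token inside brackets makes the boundaries unambiguous, which a space-separated dump of multi-word tokens does not.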

Re: Basic Named Entity Indexing

2008-01-08 Thread Doron Cohen
On Jan 8, 2008 11:48 PM, chris.b <[EMAIL PROTECTED]> wrote:
> Wrapping the whitespaceanalyzer with the ngramfilter creates unigrams and
> the ngrams that I indicate, while maintaining the whitespace. :)
> The reason I'm doing this is because I only wish to index names with more
> than one t

Re: Basic Named Entity Indexing

2008-01-08 Thread chris.b
Wrapping the whitespaceanalyzer with the ngramfilter creates unigrams and the ngrams that I indicate, while maintaining the whitespace. :) The reason I'm doing this is because I only wish to index names with more than one token.

Re: Basic Named Entity Indexing

2008-01-08 Thread Doron Cohen
Hi Chris,

A null pointer exception can be caused by not checking newToken for null after this line:

    Token newToken = input.next();

I think Hoss meant to keep calling next() on the input for as long as the returned tokens do not satisfy the check for being a named entity. Also, this code assumes white space i
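
The two points above (check for null, and keep pulling tokens until one passes) can be sketched as follows. This is a plain-Java stand-in rather than real Lucene code: `TokenSource`, `NamedEntityFilterSketch`, `fromList()` and `looksLikeName()` are hypothetical names, tokens are modelled as plain strings, and the "named entity" check (two or more words, each starting with a capital) is only an assumption based on what the thread describes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java stand-in for a token filter; no Lucene classes are used.
// All names here are hypothetical and chosen for illustration only.
class NamedEntityFilterSketch {

    // Mimics an upstream token stream: next() returns null when exhausted.
    interface TokenSource {
        String next();
    }

    private final TokenSource input;

    NamedEntityFilterSketch(TokenSource input) {
        this.input = input;
    }

    // Helper that exposes a list as a TokenSource.
    static TokenSource fromList(List<String> tokens) {
        Iterator<String> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Assumed check: at least two words, each starting with an upper-case letter.
    static boolean looksLikeName(String term) {
        String[] words = term.trim().split("\\s+");
        if (words.length < 2) {
            return false;
        }
        for (String word : words) {
            if (word.isEmpty() || !Character.isUpperCase(word.charAt(0))) {
                return false;
            }
        }
        return true;
    }

    // Skips rejected tokens; returns null once the input has no more tokens.
    String next() {
        String candidate = input.next();
        while (candidate != null && !looksLikeName(candidate)) {
            candidate = input.next();
        }
        return candidate; // either an accepted token or null at end of input
    }

    public static void main(String[] args) {
        NamedEntityFilterSketch filter = new NamedEntityFilterSketch(
                fromList(Arrays.asList("text by", "by John", "John", "John Bear,", "Bear, old.")));
        for (String token = filter.next(); token != null; token = filter.next()) {
            System.out.println("[" + token + "]");
        }
    }
}
```

The loop tests for null before inspecting the token, so reaching the end of the input simply ends the stream instead of throwing.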

Re: Basic Named Entity Indexing

2008-01-08 Thread chris.b
Following your suggestion (I think), I built a tokenfilter with the following code for next():

    public final Token next() throws IOException {
        Token newToken = input.next();
        termText = newToken.termText();
        Character tempChar = termText.charAt

Re: Basic Named Entity Indexing

2007-12-14 Thread Chris Hostetter
: a) index the documents by wrapping the whitespace analyzer with
: ngramanalyzerwrapper and then retrieving only the words which have 3 or more
: characters and start with a capital, filtering the "garbage" manually.
: b) creating my own analyzer which will only index ngrams that start with
: cap