Solved it... I was using token.toString() instead of token.termText();
thanks for the help :)
--
View this message in context:
http://www.nabble.com/Basic-Named-Entity-Indexing-tp14291880p14715727.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
Taking your example ("text by John Bear, old."), the NGramAnalyzerWrapper
creates the following tokens:
text
text by
by
by John
John
John Bear,
Bear,
Bear, old.
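The listing above can be reproduced without Lucene at all: split on whitespace and emit each word plus each adjacent word pair. This is only a sketch of the unigram-plus-bigram behaviour described in the thread, not the actual NGramAnalyzerWrapper code (which is the poster's own class); note the final unigram `old.` would also appear in the output.

```java
import java.util.ArrayList;
import java.util.List;

public class WordNGrams {
    // Produce unigrams and bigrams from whitespace-split tokens,
    // mimicking the token list shown above.
    static List<String> unigramsAndBigrams(String text) {
        String[] words = text.split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);                          // unigram
            if (i + 1 < words.length) {
                out.add(words[i] + " " + words[i + 1]); // bigram
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : unigramsAndBigrams("text by John Bear, old.")) {
            System.out.println(t);
        }
    }
}
```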
I have managed to get rid of the error, but now it just doesn't add anything
to the index :s
I'm attaching the NGramAnalyzerWrapper and NG
On Jan 8, 2008 11:48 PM, chris.b <[EMAIL PROTECTED]> wrote:
Wrapping the WhitespaceAnalyzer with the NGramFilter creates unigrams and
the n-grams that I indicate, while maintaining the whitespace. :)
The reason I'm doing this is that I only wish to index names with more
than one token.
Hi Chris,
A null pointer exception can be caused by not checking
newToken for null after this line:
Token newToken = input.next();
I think Hoss meant to call next() on the input as long as returned
tokens do not satisfy the check for being a named entity.
Also, this code assumes white space i
Following your suggestion (I think), I built a tokenfilter with the following
code for next():
public final Token next() throws IOException {
    Token newToken = input.next();
    termText = newToken.termText();
    Character tempChar = termText.charAt
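A corrected version along the lines Otis describes would keep pulling from the input until a token passes the named-entity check, and return null when the input is exhausted instead of dereferencing it. The sketch below is self-contained: `Token` and `TokenStream` are simplified stand-ins for the old Lucene 2.x classes (only `termText()` and a null-returning `next()` are modelled), and `isNamedEntity` is a hypothetical placeholder for whatever check the filter actually applies.

```java
import java.util.Arrays;
import java.util.Iterator;

public class NamedEntityFilterSketch {
    // Minimal stand-in for the old Lucene Token; only termText() is needed here.
    static class Token {
        private final String text;
        Token(String text) { this.text = text; }
        String termText() { return text; }
    }

    // Stand-in for the upstream TokenStream: next() returns null when exhausted.
    static class TokenStream {
        private final Iterator<String> it;
        TokenStream(String... terms) { this.it = Arrays.asList(terms).iterator(); }
        Token next() { return it.hasNext() ? new Token(it.next()) : null; }
    }

    // Hypothetical named-entity test: keep multi-word tokens starting with a capital.
    static boolean isNamedEntity(String term) {
        return term.contains(" ") && Character.isUpperCase(term.charAt(0));
    }

    // next() with the null check: skip tokens that fail the test, and
    // return null (never NPE) once the input runs dry.
    static Token next(TokenStream input) {
        Token newToken = input.next();
        while (newToken != null && !isNamedEntity(newToken.termText())) {
            newToken = input.next();
        }
        return newToken; // null signals end of stream
    }

    public static void main(String[] args) {
        TokenStream in = new TokenStream("text by", "by John", "John Bear,", "lower case");
        Token t;
        while ((t = next(in)) != null) {
            System.out.println(t.termText()); // prints only "John Bear,"
        }
    }
}
```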
: a) index the documents by wrapping the whitespace analyzer with
: ngramanalyzerwrapper and then retrieving only the words which have 3 or more
: characters and start with a capital, filtering the "garbage" manually.
: b) creating my own analyzer which will only index ngrams that start with
: cap
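For option (a), the manual "garbage" filter boils down to a small predicate. This is just one possible reading of the rule stated above (at least three characters and a leading capital); the method name is hypothetical.

```java
public class NameCandidate {
    // Hypothetical predicate for option (a): keep tokens that are at least
    // three characters long and start with a capital letter.
    static boolean isCandidateName(String term) {
        return term.length() >= 3 && Character.isUpperCase(term.charAt(0));
    }

    public static void main(String[] args) {
        String[] tokens = {"text", "by", "John", "Bear,", "old."};
        for (String t : tokens) {
            if (isCandidateName(t)) {
                System.out.println(t); // prints "John" and "Bear,"
            }
        }
    }
}
```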