Solved it... I was using token.toString() instead of token.termText().
thanks for the help :)
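
For anyone hitting the same thing, here is a minimal sketch of why the two calls behave differently. DemoToken and TermTextDemo are hypothetical stand-ins written for this illustration, not Lucene's own Token class, and the sketch assumes toString() returns a debug-style string that includes the offsets, which is why a "starts with a capital" check fails on it.

```java
// Hypothetical stand-in for a Lucene 2.x-style token.
// This is NOT org.apache.lucene.analysis.Token, only an illustration.
final class DemoToken {
    private final String text;
    private final int start;
    private final int end;

    DemoToken(String text, int start, int end) {
        this.text = text;
        this.start = start;
        this.end = end;
    }

    /** The raw term text: the value to compare or index. */
    String termText() {
        return text;
    }

    /** Debug form with offsets (assumed format, for illustration only). */
    @Override
    public String toString() {
        return "(" + text + "," + start + "," + end + ")";
    }
}

final class TermTextDemo {
    /** True when the string is non-empty and starts with an upper-case letter. */
    static boolean startsWithCapital(String s) {
        return !s.isEmpty() && Character.isUpperCase(s.charAt(0));
    }
}
```

With a token for "Lucene", startsWithCapital(token.termText()) is true, while startsWithCapital(token.toString()) is false because the debug form starts with "(".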
--
View this message in context:
http://www.nabble.com/Basic-Named-Entity-Indexing-tp14291880p14715727.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com
rapper and NGramFilter which I am referring
to, as well as my own NamedEntityAnalyzer/TokenFilter, which may help you
understand better.
http://www.nabble.com/file/p14712313/rem.rar rem.rar
On Jan 8, 2008 11:48 PM, chris.b <[EMAIL PROTECTED]> wrote:
>
> Wrapping the WhitespaceAnalyzer with the NGramFilter creates unigrams and
> the ngrams that I indicate, while maintaining the whitespace. :)
> The reason I'm doing this is because I only wish to index names with more
> than one t
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Hi Chris,
A null pointer exception can be caused by not checking
newToken for null after this line:
Token newToken = input.next();
I think Hoss meant to call next() on the input as long as returned
tokens do not satisfy the check for being a named entity.
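
A rough sketch of that loop, under stated assumptions: DemoTokenStream, ListStream and NamedEntityDemoFilter are hypothetical stand-ins so the example runs on its own (they are not Lucene's TokenStream/TokenFilter), terms are plain strings, and looksLikeNamedEntity() is only a guess at the check (two or more capitalized words). The point is the null check that ends the loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a token stream: next() returns null at end of input. */
interface DemoTokenStream {
    String next();
}

/** Wraps a list of terms as a stream (illustration only). */
final class ListStream implements DemoTokenStream {
    private final Iterator<String> it;

    ListStream(List<String> terms) {
        this.it = terms.iterator();
    }

    public String next() {
        return it.hasNext() ? it.next() : null;
    }
}

/** Passes through only the terms that satisfy the named-entity check. */
final class NamedEntityDemoFilter implements DemoTokenStream {
    private final DemoTokenStream input;

    NamedEntityDemoFilter(DemoTokenStream input) {
        this.input = input;
    }

    /** Assumed check: at least two whitespace-separated words, each capitalized. */
    static boolean looksLikeNamedEntity(String term) {
        String[] words = term.trim().split("\\s+");
        if (words.length < 2) {
            return false;
        }
        for (String w : words) {
            if (w.isEmpty() || !Character.isUpperCase(w.charAt(0))) {
                return false;
            }
        }
        return true;
    }

    public String next() {
        String candidate;
        // Keep pulling from the input until a term passes the check.
        while ((candidate = input.next()) != null) {
            if (looksLikeNamedEntity(candidate)) {
                return candidate;
            }
        }
        // Input exhausted: report end of stream rather than touching a null token.
        return null;
    }

    /** Collects every term the stream produces until it returns null. */
    static List<String> drain(DemoTokenStream stream) {
        List<String> out = new ArrayList<String>();
        String term;
        while ((term = stream.next()) != null) {
            out.add(term);
        }
        return out;
    }
}
```

Feeding it the terms "the", "New York", "the quick", "Tim Berners Lee" and "Paris" should leave only "New York" and "Tim Berners Lee", and an empty input simply yields nothing instead of throwing.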
Also, this code assumes white space i
dexer.main(Indexer.java:81)
--
am I forgetting something or am I going the wrong way? :|
--
: a) index the documents by wrapping the whitespace analyzer with
: ngramanalyzerwrapper and then retrieving only the words which have 3 or more
: characters and start with a capital, filtering the "garbage" manually.
: b) creating my own analyzer which will only index ngrams that start with
: cap
help :s)
Thanks in advance,
Chris