Re: Basic Named Entity Indexing

2008-01-09 Thread chris.b
Solved it... I was using token.toString() instead of token.termText(); thanks for the help :)
--
View this message in context: http://www.nabble.com/Basic-Named-Entity-Indexing-tp14291880p14715727.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com
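For anyone hitting the same problem, the difference can be illustrated with a small stand-in class. `DemoToken` below is hypothetical and is not Lucene's `Token`; the debug format its `toString()` produces is an assumption made purely for illustration. The point is only that `toString()` is typically a debugging representation, while `termText()` returns the bare term that a named-entity check should look at.

```java
final class TokenTextDemo {
    // Hypothetical stand-in, NOT Lucene's Token class.
    static final class DemoToken {
        private final String text;
        private final int start;
        private final int end;

        DemoToken(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        // The bare term, which is what a capital-letter check should inspect.
        String termText() {
            return text;
        }

        // Debug-style representation (assumed format): term plus offsets.
        @Override
        public String toString() {
            return "(" + text + "," + start + "," + end + ")";
        }
    }

    // Illustrative check: does the string start with an upper-case letter?
    static boolean startsWithCapital(String s) {
        return !s.isEmpty() && Character.isUpperCase(s.charAt(0));
    }
}
```

With a token for "Lucene", `startsWithCapital(t.termText())` is true while `startsWithCapital(t.toString())` is false, because the debug string in this sketch starts with "(".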

Re: Basic Named Entity Indexing

2008-01-09 Thread chris.b
rapper and NGramFilter which I am referring to, as well as my own NamedEntityAnalyzer/TokenFilter, which may help you understand better. http://www.nabble.com/file/p14712313/rem.rar

Re: Basic Named Entity Indexing

2008-01-08 Thread Doron Cohen
On Jan 8, 2008 11:48 PM, chris.b <[EMAIL PROTECTED]> wrote:
>
> Wrapping the whitespaceanalyzer with the ngramfilter creates unigrams and
> the ngrams that I indicate, while maintaining the whitespaces. :)
> The reason I'm doing this is because I only wish to index names with more
> than one t

Re: Basic Named Entity Indexing

2008-01-08 Thread chris.b

Re: Basic Named Entity Indexing

2008-01-08 Thread Doron Cohen
Hi Chris,

A null pointer exception can be caused by not checking newToken for null after this line:

    Token newToken = input.next()

I think Hoss meant to call next() on the input as long as returned tokens do not satisfy the check for being a named entity. Also, this code assumes white space i
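The loop described here (keep calling next() on the input until a token passes the named-entity check, and stop as soon as next() returns null) can be sketched roughly as below. This is a self-contained illustration, not Lucene code: `SketchToken`, `SketchTokenStream`, `ListTokenStream`, `EntityFilter` and the capital-letter test in `looksLikeNamedEntity` are all hypothetical stand-ins, assumed only to mirror the `Token newToken = input.next()` pattern from the message.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class NamedEntityFilterSketch {
    // Hypothetical stand-in for a token; only the term text is modelled.
    static final class SketchToken {
        private final String text;
        SketchToken(String text) { this.text = text; }
        String termText() { return text; }
    }

    // Hypothetical stand-in for a token stream: next() returns null when exhausted.
    interface SketchTokenStream {
        SketchToken next();
    }

    // Feeds a fixed list of words, then returns null.
    static final class ListTokenStream implements SketchTokenStream {
        private final Iterator<String> words;
        ListTokenStream(List<String> words) { this.words = words.iterator(); }
        public SketchToken next() {
            return words.hasNext() ? new SketchToken(words.next()) : null;
        }
    }

    // Assumed check, for illustration only: non-empty and starts with an upper-case letter.
    static boolean looksLikeNamedEntity(String text) {
        return !text.isEmpty() && Character.isUpperCase(text.charAt(0));
    }

    // Filter that skips tokens failing the check.
    static final class EntityFilter implements SketchTokenStream {
        private final SketchTokenStream input;
        EntityFilter(SketchTokenStream input) { this.input = input; }

        public SketchToken next() {
            SketchToken newToken = input.next();
            // Test for null first, then keep pulling tokens until one passes the check.
            while (newToken != null && !looksLikeNamedEntity(newToken.termText())) {
                newToken = input.next();
            }
            return newToken; // null tells the caller the stream is exhausted
        }
    }

    // Convenience helper for trying the sketch out.
    static List<String> collect(List<String> words) {
        SketchTokenStream stream = new EntityFilter(new ListTokenStream(words));
        List<String> kept = new ArrayList<String>();
        for (SketchToken t = stream.next(); t != null; t = stream.next()) {
            kept.add(t.termText());
        }
        return kept;
    }
}
```

The null test sits in the loop condition ahead of the entity check, so the filter never dereferences a null token when the input runs out.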

Re: Basic Named Entity Indexing

2008-01-08 Thread chris.b
dexer.main(Indexer.java:81) -- am I forgetting something or am I going the wrong way? :|

Re: Basic Named Entity Indexing

2007-12-14 Thread Chris Hostetter
: a) index the documents by wrapping the whitespace analyzer with
: ngramanalyzerwrapper and then retrieving only the words which have 3 or more
: characters and start with a capital, filtering the "garbage" manually.
: b) creating my own analyzer which will only index ngrams that start with
: cap
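Option (a) in the quoted text boils down to a simple test on each whitespace-separated word: at least 3 characters and a leading capital. A minimal sketch of that test in plain Java, without any Lucene classes, might look like the following; the class and method names are made up for illustration, and n-gram generation is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

final class CapitalizedWordSketch {
    // True when the word has 3 or more characters and starts with an upper-case letter.
    static boolean isCandidate(String word) {
        return word.length() >= 3 && Character.isUpperCase(word.charAt(0));
    }

    // Splits on whitespace and keeps only the candidate words, in order.
    static List<String> candidates(String text) {
        List<String> kept = new ArrayList<String>();
        for (String word : text.trim().split("\\s+")) {
            if (isCandidate(word)) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

A test this crude also keeps ordinary sentence-initial words such as "The", which is presumably part of the "garbage" that would still need manual filtering.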

Basic Named Entity Indexing

2007-12-12 Thread chris.b
help :s) Thanks in advance, Chris