I'm sure this has been asked a few times before, but I searched and searched
and found no answer (apart from using Luke), so I would like to know if
there's a way of retrieving the number of terms in an index.
I tried cycling through a TermEnum, but it doesn't do anything :|
--
Solved it... I was using token.toString() instead of token.termText().
Thanks for the help :)
--
View this message in context:
http://www.nabble.com/Basic-Named-Entity-Indexing-tp14291880p14715727.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
Taking your example (text by John Bear, old.), the NGramAnalyzerWrapper
creates the following tokens:
text
text by
by
by John
John
John Bear,
Bear,
Bear, old.
I have managed to get rid of the error, but now it just doesn't add anything
to the index :s
I'm attaching the NGramAnalyzerWrapper and NG
Wrapping the WhitespaceAnalyzer with the ngram filter creates unigrams and
the n-grams that I indicate, while maintaining the whitespace. :)
The reason I'm doing this is that I only wish to index names with more
than one token.
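
For anyone following along, here is a rough, library-free sketch of the token stream described above: whitespace tokens, with word n-grams up to a chosen size emitted at each position. It does not use Lucene at all; the class name ShingleSketch and both helper methods are made up for illustration, so treat it as an approximation of what the wrapper appears to do, not its actual code. Note that it also emits the trailing unigram "old.", which the listing above stops short of.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: plain Java, no Lucene classes involved.
class ShingleSketch {

    // Splits on whitespace (punctuation stays attached, as with a whitespace
    // analyzer) and emits, for each position, the unigram followed by the
    // longer word n-grams starting there, up to maxSize words.
    static List<String> shingles(String text, int maxSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxSize && i + n < words.length; n++) {
                if (n > 0) {
                    sb.append(' ');
                }
                sb.append(words[i + n]);
                out.add(sb.toString());
            }
        }
        return out;
    }

    // Keeps only tokens made of more than one word, e.g. to index
    // multi-token names and drop the unigrams.
    static List<String> multiWordOnly(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.indexOf(' ') >= 0) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : shingles("text by John Bear, old.", 2)) {
            System.out.println(t);
        }
    }
}
```

With a maximum size of 2 this gives unigrams and bigrams interleaved in the same order as the list above; passing the result through multiWordOnly would leave just the bigrams.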
--
Following your suggestion (I think), I built a TokenFilter with the following
code for next():
public final Token next() throws IOException {
Token newToken = input.next();
termText = newToken.termText();
Character tempChar = termText.charAt
Is it possible to add a document to an index and, while doing so, get the
terms in that document? If so, how would one do this? :x
Thanks :)
--
View this message in context:
http://www.nabble.com/Question-regarding-adding-documents-tp14656336p14656336.html
I'm not even sure if it can be considered Named Entity Recognition, but what
the hell...
So here's my problem...
I was asked to retrieve the named entities out of a collection of
documents, and I've thought of two ways of doing so (not sure if either of
them works)...
a) index the documents by w
teration,
> so every second is skipped... ?
>
> "chris.b" <[EMAIL PROTECTED]> wrote on 10/12/2007 12:58:15:
>
>>
>> Here goes,
>> I'm developing an application using lucene which will evaluate the
>> representativeness of a list of keywords w
Here goes,
I'm developing an application using Lucene which will evaluate the
representativeness of a list of keywords within a collection of documents.
I'm doing this by indexing the documents and then loading the list of
keywords and, using the IndexReader class and DefaultSimilarity, retrieving
Okay, I'm very new to Lucene, so it may be my bad. I can get it to
index .txt files, but when I try to index Word documents (using POI), the
program starts running and, when it reaches a .doc file, I get the following
errors:
Exception in thread "main"
org.apache.poi.hpsf.IllegalPropertySe