from:"chris.b"

Problem indexing Word Documents

2007-11-26 Thread chris.b

Okay, I'm very new to Lucene, so it may be my bad, but I can get it to index .txt files; when I try to index Word documents (using POI), the program starts running and, when it reaches a .doc file, I get the following errors: Exception in thread "main" org.apache.poi.hpsf.IllegalPropertySe

Problem with termdocs.freq and other

2007-12-10 Thread chris.b

Here goes: I'm developing an application using Lucene which will evaluate the representativeness of a list of keywords within a collection of documents. I'm doing this by indexing the documents and then loading the list of keywords and, using the IndexReader class and DefaultSimilarity, retrieving

Re: Problem with termdocs.freq and other

2007-12-10 Thread chris.b

teration,
> so every second is skipped... ?
>
> "chris.b" <[EMAIL PROTECTED]> wrote on 10/12/2007 12:58:15:
>
>>
>> Here goes,
>> I'm developing an application using lucene which will evaluate the
>> representativeness of a list of keywords w
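The remark that "every second is skipped" fits a common pattern with cursor-style enumerators: advancing the cursor twice per loop pass. The reply is cut off here, so this is only a guess at what was meant. The sketch below uses a hypothetical `Cursor` class as a stand-in (it is not the Lucene TermDocs class) to show the effect in plain Java.

```java
import java.util.ArrayList;
import java.util.List;

class CursorSkipDemo {
    /** Hypothetical stand-in for a cursor-style enumerator; not a Lucene class. */
    static final class Cursor {
        private final int[] values;
        private int pos = -1;

        Cursor(int... values) { this.values = values; }

        /** Moves to the next entry and reports whether one exists. */
        boolean next() { pos++; return pos < values.length; }

        /** Returns the entry the cursor currently points at. */
        int current() { return values[pos]; }
    }

    /** Advances in the loop condition AND in the body, so odd-positioned entries are lost. */
    static List<Integer> advanceTwice(Cursor c) {
        List<Integer> seen = new ArrayList<>();
        while (c.next()) {          // first advance
            if (c.next()) {         // second advance skips the entry found above
                seen.add(c.current());
            }
        }
        return seen;
    }

    /** Advances exactly once per pass, so every entry is visited. */
    static List<Integer> advanceOnce(Cursor c) {
        List<Integer> seen = new ArrayList<>();
        while (c.next()) {
            seen.add(c.current());
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(advanceTwice(new Cursor(10, 20, 30, 40, 50))); // [20, 40]
        System.out.println(advanceOnce(new Cursor(10, 20, 30, 40, 50)));  // [10, 20, 30, 40, 50]
    }
}
```

If the original loop looked like the first method, dropping the extra advance inside the body would be the fix.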

Basic Named Entity Indexing

2007-12-12 Thread chris.b

I'm not even sure if it can be considered Named Entity Recognition, but what the hell... so here's my problem... I was asked to retrieve the named entities out of a collection of documents, and I've thought of two ways of doing so (not sure if either of them works)... a) index the documents by w

Question regarding adding documents

2008-01-06 Thread chris.b

Is it possible to add a document to an index and, while doing so, get the terms in that document? If so, how would one do this? :x Thanks :) -- View this message in context: http://www.nabble.com/Question-regarding-adding-documents-tp14656336p14656336.html Sent from the Lucene - Java Users mail

Re: Basic Named Entity Indexing

2008-01-08 Thread chris.b

Following your suggestion (I think), I built a TokenFilter with the following code for next():

    public final Token next() throws IOException {
        Token newToken = input.next();
        termText = newToken.termText();
        Character tempChar = termText.charAt

Re: Basic Named Entity Indexing

2008-01-08 Thread chris.b

Wrapping the WhitespaceAnalyzer with the ngram filter creates unigrams and the n-grams that I indicate, while maintaining the whitespace. :) The reason I'm doing this is that I only wish to index names with more than one token. -- View this message in context: http://www.nabble.com/Basic-Nam

Re: Basic Named Entity Indexing

2008-01-09 Thread chris.b

Taking your example (text by John Bear, old.), the NGramAnalyzerWrapper creates the following tokens: text text by by by John John John Bear, Bear, Bear, old. I have managed to get rid of the error, but now it just doesn't add anything to the index :s I'm attaching the NGramAnalyzerWrapper and NG
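The token list above has lost its separators, but one plausible reading is each whitespace-separated word followed by the two-word gram that starts at it: "text", "text by", "by", "by John", and so on. The plain-Java sketch below reproduces that reading. It does not use Lucene, the class and method names are invented for illustration, and it also emits the final unigram "old.", which does not appear in the list as quoted.

```java
import java.util.ArrayList;
import java.util.List;

class WordNGrams {
    /**
     * Splits on whitespace only (punctuation stays attached to the word)
     * and emits each word followed by the bigram that starts at that word.
     */
    static List<String> unigramsAndBigrams(String text) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out; // nothing to tokenize
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            out.add(words[i]);                              // unigram
            if (i + 1 < words.length) {
                out.add(words[i] + " " + words[i + 1]);     // bigram
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : unigramsAndBigrams("text by John Bear, old.")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Keeping only the multi-word grams, as the earlier message describes wanting, would mean dropping the unigram line.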

Re: Basic Named Entity Indexing

2008-01-09 Thread chris.b

Solved it... I was using token.toString() instead of token.termText(); thanks for the help :) -- View this message in context: http://www.nabble.com/Basic-Named-Entity-Indexing-tp14291880p14715727.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.
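Why the two calls differ can be sketched with a small stand-in class. The `Token` below is hypothetical, not the Lucene class, and the debug format its toString() returns (text plus offsets in parentheses) is an assumption made for illustration. The point is only that a debug-style toString() carries more than the bare term, so using it as the term text produces strings that never match the original words.

```java
class TokenTextDemo {
    /** Hypothetical stand-in for a token with text and offsets; not the Lucene class. */
    static final class Token {
        private final String text;
        private final int start;
        private final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        /** The bare term text, which is what should be indexed. */
        String termText() { return text; }

        /** Debug representation; the exact format is assumed for this sketch. */
        @Override
        public String toString() { return "(" + text + "," + start + "," + end + ")"; }
    }

    public static void main(String[] args) {
        Token t = new Token("John", 8, 12);
        System.out.println(t.termText()); // John
        System.out.println(t);            // (John,8,12)
    }
}
```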

Retrieve number of terms

2008-01-10 Thread chris.b

I'm sure this has been asked a few times before, but I searched and searched and found no answer (apart from using Luke): I would like to know if there's a way of retrieving the number of terms in an index. I tried cycling through a TermEnum, but it doesn't do anything :| -- View this message

10 matches
