Lucene Vector Model

Gorka Naveira Thu, 14 Dec 2006 03:51:59 -0800

Hi!
I'm working on Lucene's vector model and its way of scoring, and I have
some questions.
As I understand it, Lucene adds terms to the index (DocumentWriter.addPosition,
using Postings) together with some information, such as offset, document
number and term frequency.
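
A minimal sketch of that per-term information, assuming a heavily simplified in-memory index. TinyIndex and Posting are hypothetical names and are not Lucene's DocumentWriter or Posting classes. It stores token positions rather than character offsets. It shows that each document is processed on its own, while a collection statistic such as document frequency only becomes known once every document has been added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified model of what an index keeps per term (not Lucene's API). */
class TinyIndex {
    /** One posting: a document containing the term, with the positions where it occurs. */
    static final class Posting {
        final int docNum;
        final List<Integer> positions = new ArrayList<>();
        Posting(int docNum) { this.docNum = docNum; }
        int freq() { return positions.size(); } // term frequency in this document
    }

    // term -> postings, in the order documents were added
    final Map<String, List<Posting>> postings = new TreeMap<>();
    int numDocs = 0;

    /** Adds one document; only this document's tokens are looked at. */
    int addDocument(String text) {
        int docNum = numDocs++;
        String[] tokens = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) continue; // split may yield a leading empty token
            List<Posting> list = postings.computeIfAbsent(tokens[pos], t -> new ArrayList<>());
            Posting last = list.isEmpty() ? null : list.get(list.size() - 1);
            if (last == null || last.docNum != docNum) {
                last = new Posting(docNum); // first occurrence of the term in this document
                list.add(last);
            }
            last.positions.add(pos);
        }
        return docNum;
    }

    /** Document frequency: only meaningful once all documents have been added. */
    int docFreq(String term) {
        List<Posting> list = postings.get(term);
        return list == null ? 0 : list.size();
    }
}
```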


I would like to apply a different way of weighting the terms in the vectors,
associating IDF (inverse document frequency), BIDF (boolean IDF) or WIDF
with each term. However, that means taking all documents into account in
order to compute IDF, and as far as I can see Lucene adds documents one by
one, without comparing them with each other.
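
A minimal sketch of computing such weights in a second pass, after all documents are known, assuming each document has already been reduced to a map from term to term frequency. CollectionWeights is a hypothetical name. IDF is taken here as ln(N / df), and WIDF is assumed to be the usual "weighted IDF": the term's frequency in one document divided by its total frequency over the whole collection. BIDF is left out because the message does not define it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Collection-wide term weights, computed after all documents are available (sketch). */
class CollectionWeights {
    /** idf(t) = ln(N / df(t)), where df(t) is the number of documents containing t. */
    static Map<String, Double> idf(List<Map<String, Integer>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (Map<String, Integer> doc : docs) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                if (e.getValue() > 0) df.merge(e.getKey(), 1, Integer::sum); // count each doc once
            }
        }
        int n = docs.size();
        Map<String, Double> idf = new HashMap<>();
        df.forEach((term, d) -> idf.put(term, Math.log((double) n / d)));
        return idf;
    }

    /** Assumed WIDF: tf(d, t) divided by the sum of tf(d', t) over every document d'. */
    static double widf(List<Map<String, Integer>> docs, int docIndex, String term) {
        long total = 0;
        for (Map<String, Integer> doc : docs) total += doc.getOrDefault(term, 0);
        if (total == 0) return 0.0; // term never occurs in the collection
        return (double) docs.get(docIndex).getOrDefault(term, 0) / total;
    }
}
```

Both functions need the full list of documents, which is exactly why they cannot be computed while documents are being added one at a time.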

I know Lucene uses IDF, but only at search time (a posteriori), and only over
a filtered set of documents rather than the whole collection.
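
For comparison, a standalone sketch of the search-time idf formula of Lucene's DefaultSimilarity as documented in the Similarity javadoc of that period: idf = 1 + ln(numDocs / (docFreq + 1)). SearchTimeIdf is a hypothetical name. In Lucene itself the two counts are supplied by the searcher rather than by code like this.

```java
/** Standalone copy of the classic DefaultSimilarity idf formula (sketch, not Lucene code). */
class SearchTimeIdf {
    /** idf = 1 + ln(numDocs / (docFreq + 1)); the +1 in the denominator avoids division by zero. */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}
```

Rarer terms (smaller docFreq) get a larger weight, e.g. idf(1, 100) is greater than idf(50, 100).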

My questions are:
Is everything I've described correct?
Is it possible to make the changes I need (for someone who is not a Lucene expert)?
Has anything related been done before?

Thank you in advance, Gorka Naveira
