Hi! I'm working on Lucene's vector model and its way of scoring, and I have some doubts. As far as I can tell, Lucene adds terms to the index (DocumentWriter.addPosition, using Postings) along with some information, such as offset, document number and term frequency.
I would like to apply a different weighting to each term, associating IDF (inverse document frequency), BIDF (boolean IDF) or WIDF with each term. But that means we need all the documents in order to compute IDF, and as far as I can see Lucene adds docs one by one, without any comparison between them. I know Lucene uses IDF, but only at search time (a posteriori), and taking just a filtered set of documents, not the whole set.

My questions are: Is everything I've described correct? Is it possible to make the changes I need (for a non-expert in Lucene)? Has something related been done before?

Thank you in advance,
Gorka Naveira