: Now my concern is: the more projects I add the more different fields
: would come into play. I would not recreate the index from scratch as I'm
: doing right now but I would only remove all documents with e.g. key
: "project1" and add the new documents completely but not touching other
: projects.
this should work ok, but one of the things you'll want to watch out for is the fieldNorms. Anytime you add a document with an indexed field, Lucene by default creates a "fieldNorm" value for *every* document in your index -- even if those documents don't have that field. if you are only planning on having a handful of document types, and only a handful of indexed fields per doctype, then this shouldn't be a big deal -- but if you want to have thousands of indexed fields per doctype, or thousands of doctypes -- you may find your index size growing a lot faster than you expected.

there is an option in 1.9 that lets you specify, when adding an indexed field to a document, that you don't want to bother storing a fieldNorm for that field -- if you do this, length normalization won't be possible for queries on that field, and index time field/document boosts won't work -- but if you aren't concerned about those things, it will help keep your index size manageable.

: Currently I was using query boosting extensive for the headings in HTML
: documents, e.g. title:(term)^8 h1:(term)^7 ... h6:(term)^2
: content:(term)^1 . I was wondering if this is actually necessary. The
: number of existing h1 to h6 fields with content decreases with the
: amount of documents. To give the fields title and h1, which are the most
: used ones anyway, the highest importance, to I need the boost factor
: here anyway or can I avoid them?

you should try some queries like "title:term content:term" and look at the explain output on your matches to see how much of an impact on the final score the various matches on title vs content have ... if there are a lot fewer terms indexed in the title field than in the content field you should see the match on title be more significant, and then you can decide how much boost you want to give if it's not significant enough.

my question to you would be: why do you want to avoid using query time boosts?
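to give a feel for what the explain output is likely to show: with Lucene's classic default similarity, the length norm of a field is 1/sqrt(number of terms in that field), so a short title tends to outweigh a long content field even before any boost is applied. a rough sketch in plain Java (no Lucene dependency); the term counts and the class and method names are made up for illustration, and the real score also involves tf, idf and the lossy 1-byte norm encoding, which this ignores:

```java
public class LengthNormSketch {

    // Classic default length normalization: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Length norm scaled by a query-time boost (tf and idf are deliberately ignored).
    static double boostedNorm(int numTerms, double queryBoost) {
        return queryBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        int titleTerms = 5;       // hypothetical: a short title
        int contentTerms = 1000;  // hypothetical: a long page body

        double title = lengthNorm(titleTerms);
        double content = lengthNorm(contentTerms);

        System.out.printf("title norm:   %.4f%n", title);
        System.out.printf("content norm: %.4f%n", content);
        System.out.printf("title/content ratio without boosts: %.1f%n", title / content);

        // The same ratio with the boosts from the question (title^8, content^1).
        System.out.printf("title/content ratio with boosts:    %.1f%n",
                boostedNorm(titleTerms, 8.0) / boostedNorm(contentTerms, 1.0));
    }
}
```

with these made-up numbers the title match is already weighted roughly 14 times higher than the content match from length normalization alone, which is the kind of difference the explain output should make visible on a real index.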
there's really no harm in using query time boosts: under the covers an implicit boost of 1.0f is used for every Query class (that i can think of), so specifying your own boost value doesn't really affect the performance of the query, if that's what you are concerned about.

-Hoss
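back on the fieldNorm point: a back-of-the-envelope estimate of how norms grow with the number of indexed fields. this assumes one byte per document for every indexed field that keeps norms, whether or not a given document has that field; the document and field counts, and the class and method names, are hypothetical:

```java
public class NormsSizeSketch {

    // Estimated bytes used by norms, assuming one byte per document
    // for every indexed field that keeps norms.
    static long normsBytes(long numDocs, long fieldsWithNorms) {
        return Math.multiplyExact(numDocs, fieldsWithNorms);
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L; // hypothetical index size

        // A handful of indexed fields versus thousands of them.
        System.out.println("10 fields:   " + normsBytes(numDocs, 10) + " bytes");
        System.out.println("5000 fields: " + normsBytes(numDocs, 5000) + " bytes");

        // Fields indexed without norms drop out of the estimate entirely.
        System.out.println("5000 fields, 4990 without norms: "
                + normsBytes(numDocs, 5000 - 4990) + " bytes");
    }
}
```

under that assumption the cost scales with documents times fields, so it is the number of distinct indexed field names across all projects that matters, not how many documents actually use each field.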