Christian,

You can certainly purge old documents on a daily basis to keep the corpus from growing. Note, however, that 3M docs/day * 90 days = 270M documents of up to 2 KB each may be too much for a single index unless you have lots of RAM or don't need queries to be fast. In other words, you may have to spread this over multiple indices/machines.
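The sizing arithmetic and the daily purge can be sketched in plain Java. This is a minimal sketch under an assumption the reply does not spell out: records are grouped into one partition per day (in a real deployment each partition could be its own Lucene index or shard), so expiring a day means dropping a whole partition rather than deleting documents one at a time. `RollingDailyPartitions` and its methods are hypothetical names, and no Lucene API is used here.

```java
import java.time.LocalDate;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical rolling window of per-day partitions. Each partition stands in
 * for a separate index; plain Java collections are used instead of Lucene.
 */
public class RollingDailyPartitions {
    private final int retentionDays;
    private final Deque<LocalDate> days = new ArrayDeque<>();
    private final Deque<List<String>> partitions = new ArrayDeque<>();

    public RollingDailyPartitions(int retentionDays) {
        if (retentionDays <= 0) throw new IllegalArgumentException("retentionDays must be positive");
        this.retentionDays = retentionDays;
    }

    /** Adds a record to the partition for the given day; days must arrive in order. */
    public void add(LocalDate day, String record) {
        if (days.isEmpty() || days.peekLast().isBefore(day)) {
            days.addLast(day);                 // start a new day's partition
            partitions.addLast(new ArrayList<>());
        } else if (!days.peekLast().equals(day)) {
            throw new IllegalArgumentException("records must arrive in day order");
        }
        partitions.peekLast().add(record);
    }

    /** Drops whole partitions older than the retention window; returns how many were dropped. */
    public int expire(LocalDate today) {
        LocalDate cutoff = today.minusDays(retentionDays); // days on or before cutoff are expired
        int dropped = 0;
        while (!days.isEmpty() && !days.peekFirst().isAfter(cutoff)) {
            days.removeFirst();
            partitions.removeFirst();
            dropped++;
        }
        return dropped;
    }

    public int partitionCount() {
        return partitions.size();
    }

    public long totalRecords() {
        long n = 0;
        for (List<String> p : partitions) n += p.size();
        return n;
    }

    /** Steady-state corpus size: records per day times retention days. */
    public static long steadyStateDocs(long docsPerDay, int retentionDays) {
        return docsPerDay * retentionDays;
    }

    public static void main(String[] args) {
        System.out.println(steadyStateDocs(3_000_000L, 90));                            // 270000000
        System.out.println(String.format(Locale.ROOT, "%.1f docs/s", 3_000_000.0 / 86_400)); // 34.7 docs/s

        // Simulate five days with a 3-day retention window, purging once per day.
        RollingDailyPartitions window = new RollingDailyPartitions(3);
        LocalDate start = LocalDate.of(2008, 12, 19);
        for (int i = 0; i < 5; i++) {
            LocalDate day = start.plusDays(i);
            window.add(day, "record-a-" + i);
            window.add(day, "record-b-" + i);
            window.expire(day);
        }
        System.out.println(window.partitionCount() + " partitions, " + window.totalRecords() + " records");
        // prints: 3 partitions, 6 records
    }
}
```

The design choice in this sketch is that expiry becomes a cheap whole-partition drop, and the corpus stays bounded at retention days times daily volume (270M documents for the numbers in this thread) instead of growing over time.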
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Christian Brennsteiner <eingf...@yahoo.de>
> To: java-user@lucene.apache.org
> Sent: Friday, December 19, 2008 6:22:40 AM
> Subject: lucene suitable? 6 million records / day 1k
>
> hi *,
>
> I am searching for a full-text index capable of meeting the following requirements:
>
> index 3,000,000 new records every day, each with a validity of N days
> (e.g. 90 days expiration)
> (≈ 34.7 records / s)
> one record is, e.g., a URL and can be up to 2 KB in size
>
> http://example.com/somedir/some.html
>
> Lucene should use "/" as a word separator and should, e.g., eliminate all ":"
>
> so the following "sentence" should be indexed:
>
> http example.com somedir some.html
>
> when given the URL
>
> http://example.com/somedir/some.html
>
> My main concern about this requirement is that the index should not
> grow over time, as it always holds
> NR OF DAYS * RECORDS PER DAY and expires the records after a given
> time. In my opinion there must be some background thread always
> throwing away expired hits.
>
> Is this easily possible with Lucene?
>
> regards chris
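The tokenization asked for in the quoted message ("/" as a word separator, ":" eliminated) can be sketched in plain Java. This is a minimal sketch: `UrlTokenizer` and `tokenize` are hypothetical names, and it reads "eliminate" literally, so a ':' is deleted rather than treated as a separator (`host:8080` would become `host8080`). In Lucene itself this logic would go into a custom Tokenizer/Analyzer, whose exact API differs between versions, so no Lucene classes are used here.

```java
import java.util.ArrayList;
import java.util.List;

public class UrlTokenizer {
    /** Deletes every ':' and splits on '/', skipping empty tokens (e.g. from "//"). */
    public static List<String> tokenize(String url) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == ':') {
                continue; // ':' is eliminated entirely, not used as a separator
            }
            if (c == '/') {
                // '/' ends the current token; consecutive slashes yield no empty tokens
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tokenize("http://example.com/somedir/some.html")));
        // prints: http example.com somedir some.html
    }
}
```

Whether ':' should instead act as a separator (so a port number becomes its own token) is a choice the original requirement leaves open.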