Re: Re: Re: lucene suiteable ? 6 mio recods / day 1k

2008-12-21 Thread tom
AUTOMATIC REPLY LUX is closed until 5th January 2009 - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org

Re: lucene suiteable ? 6 mio recods / day 1k

2008-12-21 Thread Christian Brennsteiner
hi otis, i think that out of 2 k, 80 % can be stemmed and many of the words are duplicates, so they would not need full space. can you give me an idea what, in your opinion, "don't need queries to be quick" would mean ... i have no idea in what timeframe it could be handled if it is not completely i

Re: lucene suiteable ? 6 mio recods / day 1k

2008-12-21 Thread Otis Gospodnetic
Christian, You can certainly purge old documents on a daily basis in order to keep the corpus from growing, but note that 3M*90=270M 2K docs may be a bit too much for a single index unless you really have lots of RAM or you don't need queries to be quick. In other words, you may have to spread
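The arithmetic in the message above (3M documents per day, kept for 90 days, about 2K each) can be checked with a short sketch. The class and method names are hypothetical, and reading "2K" as 2048 bytes of raw text per document is an assumption; the resulting figure is raw input size, not the size of a Lucene index.

```java
// Hypothetical helper, not part of Lucene: back-of-the-envelope corpus sizing.
final class CorpusEstimate {

    /** Documents retained when old documents are purged after retentionDays. */
    static long totalDocs(long docsPerDay, int retentionDays) {
        return docsPerDay * retentionDays;
    }

    /** Raw (unindexed) bytes for the given number of documents. */
    static long rawBytes(long docs, long bytesPerDoc) {
        return docs * bytesPerDoc;
    }

    public static void main(String[] args) {
        long docs = totalDocs(3_000_000L, 90);   // 3M * 90 = 270M documents
        long bytes = rawBytes(docs, 2048L);      // assumption: 2K = 2048 bytes
        System.out.println(docs + " docs, " + bytes + " raw bytes");
    }
}
```

Under these assumptions the raw text alone is on the order of half a terabyte, which gives a rough sense of why a single index may be a stretch.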

Re: Url Analyzer

2008-12-21 Thread Otis Gospodnetic
Mark, This is simple enough that it should be easy to put together. If you search the ML archives you'll see that one of the common "tricks" is to "flip" host name parts (e.g. com.sematext.www). The details of this have been discussed before, so have a look. Otis -- Sematext -- http://semat
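A minimal sketch of the host-flipping trick mentioned above, assuming the host name has already been extracted from the URL. The class name `HostFlipper` and its method are hypothetical and not part of Lucene; lowercasing the host is an added assumption, since host names are case-insensitive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: reverses the dot-separated labels of a host name,
// e.g. "www.sematext.com" becomes "com.sematext.www".
final class HostFlipper {

    static String flipHost(String host) {
        // Split on literal dots; Arrays.asList gives a fixed-size view we can reverse in place.
        List<String> labels = Arrays.asList(host.toLowerCase(Locale.ROOT).split("\\."));
        Collections.reverse(labels);
        return String.join(".", labels);
    }

    public static void main(String[] args) {
        System.out.println(flipHost("www.sematext.com"));
    }
}
```

Indexing the flipped form as a single untokenized term puts all hosts of one domain under a common prefix (here `com.sematext.`), so a prefix query can match them.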

Re: BooleanQuery Performance Help

2008-12-21 Thread Prafulla Kiran
Hi, Here's the code which I am using to time the query:

    long startTime = System.currentTimeMillis();
    TopDocCollector collector = new TopDocCollector(10);
    is.search(query, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    long endTime = System.currentTimeMillis();

Most of the clauses w
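A single `System.currentTimeMillis()` measurement like the one above can be noisy, because the first executions include JIT compilation and cold caches. One possible alternative, sketched here with a hypothetical `QueryTimer` class that is not from the original post, is to run the search a few times untimed and then take the median of several `System.nanoTime()` measurements. The `Runnable` stands in for the `is.search(query, collector)` call.

```java
import java.util.Arrays;

// Hypothetical timing harness: warm-up runs followed by a median of timed runs.
final class QueryTimer {

    static long medianNanos(Runnable task, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        // Untimed warm-up executions.
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        // Timed executions; nanoTime is intended for measuring elapsed time.
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2]; // median (upper middle when runs is even)
    }
}
```

In the original snippet, the task would wrap the search call, creating a fresh collector on each run so that every execution does the same work.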

Re: Default and optimal use of RAMDirectory

2008-12-21 Thread Otis Gospodnetic
Let me add that I clearly recall having a hard time getting the tests for that particular section of LIA1 to consistently show that using the RAMDirectory buffering approach instead of vanilla IndexWriter yields faster indexing. Even back then IndexWriter buffered indexed da

Re: Inquiry on Lucene Stemming

2008-12-21 Thread Otis Gospodnetic
If Hoss is referring to synonym expansion, allow me to point out that the freely downloadable code from Lucene in Action (first edition) includes an implementation of that, if you'd like to have a look, OP. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch