On Wed, 2010-07-14 at 20:28 +0200, Christopher Condit wrote: [Toke: No frequent updates]
> Correct - in fact there are no updates and no deletions. We index > everything offline when necessary and just swap the new index in... So everything is rebuild from scratch each time? Or do you mean that you're only adding new documents, not changing old ones? Either way, optimizing to a single 140GB segment is heavy. Ignoring the relatively light processing of the data, the I/O for merging is still at the very minimum to read and write the 140GB. Even if you can read and write 100MB/sec it still takes an hour. This is of course not that relevant if you're fine with a nightly batch job. > By more segments do you mean not call optimize() at index time? Either that or calling it with maxNumSegments 10, where 10 is just a wild guess. Your mileage will vary: http://lucene.apache.org/java/3_0_0/api/all/org/apache/lucene/index/IndexWriter.html#optimize%28int%29 [Toke: 10GB is a lot for such an index. What about sorting & faceting?] > No faceting and no sorting (other than score) for this index... [Toke: What queries?] > Typically just TermQueries or BooleanQueries: (Chip OR Nacho OR Foo) > AND (Salsa OR Sauce) AND (This OR That) > The latter is most typical. > > With a single keyword it will execute in < 1 second. In a case where > there are 10 clauses it becomes much slower (which I understand, > just looking for ways to speed it up)... As Erick Erickson recently wrote: "Since it doesn't make sense to me, that must mean I don't understand the problem very thoroughly". Your queries seems simple enough and I would expect response times well under a second with warmed index and conventional local harddrives. Together with the unexpected high memory requirement my guess is that there's something going on with your terms. If you try opening the index with luke, it'll tell you the number of terms. If that is very high for the fields you search on, this would explain the memory usage. You can also take a look at the rank for the most common terms. If it is very high this would explain the long execution times for compound queries that uses one or more of these terms. A stopword filter would help in this case if such a filter is acceptable for you. Regards, Toke Eskildsen --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org