RE: scalability limit in terms of numbers of large documents

2010-08-16 Thread Burton-West, Tom
Hi Andy, We are currently indexing about 650,000 full-text books per Solr/Lucene index. We have 10 shards for a total of about 6.5 million documents, and our average response time is under 2 seconds, but the slowest 1% of queries take between 5 and 30 seconds. If you were searching only on
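
[Editor's note: for readers new to distributed search, a sharded setup like the one Tom describes is usually queried by sending the request to one node with the shards parameter, which fans the query out to every shard and merges the results. A minimal SolrJ sketch follows; the host names, core paths, and the ocr_text field are placeholders, not details from this thread.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardedQuery {
        public static void main(String[] args) throws Exception {
            // Send the query to one node; the shards parameter makes it
            // fan the request out to all shards and merge the results.
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://shard1.example.com:8983/solr");
            SolrQuery q = new SolrQuery("ocr_text:whaling");
            q.setParam("shards",
                "shard1.example.com:8983/solr,shard2.example.com:8983/solr"); // ...list all 10 shards
            QueryResponse rsp = server.query(q);
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }

Each shard holds its own slice of the 6.5 million documents; the coordinating node only merges the top results from each.]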

Re: scalability limit in terms of numbers of large documents

2010-08-16 Thread Toke Eskildsen
On Sat, 2010-08-14 at 03:24 +0200, andynuss wrote: > Let's say that I am indexing large book documents broken into chapters. A > typical book that you buy at Amazon. What would be the approximate limit to > the number of books that can be indexed slowly and searched quickly. The > search unit wou

Re: scalability limit in terms of numbers of large documents

2010-08-14 Thread Erick Erickson
Here's the Wiki: http://wiki.apache.org/solr/ You can be reading the eBook in 5 minutes, or order the physical book at: https://www.packtpub.com/solr-1-4-enterprise-search-server/book?utm_source=lucidimagination.com&utm_medium=bookrev&utm_content=other&utm_campaign=mdb_000374 Lucid Imaginations m

Re: scalability limit in terms of numbers of large documents

2010-08-14 Thread andynuss
Hi Erick, My documents are roughly 0.5 to 1 million chars of normal words, divided into 50 chapters, with each chapter streamed into its own docid unit. So a search hit is a chapter. How do I find out more about sharding and SOLR? Andy
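
[Editor's note: indexing each chapter as its own Solr document, as Andy describes, might look roughly like the SolrJ sketch below. Field names such as book_id, chapter, and chapter_text, and the loadChapterText helper, are illustrative assumptions, not part of the thread.

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ChapterIndexer {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            // One Solr document per chapter, so a search hit is a chapter.
            for (int ch = 1; ch <= 50; ch++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "book-0001_ch" + ch);  // unique key per chapter
                doc.addField("book_id", "book-0001");     // lets results be grouped by book
                doc.addField("chapter", ch);
                doc.addField("chapter_text", loadChapterText("book-0001", ch));
                server.add(doc);
            }
            server.commit();
        }

        // Placeholder for whatever produces the chapter's plain text.
        static String loadChapterText(String bookId, int ch) { return "..."; }
    }

The book_id field is what you would use later to roll chapter hits back up to the book level.]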

Re: scalability limit in terms of numbers of large documents

2010-08-14 Thread Erick Erickson
As asked, that's really an unanswerable question. The math is pretty easy in terms of running out of document IDs, but "searched quickly" depends on too many variables. I suspect, though, that long before you ran out of document IDs, you'd need to shard your index. Have you looked at SOLR? Best E
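
[Editor's note: the document-ID math Erick alludes to comes from Lucene addressing documents in a single index with a Java int, so the hard ceiling per index is Integer.MAX_VALUE. Using Andy's figure of 50 chapter-documents per book from earlier in the thread:

    2^31 - 1 = 2,147,483,647 documents per index
    2,147,483,647 / 50 chapters per book ~= 43 million books per index

so, as Erick says, search latency would force sharding long before the ID space itself runs out.]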