Thanks, Toke. Very descriptive. A few more questions about your SSD drive(s): what is their current size? Do you project any growth in your index size? If so, how do you plan to correlate that with your hardware needs?
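The growth question above can be made concrete with a back-of-envelope projection. A minimal sketch with purely hypothetical numbers (a 59GB index, an assumed 20% yearly growth rate, and an assumed 160GB SSD — none of these projections are stated in the thread):

```python
# Back-of-envelope index growth projection (growth rate and SSD size are
# hypothetical illustration values, not figures from the thread).
index_gb = 59.0     # current index size
growth_rate = 0.20  # assumed 20% growth per year
ssd_gb = 160.0      # assumed SSD capacity

# Count how many whole years of growth still fit on the SSD.
years = 0
while index_gb * (1 + growth_rate) ** (years + 1) <= ssd_gb:
    years += 1

print(f"Index fits on the SSD for about {years} more years")
# → Index fits on the SSD for about 5 more years
```

The point of such a sketch is simply to turn "do you project growth?" into a date by which the index outgrows the drive, which is what ties the projection back to hardware purchasing.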
Thanks again,
Eugene

-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, October 26, 2010 2:26 AM
To: java-user@lucene.apache.org
Subject: Re: Lucene Software/Hardware Setup Question

On Tue, 2010-10-26 at 02:16 +0200, Kovnatsky, Eugene wrote:
> I am trying to get some information on what enterprise hardware folks
> use out there. We are using Lucene extensively. Our total catalogs
> size is roughly 50GB between roughly 8 various catalogs, 2 of which
> take up 60-70% of this size.

That sounds a lot like our setup at the State and University Library, Denmark. We have about 9M records with an index size of 59GB: 4.5M OAI-PMH harvested records and 2.5M bibliographic records from our Aleph system. The rest of the records are divided among 16 different sources.

> So my question is - if any of you guys have similar catalog sizes then
> what kind of software/hardware do you have running, i.e. what app
> servers, how many, what hardware are these app servers running on?

We use a home-brewed setup called Summa (open source) to handle the workflow and the searching. It uses plain Lucene with a few custom analyzers and some sorting, faceting, suggest and DidYouMean code. One index holds all the material. Currently the index is updated on one server and synced to two search machines, but we're in the middle of moving the index updating to those servers to get faster updates.

The hardware is 2 mirrored servers for fail-safety. They are running some Linux variant and have 2.5GHz quad-core Xeon CPUs with 6MB of level 2 cache and 16GB of RAM. We are not using virtualization for this. The machines use traditional hard disks for data storage and fairly old enterprise-class SSDs for the index. To be honest, they are currently overkill: without faceting the throughput is 50-100 searches/second, including the overhead of using web-service calls.
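The "overkill" claim above can be sanity-checked with simple arithmetic, using only the figures given in the thread: 50-100 searches/second of measured capacity against the guesstimated 5-10 searches/second of prime-time traffic.

```python
# Rough peak utilization: observed peak traffic vs. measured capacity
# (both ranges taken from the thread; everything else here is arithmetic).
capacity_qps = (50, 100)  # searches/second without faceting
peak_qps = (5, 10)        # guesstimated prime-time traffic

# Best case: lowest load against highest capacity; worst case: the reverse.
best = peak_qps[0] / capacity_qps[1]
worst = peak_qps[1] / capacity_qps[0]

print(f"Peak utilization roughly {best:.0%} to {worst:.0%}")
# → Peak utilization roughly 5% to 20%
```

Even in the worst case the machines are busy one second in five at prime time, which is consistent with the observation below that most of the time is spent idle.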
Faceting slows this somewhat, but as our traffic is something like 5-10 searches/second at prime time (guesstimating a lot here, as it has been a year or two since I looked at the statistics), most of the time is spent idle.

Before that we used dual-core Xeons, again with 16GB of RAM and SSDs. They also performed just fine with our workload and were only replaced due to a general reorganization of the servers. Before that, we used some older 3.1GHz single-core Xeon machines with only 1MB of level 2 cache, 32GB of slow RAM and traditional hard disks. My old 1.8GHz single-core laptop was about as fast for indexing & searching, which stands testament that a lot of RAM and GHz does not help much when the memory system is lacking. We did a lot of testing some time ago and found that our searches were mostly CPU-bound when using SSDs.

We've talked with our hardware guys about building new servers in anticipation of more data, and the current vision is relatively modest machines with quad-core i7s, 16GB of RAM and consumer-grade SSDs (Intel or SandForce). As we have mirrored servers, and since no one dies if they can't find a book at our library, using enterprise SSDs is just a waste of money.

Regards,
Toke Eskildsen

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org