The important detail here is what you mean by "single server"? A high-end server will work just fine - you want 4GB+ or RAM and the fastest disk/IO you can get; CPU speed is far less important; A nice Linux software RAID and 5+ 15K SCSI disks will get you superb performance, at a reasonable price.
-----Original Message----- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Friday, August 11, 2006 4:23 PM To: java-user@lucene.apache.org Subject: Re: 30 milllion+ docs on a single server Tomi NA wrote: > On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote: >> I've made a nice little archive application with lucene. I made it to >> handle our largest need: 2.5 million docs or so on a single server. Now >> the powers that be say: lets use it for a 30+ million document archive >> on a single server! (each doc size maybe 10k max...as small as a 1 or >> 2k) Please tell me why we are in trouble...please tell me why we are >> not. I have tested up to 2 million docs without much trouble but 30 >> million...the average search will include a sort on a field as >> well...can I search 30+ million docs with a sort? Man am I worried about >> that. Maybe the server will have 8 procs and 12 billion gigs of RAM. >> Mabye. Even still, Tomcat seems to be able to launch with a max of 1.5 >> or 1.6 gig of Ram in Windows. What do you think? 30 million+ sounds like >> too much of a load to me for a single server. Not that they care what I >> think...I only wrote the thing (man I hate my job, offer me a new one :) >> )...please...comments? >> >> Cheers, >> >> Miserable Mark > > I don't really understand what you're so worried about. Either it'll > work well with the setup you have, or it won't. It's really the size > of it. ;) > Seriously, you have a number of relatively cheap possibilities at hand > to improve search performance: storing the index on a RAID 5 disk > array will let you read the indices very fast, using multicore CPUs, > adding memory and even if all that isn't good enough, you can always > use a small cluster (say, 4 nodes) of very, very inexpensive PCs > filled with a GB of RAM. You don't have to keep them inside the > regular UPS/backup/voult-protected area as the indices can always be > rebuilt (unlike e.g. data in transactional systems) and between 4 of > them they might cost like an entry-level server. > I'll let the experts speak now. :) > > t.n.a. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > Thanks for the tip...I am not too worried...I am miserable because I live in Dilbert land, not this particular incident. Spreading to multiple servers is a possibility but one I want to avoid...I wrote this app on the side since our current product is crap...it still needs a lot of work and thinking about distributing lucene at this point is a little much...I never even have time to work on this project as it is becuase I am currently tasked with porting the crap old project to Windows. I need to do a bunch to shore up what I have. No one cares though...they think that I have done nothing (or can't understand what I have done) while at the same time they want to use what I havn't done to do what I made it for as well as this new super archive of 30 million + docs...in the end I'll be looking for a new job...still curious about lucene scaling to 30 million docs with a sort on every search though (yes I know the sort is cached...worries me too though...the sort will be on multiple and different fields depending no what the searcher wants...uggg...the size of the caches....) --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]