Re: 30 milllion+ docs on a single server

2006-08-15 Thread Mark Miller
Thanks for all of the useful info on this topic. You have been very enlightening. My RAM requirements were obviously off the mark. Here is my current understanding of this issue: A standard 32-bit processor has access to 4GB of RAM. If your CPU supports Physical Address Extension (PAE) the OS

RE: 30 milllion+ docs on a single server

2006-08-14 Thread Dejan Nenov
able price. -----Original Message----- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Friday, August 11, 2006 4:23 PM To: java-user@lucene.apache.org Subject: Re: 30 milllion+ docs on a single server Tomi NA wrote: > On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote: >> I'

Re: 30 milllion+ docs on a single server

2006-08-13 Thread Jeff Rodenburg
On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote: The single server is important because I think it will take a lot of work to scale it to multiple servers. The index must allow for close to real-time updates and additions. It must also remain searchable at all times (other than during the

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Otis Gospodnetic
server with more RAM that allowed larger Java heaps, and try to fit your index into RAM. Otis ----- Original Message ----- From: Mark Miller <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Saturday, August 12, 2006 7:45:15 PM Subject: Re: 30 milllion+ docs on a single server The

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Chris Hostetter
: Frustrated is the word :) I have looked at Solr...what I am worried : about there is this: Solr says it requires an OS that supports hard : links. Currently Windows does not, to my knowledge. Someone seemed to : make a comment that Windows could be supported...from what I know I : don't think so.

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Mark Miller
The single server is important because I think it will take a lot of work to scale it to multiple servers. The index must allow for close to real-time updates and additions. It must also remain searchable at all times (other than during the brief period of single updates and additions). If

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Jeff Rodenburg
Why is a single server so important? I can scale horizontally much more cheaply than I can scale vertically. On 8/11/06, Mark Miller <[EMAIL PROTECTED]> wrote: I've made a nice little archive application with Lucene. I made it to handle our largest need: 2.5 million docs or so on a single server. Now

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Mark Miller
Frustrated is the word :) I have looked at Solr...what I am worried about there is this: Solr says it requires an OS that supports hard links. Currently Windows does not, to my knowledge. Someone seemed to make a comment that Windows could be supported...from what I know I don't think so. Not a

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Ray Tsang
i've indexed 80m records and now up to 200m.. it can be done, and could've been done better. like the others said, architecture is important. have you considered looking into solr? i haven't kept up with it (and many of the mailing lists...), but it looks very interesting. ray, On 8/12/06, Jason

Re: 30 milllion+ docs on a single server

2006-08-12 Thread Jason Polites
Sounds like you're a bit frustrated. Cheer up, the simple fact is that engineering and business rarely see eye-to-eye. Just focus on the fact that what you have learnt from the process will help you, and they paid for it ;) On the issue at hand...Lucene should scale to this level, but you need

Re: 30 milllion+ docs on a single server

2006-08-11 Thread Mark Miller
Tomi NA wrote: On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote: I've made a nice little archive application with Lucene. I made it to handle our largest need: 2.5 million docs or so on a single server. Now the powers that be say: let's use it for a 30+ million document archive on a single serve

Re: 30 milllion+ docs on a single server

2006-08-11 Thread Tomi NA
On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote: I've made a nice little archive application with Lucene. I made it to handle our largest need: 2.5 million docs or so on a single server. Now the powers that be say: let's use it for a 30+ million document archive on a single server! (each doc siz