Sounds like you're a bit frustrated. Cheer up: engineering and business
rarely see eye to eye. Just focus on the fact that what you have learnt
from the process will help you, and that they paid for it ;)
On the issue at hand... Lucene should scale to this level, but you need
a good architecture behind it. Google has good indexing tech, but it's
their architecture, which spreads the index across thousands of servers,
that really gives it grunt (to the point that they invented their own
RAID-style file system).
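If you do end up spreading the index over more than one box (or even
just more than one disk), the Lucene side of fanning a query out over
several smaller indexes is not much code. A rough, untested sketch
against the 1.9/2.x API; the index paths and the "contents" field are
made up for illustration:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.*;

public class ShardedSearch {
    public static void main(String[] args) throws Exception {
        // One IndexSearcher per shard; these could just as well be
        // RemoteSearchables living on separate cheap boxes.
        Searchable[] shards = {
            new IndexSearcher("/indexes/archive-0"),
            new IndexSearcher("/indexes/archive-1"),
            new IndexSearcher("/indexes/archive-2"),
            new IndexSearcher("/indexes/archive-3")
        };
        // ParallelMultiSearcher queries every shard concurrently and
        // merges the results into a single hit list.
        Searcher searcher = new ParallelMultiSearcher(shards);
        Query q = new QueryParser("contents", new StandardAnalyzer())
                .parse("overdue invoice");
        Hits hits = searcher.search(q);
        System.out.println(hits.length() + " matching docs");
        searcher.close();
    }
}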
Just think very carefully about the architecture underpinning the index.
Lucene is core tech. It's up to you to provide the framework to make it
hum.
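On the sort worry specifically: the sort cache is one array per sort
field per index reader, so you can put a rough number on it. Sorting 30
million docs on an int field costs 30M x 4 bytes, about 120 MB; a string
sort field also keeps the unique term values around, so budget a
multiple of that for every field you sort on. Something like this is all
it takes to build the cache on first use (again an untested sketch,
reusing the searcher and q from above; "modified" is an invented,
untokenized field):

// Sorting by a field populates the FieldCache the first time it is
// used: one entry per document, held for the life of the IndexReader.
Sort byDate = new Sort(new SortField("modified", SortField.STRING));
Hits sorted = searcher.search(q, byDate);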
On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote:
>
> Tomi NA wrote:
> > On 8/12/06, Mark Miller <[EMAIL PROTECTED]> wrote:
> >> I've made a nice little archive application with lucene. I made it
> >> to handle our largest need: 2.5 million docs or so on a single
> >> server. Now the powers that be say: let's use it for a 30+ million
> >> document archive on a single server! (each doc maybe 10k max... as
> >> small as 1 or 2k) Please tell me why we are in trouble... please
> >> tell me why we are not. I have tested up to 2 million docs without
> >> much trouble, but 30 million... the average search will include a
> >> sort on a field as well... can I search 30+ million docs with a
> >> sort? Man, am I worried about that. Maybe the server will have 8
> >> procs and 12 billion gigs of RAM. Maybe. Even so, Tomcat seems only
> >> able to launch with a max of 1.5 or 1.6 GB of RAM on Windows. What
> >> do you think? 30+ million sounds like too much of a load to me for
> >> a single server. Not that they care what I think... I only wrote
> >> the thing (man I hate my job, offer me a new one :) )...
> >> please... comments?
> >>
> >> Cheers,
> >>
> >> Miserable Mark
> >
> > I don't really understand what you're so worried about. Either it'll
> > work well with the setup you have, or it won't. That's really the
> > size of it. ;)
> > Seriously, you have a number of relatively cheap ways to improve
> > search performance at hand: store the index on a RAID 5 disk array
> > so you can read the indices very fast, use multicore CPUs, add
> > memory; and even if all that isn't good enough, you can always use a
> > small cluster (say, 4 nodes) of very, very inexpensive PCs, each
> > filled with a GB of RAM. You don't have to keep them inside the
> > regular UPS/backup/vault-protected area, as the indices can always
> > be rebuilt (unlike, e.g., data in transactional systems), and
> > between the 4 of them they might cost about as much as an
> > entry-level server.
> > I'll let the experts speak now. :)
> >
> > t.n.a.
> >
> Thanks for the tip... I am not too worried... I am miserable because I
> live in Dilbert land, not because of this particular incident.
> Spreading to multiple servers is a possibility, but one I want to
> avoid... I wrote this app on the side since our current product is
> crap... it still needs a lot of work, and thinking about distributing
> lucene at this point is a little much... I never even have time to
> work on this project as it is, because I am currently tasked with
> porting the crap old project to Windows. I need to do a bunch to shore
> up what I have. No one cares though... they think that I have done
> nothing (or can't understand what I have done), while at the same time
> they want to use what I haven't done to do what I made it for, as well
> as this new super archive of 30+ million docs... in the end I'll be
> looking for a new job... still curious about lucene scaling to 30
> million docs with a sort on every search though (yes, I know the sort
> is cached... that worries me too... the sort will be on multiple and
> different fields depending on what the searcher wants... ugg... the
> size of the caches...)