Re: lucene scalability questions

Russ Thu, 04 Jan 2007 14:43:58 -0800

If you do this on Windows, you might be able to replicate the indexes using 
DFS.  On Linux you can probably use rsync to keep the different servers up to 
date.
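
A rough sketch of the rsync approach, assuming one master copy of the index that is pushed to each search server. The host names and paths in the usage note are hypothetical, and the function names are made up for illustration.

```python
import subprocess

def rsync_command(index_dir, host, remote_dir):
    """Build an rsync invocation that mirrors index_dir to remote_dir on host."""
    # -a preserves the file tree; --delete removes files that no longer exist
    # on the source, so stale index files do not pile up on the replicas.
    source = index_dir.rstrip("/") + "/"  # trailing slash: copy contents, not the directory
    return ["rsync", "-a", "--delete", source, f"{host}:{remote_dir}"]

def push_index(index_dir, hosts, remote_dir):
    """Run rsync once per search server; raises CalledProcessError on failure."""
    for host in hosts:
        subprocess.run(rsync_command(index_dir, host, remote_dir), check=True)
```

Something like push_index("/var/index", ["search1", "search2"], "/var/index") would then be run after each index update, ideally while nothing is writing to the source index.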

If the size of the index is an issue, Lustre could be used to provide one 
volume that's spread over many servers.  Performance is supposed to be good 
with Lustre as well.

If you want to speed up individual queries when searching a large index, you 
can probably split the index among the servers in some way, query them all 
at the same time, and then aggregate the results.  This is just an idea, but I 
believe it was mentioned in "Lucene in Action".
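
To make the split-and-aggregate idea concrete, here is a minimal sketch in Python. It does not use Lucene at all: the shards are hypothetical in-memory dictionaries standing in for separate indexes on separate servers, and search_shard is a placeholder for whatever remote query call you would actually make.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each maps a document id to a relevance score.
# In practice each one would be a separate index on its own server.
SHARDS = [
    {"doc-a": 0.91, "doc-b": 0.40},
    {"doc-c": 0.75, "doc-d": 0.10},
    {"doc-e": 0.88},
]

def search_shard(shard, top_k):
    """Stand-in for a remote query: return this shard's best hits as (score, doc_id)."""
    hits = [(score, doc_id) for doc_id, score in shard.items()]
    return heapq.nlargest(top_k, hits)

def search_all(shards, top_k):
    """Query every shard in parallel, then merge the partial results into one top-k list."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = list(pool.map(lambda s: search_shard(s, top_k), shards))
    merged = [hit for hits in partial for hit in hits]
    return heapq.nlargest(top_k, merged)

if __name__ == "__main__":
    for score, doc_id in search_all(SHARDS, top_k=3):
        print(f"{doc_id}\t{score:.2f}")
```

Each shard only has to return its own top_k hits, so the merge step stays small no matter how large the index is. One caveat: scores computed against independent indexes may not be directly comparable, so a real setup would need to account for that when merging.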

Russ

-----Original Message-----
From: "Peter W." <[EMAIL PROTECTED]>
Date: Thu, 4 Jan 2007 14:02:00 
To: java-user@lucene.apache.org
Subject: Re: lucene scalability questions

Mark,

My understanding of Lucene is limited, but the issues
seem similar to web server farms in that it comes down to
linear scalability by adding more boxes.

This means separate machines with their own indexes.

Shared filesystems such as NFS work well in smaller environments
but run into problems under heavy load (lost mounts requiring reboots).

There's no MySQL-like 'replication' with masters using
binary files to update slaves. However, since the index is
file based, you can close IndexWriters and make hot copies or
perform backups for redundancy.
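
As a rough illustration of the copy step only, the sketch below snapshots an index directory into a timestamped backup directory. It assumes every writer on the index has already been closed, and the function name and directory layout are hypothetical.

```python
import shutil
import tempfile
import time
from pathlib import Path

def snapshot_index(index_dir, backup_root):
    """Copy a quiescent index directory into a new timestamped snapshot directory.

    Assumption: the caller has already closed every writer on index_dir,
    so the files on disk are not changing while they are copied.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"index-{stamp}"
    shutil.copytree(index_dir, target)  # creates backup_root if it is missing
    return target

if __name__ == "__main__":
    # Demo with a throwaway directory standing in for a real index.
    with tempfile.TemporaryDirectory() as tmp:
        index_dir = Path(tmp) / "index"
        index_dir.mkdir()
        (index_dir / "segments").write_text("placeholder")
        print(snapshot_index(index_dir, Path(tmp) / "backups"))
```

The snapshot can then be shipped to another machine or kept as a backup; reopening the writer afterwards is left to the application.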

If you know XML, use Solr to post and retrieve documents to and from
your various Lucene indexes. It hides the complexity of remote
object brokering such as RMI.

Solr also allows you to get result sets using JSON so you could
provide distributed Lucene results to browsers as a .js widget.
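
For a feel of what that looks like on the wire, here is a small sketch that builds a Solr XML add message and a JSON query URL, and reads document ids out of a JSON response. The server address and the field names (id, title) are assumptions for illustration; your schema and URL will differ. No network call is made here.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: a Solr instance at this hypothetical address; adjust to your setup.
SOLR_BASE = "http://localhost:8983/solr"

def build_add_xml(doc):
    """Build a Solr XML <add> message for one document given as a dict of field values."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def build_select_url(query, rows=10):
    """Build a query URL that asks Solr for a JSON response via wt=json."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"{SOLR_BASE}/select?{params}"

def doc_ids(response_text):
    """Pull the id field out of each hit in a Solr JSON response body."""
    body = json.loads(response_text)
    return [d["id"] for d in body["response"]["docs"]]

if __name__ == "__main__":
    print(build_add_xml({"id": 1, "title": "hello"}))
    print(build_select_url("title:lucene"))
```

The XML would be sent with an HTTP POST to the update handler, followed by a commit message, and the select URL fetched with a plain GET.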

Although it does not cover the latest 2.0 release, the Lucene in Action
book provides good background on combining separate indexes.

Regards,

Peter W.

On Jan 4, 2007, at 7:51 AM, Mark Mei wrote:

> So this question has two parts:
>
> 1. How does Lucene scale, exactly? Do we distribute the index to multiple
> servers somehow? Or is it one index, sitting on some sort of a shared
> filesystem, shared by all Lucene servers? If it's the latter, the
> bottleneck will be I/O ... anyway, elaborate on scalability please, and
> how you set it up
>
> 2. High availability. How would one go about making Lucene redundant?

