[jira] [Commented] (LUCENE-4975) Add Replication module to Lucene

Shai Erera (JIRA) Thu, 02 May 2013 09:08:16 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647655#comment-13647655
 ]


Shai Erera commented on LUCENE-4975:
------------------------------------

So here's an overview of how the Replicator works (it's also documented under 
oal.replicator.package.html):

At a high level, producers (e.g. an indexer) publish Revisions, and consumers 
update to the latest Revision available. Like SVN, if a client is on rev1 and 
the server has rev4, the next update request will upgrade the client to rev4, 
skipping all intermediate revisions.
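
To make the skipping behavior concrete, here is a minimal toy sketch in plain 
Java. It is not the Replicator API: ToyRevisionServer, publish() and 
checkForUpdate() are hypothetical names invented for illustration, and a 
revision is reduced to a bare version number.

```java
import java.util.Optional;

/** Toy model only (hypothetical names): not the Lucene Replicator API. */
final class ToyRevisionServer {
    private int latest = 0; // 0 means nothing has been published yet

    /** Producer side: publish a new revision and return its version. */
    synchronized int publish() {
        return ++latest;
    }

    /** Consumer side: the latest version if it is newer than the client's, else empty. */
    synchronized Optional<Integer> checkForUpdate(int clientVersion) {
        return latest > clientVersion ? Optional.of(latest) : Optional.empty();
    }
}

final class RevisionSkipDemo {
    public static void main(String[] args) {
        ToyRevisionServer server = new ToyRevisionServer();
        int clientVersion = server.publish(); // client is in sync with rev1
        server.publish();                     // rev2
        server.publish();                     // rev3
        server.publish();                     // rev4
        // A single update request jumps straight to the latest revision.
        clientVersion = server.checkForUpdate(clientVersion).orElse(clientVersion);
        System.out.println("client is now on rev" + clientVersion);
    }
}
```

The server only ever hands out the newest revision, so the client never needs 
to know that rev2 and rev3 existed.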

The Replicator offers two implementations at the moment: LocalReplicator, to be 
used on the server side, and HttpReplicator, to be used by clients to e.g. 
update over HTTP. In the future, we may want to add other Replicator 
implementations, e.g. rsync, torrent... For HTTP, the package also provides a 
ReplicationService which acts on the HTTP servlet request/response following 
some API specification. In that sense, the HttpReplicator expects a certain 
HTTP impl on the server side, so ReplicationService helps you by implementing 
that API. The reason it's not a servlet is so that you can plug it into your 
application servlet freely.

A Revision is basically a list of files and sources. For example, IndexRevision 
contains the list of files in an IndexCommit (and only one source), while 
IndexAndTaxonomyRevision contains the list of files from both IndexCommits with 
corresponding sources (index/taxonomy). When the server publishes either of 
these two revisions, the IndexCommits are snapshotted so that files aren't 
deleted, and the Replicator serves file requests (by clients) from the 
Revision. The Revision is also responsible for releasing itself -- this is done 
automatically by the Replicator which releases a revision when it's no longer 
needed (i.e. there's a new one already) and there are no clients that currently 
replicate its files.
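
The release rule above (a revision goes away only once it has been superseded 
and no client is still copying its files) can be sketched as simple reference 
counting. This is a toy model under my own assumptions, not the actual 
implementation: ToyRevisionTracker and all of its methods are hypothetical, and 
"releasing" is just recorded in a set instead of releasing an IndexCommit 
snapshot.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only (hypothetical names): not the Lucene Replicator API. */
final class ToyRevisionTracker {
    // version -> number of clients currently copying that revision's files
    private final Map<Integer, Integer> activeSessions = new HashMap<>();
    private final Set<Integer> released = new HashSet<>();
    private int latest = 0; // 0 means nothing has been published yet

    /** Publish a new revision; the previous one is released if nobody uses it. */
    synchronized int publish() {
        int previous = latest;
        latest++;
        activeSessions.put(latest, 0);
        releaseIfUnused(previous);
        return latest;
    }

    /** A client starts copying the files of the latest revision. */
    synchronized int startSession() {
        if (latest == 0) {
            throw new IllegalStateException("nothing published yet");
        }
        activeSessions.merge(latest, 1, Integer::sum);
        return latest;
    }

    /** A client finished (or aborted) copying the files of the given revision. */
    synchronized void endSession(int version) {
        Integer sessions = activeSessions.get(version);
        if (sessions == null || sessions == 0) {
            throw new IllegalArgumentException("no open session for rev" + version);
        }
        activeSessions.put(version, sessions - 1);
        releaseIfUnused(version);
    }

    synchronized boolean isReleased(int version) {
        return released.contains(version);
    }

    // Release only when the revision is superseded AND has no open sessions.
    private void releaseIfUnused(int version) {
        Integer sessions = activeSessions.get(version);
        if (sessions != null && version != latest && sessions == 0) {
            activeSessions.remove(version);
            released.add(version); // a real implementation would release the snapshot here
        }
    }
}
```

Note that the latest revision is never released, even with zero sessions, since 
new clients may still ask for it.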

On the client side, the package offers a ReplicationClient class which can be 
invoked manually, or can start its update-thread to periodically check for 
updates. The client is given a ReplicationHandler (two matching 
implementations: IndexReplicationHandler and 
IndexAndTaxonomyReplicationHandler) which is responsible for acting on the 
replicated files. The client first obtains all needed files (i.e. those that 
the new Revision offers and the client is still missing), and after they have 
all been successfully copied over, the handler is invoked. Both handlers copy 
the files from their temporary location to the index directories, fsync them 
and "kiss" the index such that unused files are deleted. You can provide each 
handler a Callable which is invoked after the index has been safely and successfully 
updated, so you can e.g. searcherManager.maybeReopen().
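
The client-side flow described above (work out which files are missing, copy 
all of them, and only then invoke the handler) can be sketched roughly as 
follows. This is a simplified illustration, not the ReplicationClient code: 
ToyReplicationClient, requiredFiles() and update() are hypothetical names, 
files are plain strings, and copying is delegated to a caller-supplied 
function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Toy model only (hypothetical names): not the Lucene ReplicationClient. */
final class ToyReplicationClient {

    /** Files that the new revision offers and the client is still missing. */
    static List<String> requiredFiles(List<String> revisionFiles, Set<String> localFiles) {
        List<String> missing = new ArrayList<>();
        for (String file : revisionFiles) {
            if (!localFiles.contains(file)) {
                missing.add(file);
            }
        }
        return missing;
    }

    /**
     * Copies every missing file, then invokes the handler once with the copied files.
     * If the copier throws for any file, the handler is never invoked.
     */
    static void update(List<String> revisionFiles,
                       Set<String> localFiles,
                       Consumer<String> copier,
                       Consumer<List<String>> handler) {
        List<String> missing = requiredFiles(revisionFiles, localFiles);
        for (String file : missing) {
            copier.accept(file); // e.g. copy into a temporary working directory
        }
        handler.accept(missing); // only reached after all copies succeeded
    }
}
```

Keeping the handler out of the copy loop is what lets a failed transfer leave 
the live index untouched.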

Here's a general code example that explains how to work with the Replicator:

{code}
// ++++++++++++++ SERVER SIDE ++++++++++++++ // 
IndexWriter publishWriter; // the writer used for indexing
Replicator replicator = new LocalReplicator();
replicator.publish(new IndexRevision(publishWriter));

// ++++++++++++++ CLIENT SIDE ++++++++++++++ // 
// either LocalReplicator, or HttpReplicator if client and server are on
// different nodes
Replicator replicator;

// callback invoked after the handler finished handling the revision, e.g. to
// reopen the reader; can also be null if no callback is needed
Callable<Boolean> callback = null;
ReplicationHandler handler = new IndexReplicationHandler(indexDir, callback);
SourceDirectoryFactory factory = new PerSessionDirectoryFactory(workDir);
ReplicationClient client = new ReplicationClient(replicator, handler, factory);

// invoke client manually
client.updateNow();
        
// or, periodically
client.startUpdateThread(100); // check for update every 100 milliseconds
{code}

The package of course comes with unit tests, though I'm sure there's room for 
improvement (there always is!).
                
> Add Replication module to Lucene
> --------------------------------
>
>                 Key: LUCENE-4975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4975
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>
> I wrote a replication module which I think will be useful to Lucene users who 
> want to replicate their indexes for e.g high-availability, taking hot backups 
> etc.
> I will upload a patch soon where I'll describe in general how it works.

