[hibernate-dev] Re: [jbosscache-dev] JBoss Cache Lucene Directory

Emmanuel Bernard Mon, 25 May 2009 04:29:06 -0700

See some comments inline

On  May 25, 2009, at 11:53, Sanne Grinovero wrote:

Hello,
I'm forwarding this email to Emmanuel and Hibernate Search dev, as I
believe we should join the discussion.
Could we keep both dev-lists ([email protected],
[email protected] ) on CC ?

Sanne

2009/4/29 Manik Surtani <[email protected]>:
On 27 Apr 2009, at 05:18, Andrew Duckworth wrote:
Hello,
I have been working on a Lucene Directory provider based on JBoss Cache.
My starting point was an implementation Manik had already written, which
pretty much worked with a few minor tweaks. Our use case was to cluster a
Lucene index being used with Hibernate Search in our application, with the
requirements that searching needed to be fast, there was no shared
file system, and it was important that the index was consistent across
the cluster in a relatively short time frame.
Manik's code used a token node in the cache to implement the distributed
lock. During my testing I set up multiple cache copies with multiple
threads reading/writing to each cache copy. I was finding that a lot of
transactions to acquire or release this lock were timing out. Not
understanding JBC well, I modified the distributed lock to use the JGroups
DistributedLockManager. This worked quite well; however, the time taken to
acquire/release the lock (~100 ms for both) dwarfed the time to process
the index update, lowering throughput. Even using Hibernate Search with an
async worker thread, there was still a lot of contention for the single
lock, which seemed to limit the scalability of the solution. I think part
of the problem was that our use of HB Search generates a lot of small
units of work (remove index entry, add index entry) and each of these UOWs
acquires a new IndexWriter and a new write lock on the underlying Lucene
Directory implementation.
Out of curiosity, I created an alternative implementation based on the
Hibernate Search JMS clustering strategy. Inside JBoss Cache I created a
queue node, and each slave node in the cluster creates a separate queue
underneath where indexing work is written:

 /queue/slave1/[work0, work1, work2 ....]
       /slave2
       /slave3

etc
In each cluster member a background thread runs continuously. When it
wakes up, it decides whether it is the master node or not (it currently
checks if it is the view coordinator, but I'm considering changing it to
use a longer-lived distributed lock). If it is the master, it merges the
tasks from each slave queue and updates the JBCDirectory in one go; it can
safely do this with only local VM locking. This approach means that the
slave nodes can write to their queues without needing a global lock that
any other slave or the master would be using. On the master, it can
perform multiple updates in the context of a single Lucene index writer.
With a cache loader configured, work that is written into the slave queue
is persistent, so it can survive the master node crashing, with automatic
failover to a new master, meaning that eventually all updates should be
applied to the index. Each work element in the queue is timestamped to
allow them to be processed in order (requires time synchronisation across
the cluster) by the master. For our workload the master/slave pattern
seems to improve the throughput of the system.
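
To make the merge step concrete, here is a minimal sketch of how a master
might drain the per-slave queues and order the work by timestamp before
applying it in one batch. It uses plain Java collections only; the class
and method names (QueueMergeSketch, Work, drainAndMerge) are hypothetical
and are not taken from Andrew's code, JBoss Cache or Hibernate Search. It
also assumes the queues are not written to while they are being drained,
which a real implementation would have to guarantee.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the JBoss Cache or Hibernate Search API.
final class QueueMergeSketch {

    /** One unit of indexing work written by a slave into its own queue. */
    static final class Work {
        final String slave;
        final long timestamp;   // relies on synchronised clocks across the cluster
        final String operation; // e.g. "add entry 42" or "remove entry 42"

        Work(String slave, long timestamp, String operation) {
            this.slave = slave;
            this.timestamp = timestamp;
            this.operation = operation;
        }
    }

    /**
     * Drains every slave queue and returns the work ordered by timestamp.
     * Ties are broken by slave name so the result is deterministic.
     * Assumes no slave is appending to a queue during the drain.
     */
    static List<Work> drainAndMerge(Map<String, List<Work>> queues) {
        List<Work> merged = new ArrayList<>();
        for (List<Work> queue : queues.values()) {
            merged.addAll(queue);
            queue.clear(); // the master now owns these entries
        }
        merged.sort(Comparator.comparingLong((Work w) -> w.timestamp)
                .thenComparing((Work w) -> w.slave));
        return merged;
    }
}
```

The master would then apply the returned list inside a single index
writer session, which is where the saving over one lock per unit of work
comes from.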



Interestingly, we are working on similar directions.

Sanne has been working on a new model where the master is guaranteed
not to share indexes with other writers. In this case we keep the IW
open for a long time (single lock), which yields significant improvements.

In parallel, the new index needs to be distributed to the slaves. The
current model is the file copy (which avoids any lock issue), but a JGroups
version has been discussed. Now that I think about it more, it might
make sense to use JBoss Cache for the distribution simply by reusing
the file copy model:

 - no write lock is shared amongst nodes

 - each slave has an active and a passive directory. The passive can
   receive the new index data from the master while the active node is
   used for search. When the copy is done, active and passive switch

 - each master copies the index on a regular basis to the shared model
   (in this case the passive slave)?

I am not 100% sure it will work, as we should only replicate data to
the passive node, but that's a good thing to explore.

Note that this approach requires far fewer locks than the current
JBoss Cache Directory implementation (as we use an async writing
approach).
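
For illustration, the active/passive switch on a slave could look roughly
like the sketch below. It is only a sketch under assumptions: the type
parameter T stands in for whatever represents an index directory, and the
names ActivePassiveSketch and receiveAndSwitch are hypothetical, not
existing Hibernate Search or JBoss Cache API. Searches read the active
slot without locking, while the copy from the master only ever touches
the passive slot.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch: T stands in for an index directory of some kind.
final class ActivePassiveSketch<T> {
    private final AtomicReference<T> active; // read by searches
    private T passive;                       // written only during a copy

    ActivePassiveSketch(T active, T passive) {
        this.active = new AtomicReference<>(active);
        this.passive = passive;
    }

    /** Searches always see a complete index and never wait for a copy. */
    T active() {
        return active.get();
    }

    /**
     * Lets the caller copy the master's index into the passive slot, then
     * swaps the two roles. synchronized keeps two copies from overlapping.
     */
    synchronized void receiveAndSwitch(UnaryOperator<T> copyFromMaster) {
        T refreshed = copyFromMaster.apply(passive);
        passive = active.getAndSet(refreshed);
    }
}
```

The only coordination here is local to the slave; no lock is shared with
the master or with other slaves, which matches the first bullet above.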


_______________________________________________
hibernate-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/hibernate-dev
