Re: [hibernate-dev] Hibernate Search and massive indexing

Emmanuel Bernard Thu, 07 May 2009 09:02:49 -0700


On May 7, 2009, at 17:50, Sanne Grinovero wrote:

I'd like to stress that the second mode (adaptative) is not constrained
by the BackendQueueProcessorFactory interface, as
it is not created by the usual BatchedQueueingProcessor but by a new
kind of Worker.
So we can apply some more pipelining optimizations.

Well, not sure. Check the MDB listening to changes in the asymmetric cluster mode:

http://anonsvn.jboss.org/repos/hibernate/search/trunk/src/main/java/org/hibernate/search/backend/impl/jms/AbstractJMSHibernateSearchController.java
method getWorker()

We reach the getBackendQueueProcessorFactory and skip the Worker.



About 5) the API is not going to be threadsafe/multithread but will
use multiple threads.

It's not clear to me at the moment when the new "Commit" LuceneWork
should be used.


Correct, we can delay that and see who needs that.

Sanne

2009/5/7 Emmanuel Bernard <emman...@hibernate.org>:
I have synced with Sanne on his work on massive reindexing and here is the
outcome of the discussion.
1. An exclusive batch mode is a mode where a node has exclusive access to
the index and can optimize writes (not flushing, not committing at
specific times etc).
2. The node able to activate the exclusive batch has to be the master in a
cluster (i.e. not the slaves).
3. The master will have two modes: a transactional mode (as today, i.e.
commit at tx boundaries, potentially async) and an exclusive batch mode called
the adaptative mode.
In this mode the BackendQueryProcessor can take some freedom in when and how
it flushes changes to the Lucene index and when and how it commits.
One approach would be to be transactional (i.e. one queue of changes = one
commit) for low thresholds and batch exclusive for higher thresholds (apply
several queues of changes before flushing or committing).
This back end would somehow communicate with the master copy process to only
copy changes at the right time. (I think it should work well already, but it
needs to be verified.)
A slave / client could force a commit by sending a CommitLuceneWork if
needed.
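
To make the threshold idea in item 3 concrete, here is a minimal sketch. It is
not Hibernate Search code: the class AdaptiveBackendSketch, its methods and the
two tuning knobs (batchThreshold, maxQueuesPerCommit) are all assumptions made
up for illustration, and plain lists stand in for the Lucene index. Small
queues are committed immediately (one queue = one commit), large ones are
accumulated, and forceCommit() plays the role a CommitLuceneWork could have.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Hibernate Search.
final class AdaptiveBackendSketch {
    private final int batchThreshold;      // assumed knob: queues at least this large count as high load
    private final int maxQueuesPerCommit;  // assumed knob: max queues applied between two commits
    private final List<String> uncommitted = new ArrayList<>(); // changes applied but not yet committed
    private final List<String> committed = new ArrayList<>();   // stand-in for the committed index
    private int queuesSinceCommit = 0;
    private int commitCount = 0;

    AdaptiveBackendSketch(int batchThreshold, int maxQueuesPerCommit) {
        this.batchThreshold = batchThreshold;
        this.maxQueuesPerCommit = maxQueuesPerCommit;
    }

    /** Applies one queue of changes and decides whether to commit now or later. */
    void apply(List<String> queueOfChanges) {
        uncommitted.addAll(queueOfChanges);
        queuesSinceCommit++;
        boolean lowLoad = queueOfChanges.size() < batchThreshold;
        // Low load: behave transactionally. High load: commit only every N queues.
        if (lowLoad || queuesSinceCommit >= maxQueuesPerCommit) {
            commit();
        }
    }

    /** What a commit request sent by a slave or client could trigger. */
    void forceCommit() {
        if (queuesSinceCommit > 0) {
            commit();
        }
    }

    private void commit() {
        committed.addAll(uncommitted);
        uncommitted.clear();
        queuesSinceCommit = 0;
        commitCount++;
    }

    int commitCount() { return commitCount; }

    int committedSize() { return committed.size(); }
}
```

How "load" is measured is left open in the discussion; queue size is only one
possible signal and is used here because it keeps the sketch short.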

4. There should be a way to switch at runtime from the tx mode to the
adaptative mode. When switching, the tx queue is forked: new elements are
queued, old elements are processed.
When the old queue is emptied, the adaptative mode kicks in.
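
The switch described in item 4 can be pictured as a small state machine. The
sketch below is only an illustration under assumptions: ModeSwitchSketch, its
Mode enum and its method names are invented here and do not exist in Hibernate
Search, and strings stand in for index works. Work queued after the switch
request goes to a new queue, and the adaptative mode only starts once the old
queue has drained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these names come from Hibernate Search.
final class ModeSwitchSketch {
    enum Mode { TRANSACTIONAL, SWITCHING, ADAPTIVE }

    private Mode mode = Mode.TRANSACTIONAL;
    private final Queue<String> oldQueue = new ArrayDeque<>(); // work queued before the switch request
    private final Queue<String> newQueue = new ArrayDeque<>(); // work queued after the switch request
    private final List<String> processed = new ArrayList<>();

    synchronized void enqueue(String work) {
        if (mode == Mode.TRANSACTIONAL) {
            oldQueue.add(work);
        } else {
            newQueue.add(work); // forked: new elements no longer join the old queue
        }
    }

    /** Requests the switch; it completes immediately only if nothing is pending. */
    synchronized void requestAdaptiveMode() {
        if (mode == Mode.TRANSACTIONAL) {
            mode = oldQueue.isEmpty() ? Mode.ADAPTIVE : Mode.SWITCHING;
        }
    }

    /** Processes one element; returns false when there was nothing to do. */
    synchronized boolean processNext() {
        Queue<String> source = (mode == Mode.ADAPTIVE) ? newQueue : oldQueue;
        String work = source.poll();
        if (work != null) {
            processed.add(work);
        }
        if (mode == Mode.SWITCHING && oldQueue.isEmpty()) {
            mode = Mode.ADAPTIVE; // old queue drained, adaptative mode kicks in
        }
        return work != null;
    }

    synchronized Mode mode() { return mode; }

    synchronized List<String> processed() { return new ArrayList<>(processed); }
}
```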
5. On top of that, the massive indexer API reads data from the database as
fast as possible and pushes index works to the adaptative engine. This API
will be mono-server but multi-threaded for now. Sanne can describe that in
more detail.
This API would have start() and waitTillDone() methods that start the
adaptative engine and stop it.
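
A rough picture of what item 5 could look like from the caller's side follows.
This is a sketch under assumptions, not the proposed API: only the names
start and waitTillDone() come from the discussion, while MassIndexerSketch,
its constructor and pushedWorkCount() are hypothetical. A list of ids stands
in for the database and a concurrent queue stands in for the adaptative
engine.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: only start and waitTillDone() are named in the discussion.
final class MassIndexerSketch {
    private final List<Integer> entityIds; // stand-in for rows read from the database
    private final int threads;
    private final Queue<String> adaptiveEngine = new ConcurrentLinkedQueue<>(); // stand-in for the engine
    private ExecutorService pool;

    MassIndexerSketch(List<Integer> entityIds, int threads) {
        this.entityIds = entityIds;
        this.threads = threads;
    }

    /** Starts the reader threads; each one handles every threads-th id. */
    void start() {
        pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            pool.submit(() -> {
                for (int i = offset; i < entityIds.size(); i += threads) {
                    adaptiveEngine.add("index#" + entityIds.get(i)); // push one index work
                }
            });
        }
        pool.shutdown(); // accept no new tasks; submitted ones still run
    }

    /** Blocks until every reader thread has pushed its work. */
    void waitTillDone() {
        if (pool == null) {
            throw new IllegalStateException("start() was not called");
        }
        try {
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // keep waiting
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    int pushedWorkCount() { return adaptiveEngine.size(); }
}
```

Splitting the ids by stride is only the simplest way to keep several threads
busy in a sketch; how reads are actually partitioned is not settled here.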

That's it for the first step.

Second steps (in no particular order):
6. Make the massive indexer API work in a cluster.
Slaves would read the DB and push index works to the queue.

7. Find a way to apply analysis before the actual IndexWriter usage.
That would increase index parallelism by allowing some
pipelining.
Or even better, analyze on the slaves and free CPU time for the master
(would work nicely with 6).
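
The pipelining in item 7 can be sketched as two stages: several threads
analyze documents in parallel, and a single writer consumes the results in
order. Everything below is an assumption for illustration only: the class
AnalysisPipelineSketch is invented, analyze() is a trivial lower-case and
whitespace split standing in for a real Lucene analyzer, and a list stands in
for the IndexWriter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: no Lucene or Hibernate Search types are used.
final class AnalysisPipelineSketch {

    /** Stage 1: stand-in for analysis (lower-casing and whitespace tokenizing). */
    static List<String> analyze(String text) {
        return Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** Runs analysis on several threads, then "writes" the results on the calling thread. */
    static List<List<String>> run(List<String> documents, int analyzerThreads) {
        ExecutorService analyzers = Executors.newFixedThreadPool(analyzerThreads);
        try {
            List<Future<List<String>>> analyzed = new ArrayList<>();
            for (String doc : documents) {
                Callable<List<String>> task = () -> analyze(doc);
                analyzed.add(analyzers.submit(task));
            }
            // Stage 2: the single writer consumes pre-analyzed documents in submission order.
            List<List<String>> written = new ArrayList<>();
            for (Future<List<String>> future : analyzed) {
                written.add(future.get());
            }
            return written;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while analyzing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("analysis failed", e.getCause());
        } finally {
            analyzers.shutdown();
        }
    }
}
```

Moving stage 1 to the slaves, as suggested above, would amount to running
analyze() on a different node and shipping the token lists to the master.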

Sanne, please add anything I have missed or misinterpreted.
_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev

