[hibernate-dev] Hibernate Search and massive indexing

Emmanuel Bernard Thu, 07 May 2009 08:24:57 -0700

I have synced with Sanne on his work on massive reindexing and here is the outcome of the discussion.

1. An exclusive batch mode is a mode where a node has exclusive access to the index and can optimize writes (not flushing, not committing at specific times, etc.).

2. The node able to activate the exclusive batch mode has to be the master in a cluster (i.e. not one of the slaves).

3. The master will have two modes: a transactional mode (as today, i.e. commit at tx boundaries, potentially asynchronously) and an exclusive batch mode called the adaptative mode.
In this mode the BackendQueryProcessor can take some freedom in when and how it flushes changes to the Lucene index and when and how it commits.
One approach would be to be transactional (i.e. one queue of changes = one commit) for low thresholds and batch exclusive for higher thresholds (apply several queues of changes before flushing or committing).
This back end would somehow communicate with the master copy process to only copy changes at the right time. (I think it should work well already but it needs to be verified.)
A slave / client could force a commit by sending a Commit LuceneWork if needed.

4. There should be a way to switch at runtime from the tx mode to the adaptative mode. When switching, the tx queue is forked: new elements are queued, old elements are processed.

When the old queue is emptied, the adaptative mode kicks in.

5. On top of that, the massive indexer API reads data from the database as fast as possible and pushes index works to the adaptative engine. This API will be mono-server but multi-threaded for now. Sanne can describe that in more detail.
This API would have a start and a waitTillDone() call that start the adaptative engine and stop it.
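
To make point 5 a bit more concrete, here is a minimal sketch of what a start / waitTillDone() pair could look like on a single server with several threads. Everything in it is an assumption for illustration: MassIndexerSketch, the id batches standing in for database reads, and the Consumer standing in for the adaptative engine are hypothetical names, not existing Hibernate Search API.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names exist in Hibernate Search.
final class MassIndexerSketch {
    // Stand-in for batches of entity ids read from the database.
    private final List<List<Integer>> idBatches;
    // Stand-in for "push one index work to the adaptative engine"; must be thread-safe.
    private final Consumer<Integer> engine;
    private final ExecutorService pool;

    MassIndexerSketch(List<List<Integer>> idBatches, int threads, Consumer<Integer> engine) {
        this.idBatches = idBatches;
        this.engine = engine;
        this.pool = Executors.newFixedThreadPool(threads);
    }

    /** Hands every batch to the worker threads and returns without waiting. */
    void start() {
        for (List<Integer> batch : idBatches) {
            pool.execute(() -> batch.forEach(engine));
        }
        pool.shutdown(); // accept no new batches; the submitted ones still run
    }

    /** Blocks until every submitted batch has been pushed to the engine. */
    void waitTillDone() {
        try {
            while (!pool.awaitTermination(1, TimeUnit.SECONDS)) {
                // keep waiting; a real implementation might report progress here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for indexing", e);
        }
    }
}
```

In this sketch the caller would call start(), optionally do other things, and then call waitTillDone(); switching the back end into and out of the adaptative mode is left out on purpose.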


That's it for the first step.
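
Points 3 and 4 could be sketched roughly as below. This is only one possible reading of the discussion, under stated assumptions: AdaptiveCommitPolicy, ModeSwitchingQueue and the two threshold parameters are hypothetical names, works are plain strings, and no real Lucene or Hibernate Search class is involved.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of point 3: commit per queue under low load, batch under high load.
final class AdaptiveCommitPolicy {
    private final int batchThreshold; // pending queues at or above this count mean "high load"
    private final int maxUncommitted; // upper bound on queues applied between two commits
    private int uncommittedQueues = 0;

    AdaptiveCommitPolicy(int batchThreshold, int maxUncommitted) {
        this.batchThreshold = batchThreshold;
        this.maxUncommitted = maxUncommitted;
    }

    /** Call after applying one queue of changes; returns true if a commit is due now. */
    synchronized boolean shouldCommit(int pendingQueues) {
        uncommittedQueues++;
        boolean lowLoad = pendingQueues < batchThreshold;
        if (lowLoad || uncommittedQueues >= maxUncommitted) {
            uncommittedQueues = 0;
            return true;
        }
        return false;
    }

    /** Models a forced commit request; returns true if there was anything to commit. */
    synchronized boolean forceCommit() {
        boolean hadChanges = uncommittedQueues > 0;
        uncommittedQueues = 0;
        return hadChanges;
    }
}

// Hypothetical sketch of point 4: fork the queue, drain the old one, then go adaptative.
final class ModeSwitchingQueue {
    enum Mode { TRANSACTIONAL, SWITCHING, ADAPTATIVE }

    private final Queue<String> oldQueue = new ArrayDeque<>();
    private final Queue<String> newQueue = new ArrayDeque<>();
    private Mode mode = Mode.TRANSACTIONAL;

    synchronized void enqueue(String work) {
        // After the fork, new elements go to the new queue only.
        if (mode == Mode.TRANSACTIONAL) oldQueue.add(work); else newQueue.add(work);
    }

    synchronized void switchToAdaptative() {
        if (mode == Mode.TRANSACTIONAL) {
            mode = oldQueue.isEmpty() ? Mode.ADAPTATIVE : Mode.SWITCHING;
        }
    }

    /** Returns the next work to process, or null if there is none. */
    synchronized String poll() {
        if (mode == Mode.ADAPTATIVE) return newQueue.poll();
        String work = oldQueue.poll();
        // Once the old queue is emptied, the adaptative mode kicks in.
        if (mode == Mode.SWITCHING && oldQueue.isEmpty()) mode = Mode.ADAPTATIVE;
        return work;
    }

    synchronized Mode mode() { return mode; }
}
```

The policy object only decides when to commit; how the master copy process is told about a commit is not covered here.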

Second steps (no particular order)
6. Make the massive indexer API work in a cluster.
Slaves would read the DB and push index works to the queue

7. Find a way to apply analysis before the actual IndexWriter usage.

That would increase the indexing parallelism by allowing some pipelining.
Or even better, analyze on the slaves and free CPU time for the master (would work nicely with 6).
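
The pipelining idea in point 7 could look something like the sketch below: analysis runs on several threads, and a single writer stage consumes the already analyzed documents in order. All names are assumptions; analyze() is a trivial lowercase-and-split stand-in rather than a Lucene Analyzer, and the "index" is just a list of token lists rather than an IndexWriter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: analysis is done ahead of, and in parallel to, the single writer.
final class AnalyzeThenWritePipeline {

    /** Stand-in for the analysis step (not a real Lucene Analyzer). */
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    /** Analyzes documents on several threads, then "writes" them from the calling thread. */
    static List<List<String>> index(List<String> documents, int analyzerThreads) {
        ExecutorService analyzers = Executors.newFixedThreadPool(analyzerThreads);
        try {
            // Stage 1: submit all analysis tasks; they run concurrently.
            List<Future<List<String>>> analyzed = new ArrayList<>();
            for (String doc : documents) {
                analyzed.add(analyzers.submit(() -> analyze(doc)));
            }
            // Stage 2: the single writer consumes results in submission order.
            List<List<String>> index = new ArrayList<>();
            for (Future<List<String>> tokens : analyzed) {
                index.add(tokens.get());
            }
            return index;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Analysis failed", e.getCause());
        } finally {
            analyzers.shutdown();
        }
    }
}
```

In a cluster, stage 1 is the part that could move to the slaves, with the master keeping only stage 2.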


Sanne, please add anything I have missed or misinterpreted.
_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
