I'd like to stress that the second mode (adaptative) is not constrained by the BackendQueueProcessorFactory interface, as it is created not by the usual BatchedQueueingProcessor but by a new kind of Worker. So we can apply some more pipelining optimizations.
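To make the adaptative mode more concrete, here is a minimal sketch of the commit policy discussed in this thread: one commit per queue while the backlog is small, several queues per commit once it grows, plus a forced commit for an explicit Commit work. The class name AdaptiveCommitSketch, both thresholds and the backlog parameter are assumptions for illustration only, not existing Hibernate Search API, and the actual IndexWriter calls are left out.

```java
import java.util.List;

/** Hypothetical sketch of an adaptive commit policy; not Hibernate Search code. */
class AdaptiveCommitSketch {
    private final int batchThreshold;   // backlog size at which batching starts (assumed tunable)
    private final int queuesPerCommit;  // queues applied between commits while batching (assumed tunable)
    private int uncommittedQueues = 0;
    private int appliedWorks = 0;
    private int commits = 0;

    AdaptiveCommitSketch(int batchThreshold, int queuesPerCommit) {
        this.batchThreshold = batchThreshold;
        this.queuesPerCommit = queuesPerCommit;
    }

    /**
     * Applies one queue of changes. "backlog" is the number of queues still
     * waiting. Returns true if a commit was issued after this queue.
     */
    boolean applyQueue(List<String> works, int backlog) {
        // A real backend would hand each work to the IndexWriter here.
        appliedWorks += works.size();
        uncommittedQueues++;
        boolean batching = backlog >= batchThreshold;
        // Low load: transactional behaviour, one queue = one commit.
        // High load: commit only every queuesPerCommit queues.
        if (!batching || uncommittedQueues >= queuesPerCommit) {
            commit();
            return true;
        }
        return false;
    }

    /** Stands in for an explicit Commit work sent by a slave or client. */
    void forceCommit() {
        if (uncommittedQueues > 0) {
            commit();
        }
    }

    private void commit() {
        // A real backend would commit the IndexWriter here.
        uncommittedQueues = 0;
        commits++;
    }

    int commitCount() { return commits; }
    int uncommittedQueues() { return uncommittedQueues; }
    int appliedWorks() { return appliedWorks; }
}
```

With a batch threshold of 3 and 2 queues per commit, a queue applied with an empty backlog is committed immediately, while queues applied with a backlog of 5 are committed in pairs unless forceCommit() is called in between.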
About 5): the API itself is not going to be thread-safe, but it will use multiple threads. It's not clear to me at the moment when the new "Commit" LuceneWork should be used.

Sanne

2009/5/7 Emmanuel Bernard <emman...@hibernate.org>:
> I have synced with Sanne on his work on massive reindexing and here is the
> outcome of the discussion.
>
> 1. An exclusive batch mode is a mode where a node has exclusive access to
> the index and can optimize writes (not flushing, not committing at
> specific times etc).
>
> 2. The node able to activate the exclusive batch mode has to be the master in a
> cluster (ie not one of the slaves).
>
> 3. The master will have two modes, a transactional mode (as today, ie commit
> at tx boundaries - potentially asynchronous) and an exclusive batch mode called
> the adaptative mode.
> In this mode the BackendQueryProcessor can take some freedom in when and how
> it flushes changes to the Lucene index and when and how it commits.
> One approach would be to be transactional (ie one queue of changes = one
> commit) for low thresholds and batch exclusive for higher thresholds (apply
> several queues of changes before flushing or committing).
> This back end would somehow communicate with the master copy process to only
> copy changes at the right time. (I think it should work well already but
> needs to be verified).
> A slave / client could force a commit by sending a Commit LuceneWork if
> needed.
>
> 4. There should be a way to switch at runtime from the tx mode to the
> adaptative mode. When switching, the tx queue is forked: new elements are
> queued, old elements are processed.
> When the old queue is emptied, the adaptative mode kicks in.
>
> 5. On top of that, the massive indexer API reads data from the database as
> fast as possible and pushes index works to the adaptative engine. This API
> will be single-server but multi-threaded for now. Sanne can describe that more
> in detail.
> This API would have a start and waitTillDone() API that starts the
> adaptative engine and stops it.
>
> That's it for the first step.
>
> Second steps (no particular order)
> 6. Make the massive indexer API work in a cluster.
> Slaves would read the DB and push index works to the queue.
>
> 7. Find a way to apply analysis before the actual IndexWriter usage.
> That would increase indexing parallelism by allowing some
> pipelining.
> Or even better, analyze on the slaves and free CPU time for the master
> (would work nicely with 6).
>
> Sanne, please add anything I have missed or misinterpreted.
> _______________________________________________
> hibernate-dev mailing list
> hibernate-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev