Sorry, I wrote something stupid: "3) use a size equal to the number of DirectoryProviders (this is the optimal value, could be the default and be overridden by a parameter)"
is not true; this is not related to the optimal value. Please discard this option.

I think the best option would be to have a separate Executor for each directory provider (DP): otherwise a slowly reacting index could block correct operation of the others, as many queues could pile up targeting the same DP and exhaust the threads, which would all stay locked and collapse to a single-threaded model.

This makes me think we should create one executor per DP, each one using just one thread. An additional benefit is that no locking would be needed: we could remove all barriers in the backend (unless batch mode enables concurrent usage of the IndexWriter).

-- Sanne

2008/11/20 Sanne Grinovero <[EMAIL PROTECTED]>:
> Hello,
> because of HSEARCH-268 (optimize indexes in parallel) but also for
> other purposes, I need to define a new ThreadPool in Hibernate
> Search's Lucene backend.
> The final effect will actually be that all changes to indexes are
> going to be performed in parallel (on different indexes).
> I consider this a major improvement, and it is currently easy to
> implement, iff we solve the following problems.
>
> The question is about how to size it properly, and how the
> parallel workers should interact, especially regarding commit failures and
> rollbacks:
>
> about the size
> =========
> I've considered some options:
> 1) "steal" the configuration setting from BatchedQueueingProcessor,
> transforming that implementation into a single-threaded one,
> and reusing the parameter internally to the Lucene backend only (JMS
> doesn't need it AFAIK).
> I'm afraid this could break the configuration parsing of custom-made backends.
>
> 2) add a new parameter to the environment
>
> 3) use a size equal to the number of DirectoryProviders (this is the
> optimal value, could be the default and be overridden by a parameter).
>
> 4) change the contract of BackendQueueProcessorFactory: instead of
> returning one Runnable it returns a list of Runnables,
> so it's possible to use the existing Executor.
> This needs some consideration about how different Runnables have to
> "join the same TX"; the JMS implementation could return just one
> Runnable, so no worry about that.
>
> about transactions
> ============
> As you know, Search is not using a two-phase commit between DB and
> index, but Emmanuel has a very cool vision about that: we could add
> that later.
> The problem is: what to do if a Lucene index update fails (e.g. index
> A is corrupted)?
> Should we cancel the tasks going to make changes to the other indexes, B and
> C?
> That would be possible, but I don't think you would like that: after
> all, the database changes are committed already, so I should actually
> make a "best effort" to update all indexes which are still working correctly.
>
> Another option would be to make the changes to all indexes, and then
> IndexWriter.commit() them all after they are all done.
> This is the opposite of the previous example, and also more complex to
> implement.
> I personally don't like this, but would like to hear more voices as it
> is an important matter.
>
> I think Search should work on a "best effort" basis for the next
> release: update all indexes it is able to.
> In a future one we could add an option to make it "two phase"
> by playing with the new
> Lucene commit() capabilities, but this would only make sense if you
> actually wanted to roll back
> the database changes in case of an index failure.
>
> sharing IndexWriter in batch mode
> =====================
> This is not needed for HSEARCH-268 (optimize indexes in parallel) but
> is needed to get a major boost in indexing performance.
> Currently the IndexWriter lifecycle is coupled to the operations done
> in a transaction (Emmanuel also reminded me that
> we need to release the file lock ASAP, as a supported configuration is
> to use two Search instances sharing the same FS-based index).
> We already have the concepts of "batch operation" and "transactional
> operation"; the only difference is currently about
> which tuning settings are applied to the IndexWriter.
> My idea is to extend the semantics of "batch mode" to mean a state
> which globally affects the way IndexWriters
> are acquired and released: when in batch mode, the IndexWriter is not
> closed at the end of each work queue, and the locks are not used:
> the IndexWriter could be shared across different threads. This is not
> transactionally safe of course, but that's why this is called
> "batch mode" as opposed to "transactional mode": nobody would expect
> transactional behaviour.
> Care should be taken to revert the status to "transactional mode"
> and close the IndexWriter at the end, but this API
> would let me reindex the database using the "parallel
> scrollableresults" in the most efficient way, and nicely integrated.
> This isn't as complicated to implement as it is to explain ;-)
>
> Sanne
>
_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
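
To make the "one executor per DP, each one using just one thread" proposal from the top of this message more concrete, here is a minimal sketch in plain java.util.concurrent. It is only an illustration under stated assumptions, not Hibernate Search code: the class PerProviderExecutors, the method applyAll, and the use of plain strings as provider names are all hypothetical. It also shows the "best effort" behaviour discussed under "about transactions": a failure on one index is recorded, but the work for the other indexes still runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one single-threaded executor per directory provider. */
class PerProviderExecutors {

    // Provider name -> its dedicated executor. Because each executor has exactly
    // one thread, work for a given index is serialized without explicit locks.
    private final Map<String, ExecutorService> executors = new LinkedHashMap<>();

    PerProviderExecutors(List<String> providerNames) {
        for (String name : providerNames) {
            executors.put(name, Executors.newSingleThreadExecutor());
        }
    }

    /**
     * Submits one unit of work per provider and waits for all of them.
     * Returns the names of the providers whose work failed; work for the
     * other providers is still applied ("best effort").
     */
    List<String> applyAll(Map<String, Runnable> workPerProvider) {
        Map<String, Future<?>> pending = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> entry : workPerProvider.entrySet()) {
            ExecutorService executor = executors.get(entry.getKey());
            if (executor == null) {
                throw new IllegalArgumentException("unknown provider: " + entry.getKey());
            }
            pending.put(entry.getKey(), executor.submit(entry.getValue()));
        }
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Future<?>> entry : pending.entrySet()) {
            try {
                entry.getValue().get(); // a slow or failing index only affects its own entry
            } catch (ExecutionException e) {
                failed.add(entry.getKey());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                failed.add(entry.getKey());
            }
        }
        return failed;
    }

    void shutdown() {
        for (ExecutorService executor : executors.values()) {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        PerProviderExecutors backend = new PerProviderExecutors(Arrays.asList("A", "B", "C"));
        Map<String, Runnable> work = new LinkedHashMap<>();
        work.put("A", () -> { throw new IllegalStateException("index A is corrupted"); });
        work.put("B", () -> System.out.println("B updated"));
        work.put("C", () -> System.out.println("C updated"));
        System.out.println("failed: " + backend.applyAll(work));
        backend.shutdown();
    }
}
```

In this sketch a stalled task on index A can only occupy A's own thread, so B and C keep making progress. That is the property a single shared pool would lose if many queued tasks targeted the same DP.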