Hi all,

I've been doing a lot of work with Hibernate Search recently and have been pushing the Lucene search side of it pretty hard. I've made several changes to improve performance and functionality and have created patches for these locally, but I would like some feedback on my approach:

1) Similarity annotation (HSEARCH-136)

I've created a patch for this which does the following:

* Added a Similarity annotation which can be added at the class level.
* Modified Workspace.java to set Similarities when creating an IndexWriter.
* In CacheableMultiReader, changed the visibility of subReaders to package private so other classes can use them.
* In ReaderProviderHelper, added methods which can resolve the underlying IndexReaders from a Searcher or Reader passed in.
* In DocumentBuilder, added code to find the Similarity annotation and store its implementation locally.
* In FullTextQueryImpl, added code which works out which similarity to use when creating a Reader, and changed all finally blocks to use a common piece of code to close the reader.

This seems to work well in my dev environment, and I'll be sending out a patch for 3.0.1 later today as I've already had some feedback from Emmanuel on this one. The solution is a lot simpler than the first patch I uploaded to Jira.

2) Explaining results

This uses the new DOCUMENT_ID projection introduced in 3.0.1 to explain query results (we need this so the customer can understand their search results in the backoffice interface). I added an explain method to both implementations of FullTextQueryImpl which is only available by casting (i.e. no interface changes). I think explain() is probably a fairly advanced function which it's acceptable to access by casting.

3) Counting results

In the current implementation we only want to perform one Lucene query per search (all projected). In order to get a result count and the results themselves, it is currently necessary to invoke the Lucene query twice. I have made changes to allow this information to propagate through to the user whilst only making one search invocation, which avoids the cost of the second query:

* Created a class called SearchResultList which extends ArrayList. This has an extra property for setting and retrieving the total hitcount.
* Added the method "List load(int hitCount, EntityInfo ... entityInfos)" to the Loader interface. Kept the existing "List load(EntityInfo ... entityInfos)" method, which can be stubbed by passing a dummy value if used.
* Changed the loader implementations themselves to create a SearchResultList with the hitcount instead of an ArrayList.

There are three ways of then accessing the hitcount:

a) Casting the list to a SearchResultList
b) Returning a SearchResultList instead of a List from the Loader interface and propagating this through
c) Creating an interface from SearchResultList that extends List and then having a private implementation, but otherwise doing as per (b)

I'm particularly interested in some feedback on this as it's a big performance gain for applications that need the total hit count, and it contains the most breaking changes of any of the things I've done.

4) Caching filter BitSets

To fix the problem with readers, there needs to be a way of accessing the underlying readers of a CacheableMultiReader in order to store the appropriate references to cache by. I think it's going to be better to either make the subReaders property public or to define an accessor for it. I've done this locally so I can hack up a working caching strategy based on a weak reference to the first reader, which works for my case but not the general case.

Any feedback on these would be very useful. I've made the changes locally, but would like some confirmation about direction before I start spraying patches around.

Cheers,

Nick

_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
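
For reference, the SearchResultList idea from (3) can be sketched roughly as below. This is only a minimal illustration, not the actual patch: the generic parameter, the constructor signatures and the accessor names (getTotalHitCount / setTotalHitCount) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collection;

/**
 * Sketch of a result list that also carries the total number of hits
 * matched by the query, so that a single search can return both the
 * current page of results and the overall count.
 * Names and constructors here are illustrative assumptions.
 */
class SearchResultList<E> extends ArrayList<E> {
    private static final long serialVersionUID = 1L;

    // Total hits matched by the query, which may be larger than size()
    // when only one page of results has been loaded.
    private int totalHitCount;

    SearchResultList(int totalHitCount) {
        super();
        this.totalHitCount = totalHitCount;
    }

    SearchResultList(int totalHitCount, Collection<? extends E> results) {
        super(results);
        this.totalHitCount = totalHitCount;
    }

    int getTotalHitCount() {
        return totalHitCount;
    }

    void setTotalHitCount(int totalHitCount) {
        this.totalHitCount = totalHitCount;
    }
}
```

With option (a), a caller that receives a plain List would check for and cast to SearchResultList before reading the hit count, which keeps the existing interfaces unchanged at the cost of an unchecked assumption about the runtime type.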