Hi all,

Yesterday I had a discussion with Sanne on IRC [3] about the new API to access index readers in HS4.0. We couldn't complete our discussion yesterday, so let's continue here. As explained in the forum [1], there is currently no good solution for getting a reader over a subset of the indexes in a sharded environment.

So far two basic ideas have come to mind:

A - Have a SearchFactory.openIndexReader(Class<?> c, FullTextFilterImplementor...). This is similar to how the IndexManagers are gathered at query time, and is therefore probably easy to understand.

B - (to be further reviewed) Have something like searchFactory.indexReaders().withShardingOptions( X, Y ).includeType(Class<?> z).openIndexReader(). This also adds the ability to get an IndexReader for multiple classes. But we need to think about .withShardingOptions (or something similar): what input should we support here?

Sharding properties are mostly based on some entity property (or properties), which are probably easy to encode as a String. The (custom) sharding strategy may use such a String to select the proper index managers.

Using a String for identifying which index managers to use looks fine to me. It will be compatible with the current implementation of custom sharding strategies, where one might use the Lucene document at addition time. If an entity instance is also passed (see discussion [2]), the properties of that entity can probably be encoded to some String as well. And if HS covers the mapping, i.e. supports Strings as identifiers for sharding instead of a user-defined mapping to the (integer) index in the array of IndexManagers, that would be awesome :) (It relieves the pain of having a mapping that must be stored somewhere, which is what I currently do.)

Still, we need to know which use cases there might be, i.e. how much flexibility the API should offer.

As is also mentioned in [1], there is currently no direct access to the index managers, so getting an FSDirectory is not possible in 4.0alpha1. I think HS should support this, to offer the flexibility to work on the Lucene indexes directly (for example, to build an auto-completion/spell-check index from an existing index).

Let's start by setting up some requirements?

---------
*1 Have access to an IndexReader for one class
*2 Have access to an IndexReader with a subset of IndexManagers based on the sharding strategy. Sharding strategies are mostly based on some propert(y/ies) of an entity instance, which can likely be encoded to some String.
*3 Have access to index directories (FSDirectory/...). Unlike previous versions (< HS4.0) it would be nice if this uses the ShardingStrategy instance in use, so mapping is completely and exclusively done in a ShardingStrategy
* ...
---------

Please extend/modify the list of requirements if you think something is missing/incorrect, and drop your ideas/thoughts about the mentioned ideas.

Elmer

[1] https://forum.hibernate.org/viewtopic.php?p=2448000#p2448000
[2] http://www.mailinglistarchive.com/html/hibernate-dev@lists.jboss.org/2011-08/msg00091.html
[3] IRC log:

<elmervc> sannegrinovero, have you read/did you have time to think about https://forum.hibernate.org/viewtopic.php?p=2448000#p2448000
<sannegrinovero> hi elmervc , yes I've read it. my next thing on the todo is to make some prototype, as I'm not happy with the current ideas:
<sannegrinovero> elmervc, are you blocked by this? the workaround is very simple
<sannegrinovero> generally, I'm wondering if we can avoid having to expose the DirectoryProviders. I would want them gone from the public API, but of course limitations like this are not acceptable.
<elmervc> sannegrinovero, I'm branching this migration, so it's not really blocking. But I would like to try the new H core/search, so for that to work I need access to the subset of indices
<elmervc> What workaround were you thinking about ?
<elmervc> Just construct an index reader/FSDirs myself using 'hardcoded' paths ?
<sannegrinovero> nono that's ugly..
<sannegrinovero> elmervc, all logic to open this IR is in org.hibernate.search.impl.ImmutableSearchFactory.openIndexReader(Class<?>...)
<sannegrinovero> elmervc, and it's just a couple of lines to change ;)
<sannegrinovero> the problem is more how to make it easy to consume
<elmervc> Ok, I'll look into that :)
<elmervc> Using filters is not a good idea?
<sannegrinovero> yes I liked your suggestion. but is it enough ?
<sannegrinovero> and how would the methods look like?
<sannegrinovero> (i.e. the signature)
<elmervc> SearchFactory.openIndexReader(Class<?> c, FullTextFilterImplementor[] filters) , or what do you mean?
<sannegrinovero> I'd prefer SearchFactory.openIndexReader(Class<?> c, FullTextFilterImplementor... filters)
<elmervc> But I'm not sure if this covers all use cases of sharding
<sannegrinovero> elmervc, the methods don't need necessarily be defined on the SearchFactory. We can think of something like searchFactory.indexReaders().withShardingOptions( X, Y ).includeType(Class<?> z).openIndexReader() .. how does that look like?
<sannegrinovero> I'm just tossing out some ideas, but then we should bring this up to the mailing list.
<elmervc> the .includeType , do you mean that multiple classes can be included?
<sannegrinovero> yes
<sannegrinovero> basically the indexReaders() method would open a context, private to this invocation chain only. (i.e. not affecting other threads invoking .indexReaders() )
<elmervc> Sounds cool. But then we need to think about the .withShardingOptions, or something similar. For transparency it's best to have something similar to the methods in the ShardingStrategy interface
<elmervc> Or something similar to what is done @ querytime, i.e. FullTextFilterImplementors
<elmervc> The point is, we need to know what other use cases one might have
<elmervc> That's related to how sharding is done, i.e. ... might be a field in the doc , full text filter, ...
<elmervc> (doc = doc to be added)
<sannegrinovero> yes exactly I need use cases to understand this, that's why your feedback is very much appreciated :)
<elmervc> sannegrinovero, For example, our sharding strategy is based on some field in an entity that is added to the Lucene Document (actually, it has a @Field anno, and this field is removed from the Lucene Document in the shardingstrategy.getDirectoryProviderForAddition(...))
<sannegrinovero> elmervc, lol that proves another discussion I had recently in proposing that we should pass the entity instance and not the document to the sharding strategy.
<elmervc> It might be useful indeed, but in our case it's easier to use a Field in the doc, because that field will always have the same name, i.e. we can reuse the same sharding strategy.
<sannegrinovero> elmervc, this discussion is very interesting but I'm busy in other chats now which I can't postpone. Could you please synthesize this and send a mail to the developer list?
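
To make idea B (the searchFactory.indexReaders()... chain discussed in the mail and the IRC log) a bit more concrete, here is a rough sketch in plain Java. It is only an illustration under assumptions: none of these types exist in Hibernate Search, the names StringShardingStrategy, ReaderContext and PerKeyStrategy are made up, shard keys are assumed to be plain Strings, and selectIndexes() merely returns the names of the selected indexes where a real openIndexReader() would open a reader over them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only; these types are NOT part of Hibernate Search. */
public class IndexReaderSketch {

    /** Assumed contract: a sharding strategy that understands String shard keys. */
    interface StringShardingStrategy {
        /** Names of all indexes used for the given entity type. */
        List<String> allIndexes(Class<?> type);

        /** Names of the indexes for the given type, restricted to the given shard keys. */
        List<String> indexesFor(Class<?> type, List<String> shardKeys);
    }

    /** Context private to one invocation chain: every indexReaders() call creates a new one. */
    static final class ReaderContext {
        private final StringShardingStrategy strategy;
        private final List<String> shardKeys = new ArrayList<>();
        private final Set<Class<?>> types = new LinkedHashSet<>();

        ReaderContext(StringShardingStrategy strategy) {
            this.strategy = strategy;
        }

        ReaderContext withShardingOptions(String... keys) {
            shardKeys.addAll(Arrays.asList(keys));
            return this;
        }

        ReaderContext includeType(Class<?> type) {
            types.add(type);
            return this;
        }

        /** Stand-in for openIndexReader(): returns the selected index names, without duplicates. */
        List<String> selectIndexes() {
            Set<String> selected = new LinkedHashSet<>();
            for (Class<?> type : types) {
                // No shard keys given means: use every index of that type.
                selected.addAll(shardKeys.isEmpty()
                        ? strategy.allIndexes(type)
                        : strategy.indexesFor(type, shardKeys));
            }
            return new ArrayList<>(selected);
        }
    }

    static ReaderContext indexReaders(StringShardingStrategy strategy) {
        return new ReaderContext(strategy);
    }

    /** Toy strategy: one index per (type, key) pair, named "<SimpleName>.<key>". */
    static final class PerKeyStrategy implements StringShardingStrategy {
        private final List<String> knownKeys;

        PerKeyStrategy(String... knownKeys) {
            this.knownKeys = Arrays.asList(knownKeys);
        }

        @Override
        public List<String> allIndexes(Class<?> type) {
            return indexesFor(type, knownKeys);
        }

        @Override
        public List<String> indexesFor(Class<?> type, List<String> keys) {
            List<String> names = new ArrayList<>();
            for (String key : keys) {
                if (knownKeys.contains(key)) { // unknown keys are silently ignored
                    names.add(type.getSimpleName() + "." + key);
                }
            }
            return names;
        }
    }

    public static void main(String[] args) {
        StringShardingStrategy strategy = new PerKeyStrategy("customerA", "customerB", "customerC");
        List<String> subset = indexReaders(strategy)
                .withShardingOptions("customerA", "customerC")
                .includeType(String.class)   // String/Integer stand in for indexed entity classes
                .includeType(Integer.class)
                .selectIndexes();
        System.out.println(subset);
    }
}
```

The point of the sketch is that the mapping from shard key to index lives exclusively in the strategy (requirement *3), while the caller only passes Strings and types. Whether Strings are expressive enough for all sharding use cases is exactly the open question above.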