Updated the wiki: http://community.jboss.org/docs/DOC-16743
I'll highlight here some previously unmentioned details:

- Service instances (being discussed)
  * for the public extension points it's sometimes useful to pass instances
    instead of mere class names to a SearchConfiguration
    o Infinispan Query would love to pass a self-built Directory factory, so
      that Search could use it to store indexes on the same Infinispan
      instance (instead of starting a second cluster).
- about the less-static mapping: I think the tricky part is going to be
  matching the proper analyzer in the backend; I mentioned that.
- I added a "(being discussed)" flag to what I wasn't yet sure was approved
  by Emmanuel and Hardy, as I had no positive feedback yet. (I might have
  forgotten others to cheat you..)

Cheers,
Sanne

2011/4/26 Sanne Grinovero <sa...@hibernate.org>:
> some late answers:
>
> 2011/4/21 Emmanuel Bernard <emman...@hibernate.org>:
>> OK, if we want to do all of this we will have to start very quickly. In
>> fairness, I'm not sure we can even do all of this so let's make sure these
>> are prioritized accordingly.
>> I could not find the expected deadlines for AS 7 / Core 4 but we are probably
>> talking about June here: i.e. very soon.
>>
>> Some more comments inline.
>>
>> On 20 Apr. 2011, at 10:21, Sanne Grinovero wrote:
>>
>>> Hi,
>>> About changing contracts, we don't get this chance very often so we
>>> should make sure we don't miss any.
>>> I have some favourites I'd like to discuss:
>>>
>>> - work list sent to backend
>>> -- As you know Lucene dropped all guarantees about serializability, so
>>> supporting stuff like JMS requires a format change; especially the
>>> NumericField is not working right now as it was never serializable
>>> (HSEARCH-681)
>>
>> +1
>>
>>> -- Lucene is being more flexible about updates, I don't think we
>>> should keep remapping an "update" operation as a delete+add operation,
>>> but transmit the "update operation" and let the backend figure out
>>> what's best.
>>
>> I guess we could do that.
>> We need to make sure collection "updates" play
>> well in that mix.
>
> the urgent bit of the proposal is to add an "update" operation as a
> supported verb. There's no need to convert the collection updates
> from using a "delete+add" soon; I just mean that the contract should
> allow it, so that we can improve on this later.
>
>>> - DirectoryProvider
>>> -- make a "DirectoryManager" instead, which is able to provide
>>> factories for both IndexReaders and IndexWriters
>>> -- add utility methods like "getName()", wish I had that in some
>>> cases to provide better error messages. This leads me to think that
>>> instead of trying to foresee all needed methods, the extension point
>>> should not be the DirectoryManager interface directly, but have people
>>> plug in different aspects.
>>
>> That might be better also since it reduces the scope, it's easier to design
>> the contract.
>>
>>> -- this is needed to support both Instantiated indexes and to make
>>> good use of all the new so-called "Near-Real-Time" Lucene improvements.
>>>
>>> - ReaderProvider
>>> -- (assuming such a thing would still exist): I think it would be
>>> very nice if the responsibility of such a provider would be to provide
>>> the IndexReader for a single index. Currently it has to provide a
>>> "multiReader" on each different index, making some implementations
>>> very tricky (seems I got it right in SharingBufferReaderProvider, but
>>> I recently had some other interesting ideas which proved quite
>>> daunting after a draft: take responsibility of the FieldCache expiry
>>> directly, to be able to plug different cache implementations, we
>>> control the lifecycle and we can be much smarter).
>>
>> ok, we might be able to do that in a 4.1 if need be.
>
> right, no need to make the new FieldCache integration, but we'd need
> to change the ReaderProvider API to work on a single index.
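
The "update as a supported verb" idea, together with the need for a work format that no longer relies on Lucene classes being serializable, could be sketched roughly as below. This is only an illustration under assumptions: `IndexWorkSketch`, `Verb`, `IndexWork`, `roundTrip` and `plan` are hypothetical names that do not exist in Hibernate Search, and field values are simplified to plain strings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: none of these types exist in Hibernate Search. */
public class IndexWorkSketch {

    /** Verbs sent to the backend; UPDATE is the proposed addition. */
    enum Verb { ADD, DELETE, UPDATE }

    /** A message built only from JDK-serializable types, no Lucene classes. */
    static final class IndexWork implements Serializable {
        private static final long serialVersionUID = 1L;

        final Verb verb;
        final String entityName;
        final String documentId;
        final LinkedHashMap<String, String> fields;

        IndexWork(Verb verb, String entityName, String documentId, Map<String, String> fields) {
            this.verb = verb;
            this.entityName = entityName;
            this.documentId = documentId;
            this.fields = new LinkedHashMap<>(fields); // defensive, serializable copy
        }
    }

    /** Serializes and deserializes the message, as a remote (e.g. JMS) backend would. */
    static IndexWork roundTrip(IndexWork work) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(work);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (IndexWork) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("work message is not serializable", e);
        }
    }

    /** The backend, not the sender, decides how each verb is applied. */
    static String plan(IndexWork work) {
        switch (work.verb) {
            case ADD:    return "add";
            case DELETE: return "delete";
            case UPDATE: return "update"; // a backend may still fall back to delete+add here
            default:     throw new IllegalArgumentException("unknown verb " + work.verb);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Hibernate Search in Action");
        IndexWork received = roundTrip(new IndexWork(Verb.UPDATE, "Book", "42", fields));
        System.out.println(received.verb + " " + received.entityName + "#"
                + received.documentId + " -> " + plan(received));
    }
}
```

The point of the sketch is only that the sender transmits the intent (UPDATE) and the receiving side chooses the strategy, so a later optimization does not need another contract change.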
>
>>
>>>
>>> - backends and workers
>>> -- I'd like to make it possible to configure different backends per
>>> index. Currently a backend is global, while in some (extreme) cases it
>>> would have been handy to configure even single shards to different
>>> backends. So really a backend should be something coupled to the
>>> "DirectoryManager" mentioned before. Question is, at what level is
>>> sharding going to work; it could work as a multiplexing
>>> DirectoryManager.
>>
>> Can you remind us of the use case behind heterogeneous backends? There was one
>> but I forgot.
>
> it's mostly about performance details, the possibility to have
> different entities configured with different requirements: so for
> example one entity might have large indexes and use the rsync copy
> algorithm via the master/slave index providers, configured to sync the
> index once per hour or day using async JMS as backend, while another
> entity requires transactional synchronization over the cluster and so
> might need a synchronous JMS backend with the Infinispan DirectoryProvider.
> Currently people having such a requirement need to configure
> everything as sync.
> There was also a case in which people wanted to use a sharding
> strategy on top of this, to have some shards in high priority for the
> same entity; one corner use case even wanted to have a shard policy
> including a blackhole backend as one of the shards.
>
>>
>>>
>>> -- defaults to change:
>>> - remove the notion of transactional / batch IndexWriter settings,
>>> deprecated for long enough.
>>
>> ok easy
>>
>>> - make the FullTextIndexEventListener final (people still extend and replace
>>> it to better control when an entity is to be indexed, but I hope we
>>> can solve that as well)
>>
>> Well it will be in a private package anyway
>>
>>> - default to NumericField for numeric properties
>>> - set exclusive_index_use=true by default, benefits are far too high
>>> and some optimizations I was thinking of are impossible if this is
>>> disabled.
>>
>> I'm not sure I agree with that. It seems that such a default would bite a
>> careless user too easily.
>
> How would it bite? It's not going to disable the index locking. And the
> Near-Real-Time features of the latest Lucene require the IndexWriter to be
> always open, and this feature is so great for the way Hibernate Search
> uses Lucene, it's sad that we don't support it yet.
>
>>
>>>
>>> -- bridges
>>> - It happened many times that we couldn't do X or optimize Y as "user
>>> bridge might read/write any field"; I think we should stop exposing
>>> the o.a.lucene.Document - especially since we change the format of
>>> messages to the backend - and make sure to expose something as good
>>> and as flexible. Need some thinking on this: we can't expose Document
>>> but we want to make sure people won't ever miss advanced features for
>>> which such a bridge was a nice "advanced api". Or we split the
>>> concepts, having a less-powerful API and a more advanced one, which
>>> could be named, and operate on the Document itself but inside the
>>> backend rather than in the DocumentBuilder (so the name could be used
>>> in the message to the backend to point to some transformer to apply
>>> for final touches - it could be a customization of the implementation
>>> which applies the message in our own format to the
>>> o.a.lucene.Document)
>>
>> I don't think I follow you, can you expand on what you think?
>> BTW I'm a bit concerned about the "serializability" of what would be needed
>> to be passed around if you move FieldBridge operations in the backend.
>
> It's really two different aspects:
> 1) let people still use the flexibility of custom bridges, but because
> we don't expose the Document directly we'll need to expose something
> which is a good replacement for it, especially because of the
> serialization issues but also to be able to better "inspect" what
> bridges do; I have no specific idea right now but I'm sure that we'll
> be able to play some trick at this level.
>
> 2) no need to define the API now, but it might be useful for special
> cases to still customize the "add to Document" aspect; about
> serializability of these components, I'd see a good fit to do as you
> did with analyzers: give them a name, rebuild the component on the
> other side of the wire and refer to them by name. I don't think this
> is a priority, but it's how we could proceed in case approach 1) doesn't
> prove flexible enough for some use case I'm not aware of now.
>
>
>>> - at some point, we'll also need to track which entity properties are
>>> being "read" by a custom ClassBridge/DynamicBoost, to better check for
>>> index dirtiness. Might be done by proxying the entity, or just having
>>> the implementation declare by which properties it's affected: in this
>>> case, an API change is needed but this can possibly be postponed.
>>
>> proxying does not solve all use cases. If a user has a transient getter that
>> reads data from two other getters, you don't get that info via proxying.
>
> right; well explicit user declarations then, at least optionally.
>
> updating the wiki now.
>
> Sanne
>
>>>
>>> this is just off the top of my head, I'm sure I forgot to break some
>>> interface ;)
>>> I'll give you some time to think about it, then I'll insert the
>>> proposals which survived in the wiki & JIRA.
>>> (needless to say, no objections on your proposals)
>>>
>>> Cheers,
>>> Sanne
>>>
>>>
>>> 2011/4/20 Emmanuel Bernard <emman...@hibernate.org>:
>>>> Hi,
>>>>
>>>> We have had in our road map a Hibernate Search 3.5 before Hibernate 4.
>>>> Hibernate 4 is the release where the following should happen:
>>>> - split packages into API, SPI and private packages
>>>> - use JBoss Logging
>>>> - be compliant with Core 4
>>>> - break whatever contract we need to break to open up the future
>>>> - split dependency between the core of Hibernate Search and Hibernate Core
>>>>
>>>> Do you see more tasks for 4?
>>>>
>>>> Since Hibernate Core 4 seems to be doing alright and the time
>>>> pressure will be strong to get Hibernate Search aligned, I propose to skip
>>>> 3.5 entirely and focus on 4. We did not have that many new features
>>>> planned anyway for 3.5, it was more a consolidation release.
>>>>
>>>> Even with skipping 3.5, the 4 release will be a lot of work. We should
>>>> start early. Any objection or comment?
>>>>
>>>> Changing contracts
>>>> We have had a few contracts that we wanted to change to make way for
>>>> future improvements:
>>>> - should a bridge know about the field it changes (make the optimization
>>>> more efficient)
>>>> - rework the backend to let IndexReader and IndexWriter communicate
>>>> - rework the backend to support instantiated IndexReaders
>>>>
>>>> Can you help collect the list of changes you would like to see happening?
>>>>
>>>> I would like to get this work started asap, this is really the unknown
>>>> quantity and we tend to be slow to converge on these things
>>>>
>>>> Split packages in API/SPI/private packages
>>>> Hibernate 4 is the ideal time to properly split stuff into API, SPI,
>>>> private. Moving classes to private packages is the least impacting move
>>>> for users as these should not be used.
>>>> The API / SPI split is sometimes
>>>> difficult to do so if you have a doubt in an area, ask on the ML or on IRC
>>>> and we can discuss it together. If you need an example, check out the
>>>> query engine. It is relatively clean now.
>>>>
>>>> We might have to break a few user APIs which is fine but I don't expect
>>>> too many will be necessary:
>>>> - make sure to discuss it when you plan to do one
>>>> - list them in the migration guide
>>>>
>>>> I'd say that the package splitting should be done when you have a change
>>>> and when you work in a specific area. It's more a background task.
>>>>
>>>> Be compliant with Core 4
>>>> We can do this one a bit later in the cycle to give time for core to
>>>> mature.
>>>>
>>>> Split dependency between Hibernate Search and Hibernate Core
>>>> I think in practice we are not too far. This work should be done in
>>>> parallel to the package splitting. If you look at the query engine, we do
>>>> have specific hibernate packages. We also have a HibernateHelper class for
>>>> all low-level Hibernate contracts like unproxying, initializing etc. We
>>>> should use that class everywhere instead of relying on the direct
>>>> Hibernate Core contracts. That will help us to move this class to an
>>>> implementable contract.
>>>> The next step potentially is to actually move Hibernate Core specific code
>>>> into a separate package.
>>>>
>>>> I don't have much opinion on this but we should definitely discuss it.
>>>>
>>>> Use JBoss Logging
>>>> I tend to think we should do this migration late in the game. WDYT?
>>>>
>>>> New features
>>>> Do you want any new feature per se? I think this would be a great time to
>>>> get the community involved to back new features and fix bugs while we do
>>>> the grunt work for 4.
>>>> So if you know some shy but motivated people, or if you
>>>> are one of them, stand up :)
>>>>
>>>> Note: I have created a rough copy of this email at
>>>> http://community.jboss.org/wiki/PlansforHibernateSearch4
>>>> We can discuss via email but be sure to add the feedback or list of todos
>>>> in the wiki as well for posterity.
>>>> _______________________________________________
>>>> hibernate-dev mailing list
>>>> hibernate-dev@lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
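
The "give them a name, rebuild the component on the other side of the wire and refer to them by name" approach mentioned in the thread for backend-side customizations could look roughly like the following. It is a minimal sketch under assumptions: `NamedComponentSketch`, `DocumentTransformer`, `TransformerRegistry` and the "lowercase-title" name are all hypothetical, and a plain map of strings stands in for whatever replaces o.a.lucene.Document in the message format.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch: these types are illustrations, not Hibernate Search API. */
public class NamedComponentSketch {

    /** Stand-in for a backend-side customization of the "add to Document" step. */
    interface DocumentTransformer {
        Map<String, String> apply(Map<String, String> fields);
    }

    /** Each node registers the same names; only the name crosses the wire. */
    static final class TransformerRegistry {
        private final Map<String, Supplier<DocumentTransformer>> factories = new HashMap<>();

        void register(String name, Supplier<DocumentTransformer> factory) {
            factories.put(name, factory);
        }

        /** Rebuilds the component locally from its name. */
        DocumentTransformer build(String name) {
            Supplier<DocumentTransformer> factory = factories.get(name);
            if (factory == null) {
                throw new IllegalArgumentException("no transformer registered as '" + name + "'");
            }
            return factory.get();
        }
    }

    /** Registry as the receiving node might configure it. */
    static TransformerRegistry backendRegistry() {
        TransformerRegistry registry = new TransformerRegistry();
        registry.register("lowercase-title", () -> fields -> {
            Map<String, String> copy = new LinkedHashMap<>(fields);
            copy.computeIfPresent("title", (key, value) -> value.toLowerCase(Locale.ROOT));
            return copy;
        });
        return registry;
    }

    public static void main(String[] args) {
        // The message carries only the string "lowercase-title", never the component itself.
        DocumentTransformer transformer = backendRegistry().build("lowercase-title");
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Hibernate Search");
        System.out.println(transformer.apply(fields));
    }
}
```

Since only the name is serialized, the component itself never has to be serializable; the cost is that every node must register the same names consistently.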