Re: [hibernate-dev] Hibernate Search 3.5 or 4

Sanne Grinovero Tue, 26 Apr 2011 12:18:37 -0700

Updated the wiki: http://community.jboss.org/docs/DOC-16743


I'll highlight here some previously unmentioned details:
 - Service instances (being discussed)
   * for the public extension points sometimes it's useful to pass
instances instead of mere classnames to a SearchConfiguration
          o Infinispan Query would love to pass a self-built
Directory factory so that Search could use that to store indexes on
the same Infinispan instance (instead of starting a second cluster).
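
A minimal sketch of the "instance or classname" idea, only to make the
proposal concrete. Everything here is hypothetical: SketchConfiguration,
DirectoryFactory and the key name are illustration names, not the actual
SearchConfiguration contract.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a pluggable service; not a Hibernate Search type.
interface DirectoryFactory {
    String describe();
}

// Default implementation that can be created reflectively from its class name.
class DefaultDirectoryFactory implements DirectoryFactory {
    public String describe() {
        return "default";
    }
}

// Hypothetical configuration holder: a value is either a class name or a ready-made instance.
class SketchConfiguration {
    private final Map<String, Object> values = new HashMap<>();

    void setClassName(String key, String className) {
        values.put(key, className);
    }

    void setInstance(String key, Object instance) {
        values.put(key, instance);
    }

    // Returns the provided instance if there is one, otherwise instantiates the named class.
    <T> T resolve(String key, Class<T> type) {
        Object value = values.get(key);
        if (value == null) {
            throw new IllegalArgumentException("No value configured for " + key);
        }
        if (type.isInstance(value)) {
            return type.cast(value);
        }
        try {
            Class<?> clazz = Class.forName((String) value);
            return type.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + value, e);
        }
    }
}

class ServiceInstanceSketch {
    public static void main(String[] args) {
        SketchConfiguration cfg = new SketchConfiguration();
        // Classic style: only a class name is known to the configuration.
        cfg.setClassName("directory.factory", DefaultDirectoryFactory.class.getName());
        System.out.println(cfg.resolve("directory.factory", DirectoryFactory.class).describe());
        // Proposed style: the caller hands over an instance it built itself.
        DirectoryFactory shared = () -> "shared with the caller";
        cfg.setInstance("directory.factory", shared);
        System.out.println(cfg.resolve("directory.factory", DirectoryFactory.class).describe());
    }
}
```

With the instance variant the caller keeps control of the object's
lifecycle, which is what sharing one Infinispan instance would need.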

- about the less-static mapping:
 I've thought that the tricky part is going to be matching the proper
analyzer in the backend - mentioned that.

- Added a "(being discussed)" flag to what I wasn't yet sure was
approved by Emmanuel and Hardy as I had no positive feedback yet.
 (I might have forgotten others to cheat you..)

Cheers,
Sanne


2011/4/26 Sanne Grinovero <sa...@hibernate.org>:
> some late answers:
>
> 2011/4/21 Emmanuel Bernard <emman...@hibernate.org>:
>> OK, if we want to do all of this we will have to start very quickly. In
>> fairness, I'm not sure we can even do all of this so let's make sure these
>> are prioritized accordingly.
>> I could not find the expected deadlines for AS 7 / Core 4 but we are probably
>> talking about June here: i.e. very soon.
>>
>> Some more comments inline.
>>
>> On 20 avr. 2011, at 10:21, Sanne Grinovero wrote:
>>
>>> Hi,
>>> About changing contracts, we don't get this chance very often so we
>>> should make sure we don't miss any.
>>> I have some favourites I'd like to discuss:
>>>
>>> - work list sent to backend
>>> -- As you know Lucene dropped all guarantees about serializability,
>>> supporting stuff like JMS requires a format change; especially the
>>> NumericField is not working right now as it was never serializable
>>> (HSEARCH-681)
>>
>> +1
>>
>>> -- Lucene is being more flexible about updates, I don't think we
>>> should keep remapping an "update" operation as a delete+add operation,
>>> but transmit the "update operation" and let the backend figure out
>>> what's best.
>>
>> I guess we could do that. we need to make sure collections "updates" play 
>> well in that mix.
>
> the urgent bit of the proposal is to add an "update" operation as a
> supported verb. there's no need to convert the collections updates
> from using a "delete+add" soon, I just mean to make it possible to
> later improve on this so that the contract allows it.
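
A minimal sketch of "update" as a first-class verb, assuming a very
simplified work list. WorkType, IndexWork and RecordingBackend are
hypothetical names for illustration, not existing Hibernate Search
classes. The point is only that the sender transmits UPDATE and the
backend chooses how to apply it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical verbs for the work list; UPDATE is sent as-is instead of being pre-split.
enum WorkType { ADD, DELETE, UPDATE }

// Hypothetical unit of work, identified by an entity id.
final class IndexWork {
    final WorkType type;
    final String id;

    IndexWork(WorkType type, String id) {
        this.type = type;
        this.id = id;
    }
}

// A backend that records what it would do, so the decision is easy to inspect.
class RecordingBackend {
    final List<String> applied = new ArrayList<>();
    private final boolean supportsNativeUpdate;

    RecordingBackend(boolean supportsNativeUpdate) {
        this.supportsNativeUpdate = supportsNativeUpdate;
    }

    void apply(IndexWork work) {
        switch (work.type) {
            case ADD:
                applied.add("add:" + work.id);
                break;
            case DELETE:
                applied.add("delete:" + work.id);
                break;
            case UPDATE:
                if (supportsNativeUpdate) {
                    applied.add("update:" + work.id);
                } else {
                    // Fallback keeps the current behaviour: delete, then add.
                    applied.add("delete:" + work.id);
                    applied.add("add:" + work.id);
                }
                break;
        }
    }
}

class UpdateVerbSketch {
    public static void main(String[] args) {
        RecordingBackend backend = new RecordingBackend(false);
        backend.apply(new IndexWork(WorkType.UPDATE, "42"));
        System.out.println(backend.applied);
    }
}
```

Nothing changes for collection updates in this sketch: a backend without
native update support still ends up doing delete+add.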
>
>>> - DirectoryProvider
>>>  -- make a "DirectoryManager" instead, which is able to provide
>>> factories for both IndexReaders and IndexWriters
>>>  -- add utility methods like "getName()", wish I had that in some
>>> cases to provide better error messages. This leads me to think that
>>> instead of trying to foresee all needed methods, the extension point
>>> should not be the DirectoryManager interface directly, but have people
>>> plug in different aspects.
>>
>> That might be better also since it reduces the scope, it's easier to design 
>> the contract.
>>
>>> -- this is needed to support both Instantiated indexes and to make
>>> good use of all new so called "Near-Real-Time" Lucene improvements.
>>>
>>> - ReaderProvider
>>> -- (assuming such a thing would still exist): I think it would be
>>> very nice if the responsibility of such a provider would be to provide
>>> the IndexReader for a single index. Currently it has to provide a
>>> "multiReader" on each different index, making some implementations
>>> very tricky (seems I got it right in SharingBufferReaderProvider, but
>>> I recently had some other interesting ideas which turned out to be quite
>>> daunting after a draft: take responsibility of the FieldCache expiry
>>> directly, to be able to plug different cache implementations, we
>>> control the lifecycle and we can be much smarter).
>>
>> ok, we might be able to do that in a 4.1 if need be.
>
> right, no need to make the new FieldCache integration, but we'd need
> to change the ReaderProvider API to work on a single index.
>
>>
>>>
>>> - backends and workers
>>>  -- I'd like to make it possible to configure different backends per
>>> index. Currently a backend is global, while in some (extreme) cases it
>>> would have been handy to configure even single shards to different
>>> backends. So really a backend should be something coupled to the
>>> "DirectoryManager" mentioned before. Question is, at what level is
>>> sharding going to work, it could work as a multiplexing
>>> DirectoryManager.
>>
>> Can you remind us of the use case behind heterogeneous backends? There was one
>> but I forgot.
>
> it's mostly about performance details, the possibility to have
> different entities configured with different requirements: so for
> example one entity might have large indexes and use the rsync copy
> algorithm via the master/slave index providers configured to sync the
> index once per hour or day using async JMS as backend, while another
> entity requires transactional synchronization over the cluster and so
> might need a synchronous JMS backend with the Infinispan DirectoryProvider.
> Currently people having such a requirement need to configure
> everything as sync.
> There was also a case in which people wanted to use a sharding
> strategy on top of this, to have some shards in high priority for the
> same entity; one corner use case even wanted to have a shard policy
> including a blackhole backend as one of the shards.
>
>>
>>>
>>> -- defaults to change:
>>> - remove the notions of transactional / batch IndexWriter setting,
>>> which have been deprecated for long enough.
>>
>> ok easy
>>
>>> - make the FullTextIndexEventListener final (people still extend and replace
>>> it to better control when an entity is to be indexed, but I hope we
>>> can solve that as well)
>>
>> Well it will be in a private package anyways
>>
>>> - default to NumericField for numeric properties
>>> - set exclusive_index_use=true by default, benefits are far too high
>>> and some optimizations I was thinking of are impossible if this is
>>> disabled.
>>
>> I'm not sure I agree with that. It seems that such a default would bite a 
>> non careful user too easily.
>
> How would it bite? It's not going to disable the index locking. And the
> Near-Real-Time features of latest Lucene require the IndexWriter to be
> always open, and this feature is so great for the way Hibernate Search
> uses Lucene, it's sad that we don't support it yet.
>
>>
>>>
>>> -- bridges
>>> - It happened many times that we couldn't do X or optimize Y as "user
>>> bridge might read/write any field"; I think we should stop exposing
>>> the o.a.lucene.Document - especially since we change the format of
>>> messages to the backend - and make sure to expose something as good
>>> and as flexible. Need some thinking on this: we can't expose Document
>>> but we want to make sure people won't ever miss advanced features for
>>> which such a bridge was a nice "advanced api". Or we split the
>>> concepts, having a less-powerful API and a more advanced one, which
>>> could be named, and operate on the Document itself but inside the
>>> backend rather than in the DocumentBuilder (so the name could be used
>>> in the message to the backend to point to some transformer to apply
>>> for final touches - it could be a customization of the implementation
>>> which applies the message in our own format to the
>>> o.a.lucene.Document)
>>
>> I don't think I follow you, can you expand on what you think.
>> BTW I'm a bit concerned about the "serializability" of what would be needed
>> to be passed around if you move FieldBridge operations in the backend.
>
> It's really two different aspects:
> 1) let people still use the flexibility of custom bridges, but because
> we don't expose the Document directly we'll need to expose something
> which is a good replacement for it, especially because of the
> serialization issues but also to be able to better "inspect" what
> bridges do; I have no specific idea right now but I'm sure that we'll
> be able to play some trick at this level.
>
> 2) no need to define the API now, but it might be useful for special
> cases to still customize the "add to Document" aspect; about
> serializability of these components, I'd see a good fit to do as you
> did with analyzers: give them a name, rebuild the component on the
> other side of the wire and refer to them by name. I don't think this
> is a priority, but it's what we should do in case approach 1) doesn't
> turn out to be flexible enough for some use case I'm not aware of now.
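
A minimal sketch of the "refer to them by name" approach from point 2),
assuming both sides of the wire register the same components under the
same names so that only a String has to be serialized. The registry class
is hypothetical, and a plain Map of field name to value stands in for the
Lucene Document to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical registry: the message carries only the transformer name,
// and the receiving side looks up the locally rebuilt component.
class NamedTransformerRegistry {
    private final Map<String, UnaryOperator<Map<String, String>>> transformers = new HashMap<>();

    void register(String name, UnaryOperator<Map<String, String>> transformer) {
        transformers.put(name, transformer);
    }

    // Applies the named transformer to the field map (the stand-in for a Document).
    Map<String, String> apply(String name, Map<String, String> fields) {
        UnaryOperator<Map<String, String>> transformer = transformers.get(name);
        if (transformer == null) {
            throw new IllegalArgumentException("Unknown transformer: " + name);
        }
        return transformer.apply(fields);
    }
}

class NamedTransformerSketch {
    public static void main(String[] args) {
        NamedTransformerRegistry registry = new NamedTransformerRegistry();
        // Registered on the backend side; the sender only knows the name.
        registry.register("uppercase-title", fields -> {
            Map<String, String> copy = new HashMap<>(fields);
            copy.computeIfPresent("title", (key, value) -> value.toUpperCase(Locale.ROOT));
            return copy;
        });
        Map<String, String> fields = new HashMap<>();
        fields.put("title", "hibernate search");
        System.out.println(registry.apply("uppercase-title", fields));
    }
}
```

An unknown name fails fast here, which is one way to surface a mismatch
between what the sender refers to and what the backend has registered.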
>
>
>>> - at some point, we'll need to track also which entity properties are
>>> being "read" by a custom ClassBridge/DynamicBoost, to better check for
>>> index dirtiness. Might be done by proxying the entity, or just having
>>> the implementation declare by which properties it's affected: in this
>>> case, an API change is needed but this can possibly be postponed.
>>
>> Proxying does not solve all use cases. If a user has a transient getter that
>> reads data from two other getters, you don't get that info via proxying.
>
> right; well explicit user declarations then, at least optionally.
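
A minimal sketch of what an explicit declaration could look like, assuming
a bridge simply lists the entity properties it reads. PropertyDependencies,
FullNameBridge and DirtinessCheck are hypothetical names; no such API
exists yet.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical contract: a custom bridge declares which entity properties it reads.
interface PropertyDependencies {
    Set<String> readProperties();
}

// Example bridge that derives one indexed value from two entity properties.
class FullNameBridge implements PropertyDependencies {
    public Set<String> readProperties() {
        return new HashSet<>(Arrays.asList("firstName", "lastName"));
    }
}

class DirtinessCheck {
    // Reindexing is needed only if a changed property is one the bridge declared.
    static boolean needsReindex(PropertyDependencies bridge, Set<String> dirtyProperties) {
        Set<String> declared = bridge.readProperties();
        for (String property : dirtyProperties) {
            if (declared.contains(property)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FullNameBridge bridge = new FullNameBridge();
        System.out.println(needsReindex(bridge, Collections.singleton("lastName")));
        System.out.println(needsReindex(bridge, Collections.singleton("email")));
    }
}
```

This also covers the transient getter case mentioned above, as long as the
user declares the underlying properties that getter reads.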
>
> updating the wiki now.
>
> Sanne
>
>>>
>>> this is just off the top of my head, I'm sure I forgot to break some
>>> interface ;)
>>> I'll give you some time to think about it, then I'll insert the
>>> proposals which survived in the wiki & JIRA.
>>> (needless to say, no objections on your proposals)
>>>
>>> Cheers,
>>> Sanne
>>>
>>>
>>> 2011/4/20 Emmanuel Bernard <emman...@hibernate.org>:
>>>> Hi,
>>>>
>>>> We have had in our road map a Hibernate Search 3.5 before Hibernate 4.
>>>> Hibernate 4 is the release where the following should happen:
>>>>  - split packages into API, SPI and private packages
>>>>  - use JBoss Logging
>>>>  - be compliant with Core 4
>>>>  - break whatever contract we need to break to open up the future
>>>>  - split dependency between the core of Hibernate Search and Hibernate Core
>>>>
>>>> Do you see more tasks for 4?
>>>>
>>>> Since Hibernate Core 4 seems to be doing alright and that the time 
>>>> pressure will be strong to get Hibernate Search aligned, I propose to skip 
>>>> 3.5 entirely and focus on 4. We did not have that many new features
>>>> planned anyways for 3.5, it was more a consolidation release.
>>>>
>>>> Even with skipping 3.5, the 4 release will be a lot of work. We should 
>>>> start early. Any objection or comment?
>>>>
>>>> Changing contracts
>>>> We have had a few contracts that we wanted to change to make way for 
>>>> future improvements:
>>>>  - should a bridge know about the field it changes (make the optimization 
>>>> more efficient)
>>>>  - rework the backend to let IndexReader and IndexWriter communicate
>>>>  - rework the backend to support instantiated IndexReaders
>>>>
>>>> Can you help collect the list of changes you would like to see happening?
>>>>
>>>> I would like to get this work started asap, this is really the unknown 
>>>> quantity and we tend to be slow to converge on the things
>>>>
>>>> Split packages in API/SPI/private packages
>>>> Hibernate 4 is the ideal time to properly split stuff into API, SPI, 
>>>> private. Moving classes to private packages is the least impacting move 
>>>> for users as these should not be used. The API / SPI split is sometimes 
>>>> difficult to do so if you have a doubt in an area, ask on the ML or on IRC 
>>>> and we can discuss it together. If you need an example, check out the 
>>>> query engine. It is relatively clean now.
>>>>
>>>> We might have to break a few user APIs which is fine but I don't expect 
>>>> too many will be necessary:
>>>>  - make sure to discuss it when you plan to do one
>>>>  - list them in the migration guide
>>>>
>>>> I'd say that the package splitting should be done when you have a change 
>>>> and when you work in a specific area. It's more a background task.
>>>>
>>>> Be compliant with Core 4
>>>> We can do this one a bit later in the cycle to give time for core to 
>>>> mature.
>>>>
>>>> Split dependency between Hibernate Search and Hibernate Core
>>>> I think in practice we are not too far. This work should be done in 
>>>> parallel to the package splitting. If you look at the query engine, we do 
>>>> have specific hibernate packages. We also have a HibernateHelper class of 
>>>> all low level Hibernate contracts like unproxying, initializing etc. We 
>>>> should use that class everywhere instead of relying on the direct 
>>>> Hibernate Core contracts. That will help us to move this class as an
>>>> implementable contract.
>>>> The next step potentially is to actually move Hibernate Core specific code 
>>>> into a separate package.
>>>>
>>>> I don't have much opinion on this but we should definitely discuss it.
>>>>
>>>> Use JBoss Logging
>>>> I tend to think we should do this migration late in the game. WDYT?
>>>>
>>>> New features
>>>> Do you want any new feature per se? I think this would be a great time to 
>>>> get the community involved to back new features and fix bugs while we do 
>>>> the grunt work for 4. So if you know some shy people motivated or if you 
>>>> are one of them, stand up :)
>>>>
>>>> Note: I have created a vague copy of this email in
>>>> http://community.jboss.org/wiki/PlansforHibernateSearch4
>>>> We can discuss via email but be sure to add the feedback or list of todos 
>>>> in the wiki as well for posterity.
>>>> _______________________________________________
>>>> hibernate-dev mailing list
>>>> hibernate-dev@lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>>>>
>>
>>
>
