Re: [hibernate-dev] [Search] Dynamic sharding configuration
Sanne, As you say adding yet another interface makes things even more difficult to grok; So I'd vote for adding the method for the deletion use case to SIP directly. I'm not sure whether it has been considered before, but maybe we could unify the methods and work with a parameter object as a middle ground: public interface ShardIdentifierProvider { void initialize(Properties properties, BuildContext buildContext); Set getShardIdentifiersForEntity(EntityShardIdentifierRetrievalContext retrievalContext); Set getShardIdentifiersForQuery(QueryShardIdentifierRetrievalContext retrievalContext); Set getAllShardIdentifiers(); } EntityShardIdentifierRetrievalContext would provide all parameters usable for shard determination, clearly stating that "document" is not available in cases such as deleting. Such a parameter object would allow for adding more options in the future in a compatible manner, and also the method names read quite nicely and symmetrically (I share Hardy's concerns about the asymmetry of getShardIdentifier() vs. getShardIdentifiersForDeletion()). The disadvantage of this scheme is that a set needs to be returned also for the case of returning a single identifier during insert/update, which might render the approach not feasible. OTOH I'm wondering why a set needs to be returned for the delete case, your example also returns exactly one identifier? --Gunnar 2013/10/8 Sanne Grinovero > I've included an example which represents a good reason to provide the > controversial method. > Technically the test is crafted as a static sharding approach but is > using the new API; you can easily figure the same case for a dynamic > sharding case; also considering we're deprecating the older static > sharding API, this should also be able to replace whatever was > possible before. > > Could you have a look, reject one and review/merge the other: > - https://github.com/hibernate/hibernate-search/pull/501 > - https://github.com/hibernate/hibernate-search/pull/502 > > I personally have a mild preference for "proposal A", as I think the > additional interface introduced in "proposal B" doesn't simplify but > adds an additional mark on the list of things the user needs to learn > about, but I'm happy with both solutions as long as we move on with > one. > > We'll update documentation in a second PR, depending which way we go :-) > > Cheers, > Sanne > ___ > hibernate-dev mailing list > hibernate-dev@lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hibernate-dev > ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] [Search] Dynamic sharding configuration
On 8 Jan 2013, at 10:08 AM, Gunnar Morling wrote: > I'm not sure whether it has been considered before, but maybe we could unify > the methods and work with a parameter object as a middle ground: It has been. I suggested before to combine the methods. I think it is a good approach, but Sanne thought it is bad that he user has to deal with null values. > public interface ShardIdentifierProvider { > > void initialize(Properties properties, BuildContext buildContext); > Set > getShardIdentifiersForEntity(EntityShardIdentifierRetrievalContext > retrievalContext); > Set > getShardIdentifiersForQuery(QueryShardIdentifierRetrievalContext > retrievalContext); > Set getAllShardIdentifiers(); > } It is a reasonable approach. Better then what we have. I am drawn between this one and the additional interface. This approach has the advantage that as you say we can modify the internas of the context instances. > EntityShardIdentifierRetrievalContext would provide all parameters usable for > shard determination, clearly stating that "document" is not available in > cases such as deleting. > Such a parameter object would allow for adding more options in the future in > a compatible manner, and also the method names read quite nicely and > symmetrically > (I share Hardy's concerns about the asymmetry of getShardIdentifier() vs. > getShardIdentifiersForDeletion()). All correct imo. > The disadvantage of this scheme is that a set needs to be returned also for > the case of returning a single identifier during insert/update, which might > render the approach not feasible. > OTOH I'm wondering why a set needs to be returned for the delete case, your > example also returns exactly one identifier? See the comments I made on the pull requests. I also don't think the deletion case should return a set. Either one knows the shard it and returns it or one does not know and you have to return null. In the latter case it is up to us to apply the default strategy. IMO it is wrong to say to the user "if you don't know just return all shard ids". It takes away the option from us to distinguish between these two cases. --Hardy ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] [Search] Dynamic sharding configuration
We'll need to time cap this discussion as we're way too late, of course this will need to be tagged @experimental. Having said that, let's try to find the best proposal possible by lunch time, as one of the approaches needs to be merged: it's very clear that there is big practical value for the user to narrow down deletions. The parameter object approach got my interest but I'll need an example, including the user code implementing the methods, as I suspect a terrible result. I'm pretty sure I would implement this API with small and short methods. As an implementor of such an interface, the first thing I would do is to figure if I'm in the delete case or not - since it's special - and branch off between two methods, after writing the condition check. So I already have 3 methods, and a condition which I could copy paste from the book. End result, I still have to implement the two methods I'm proposing (in addition!), but first I need to understand that "some parameters might be missing" at runtime. That's an absolutely bad idea: using null as a canary token in a method is awful, especially if it's our code calling into user code. The method has to be a clear contract even without reading the javadoc, providing such an explanation in English prose is by far not as clear; defining a contract of two different contexts - even maybe just overloading the same method - is a much clearer set of instructions to the implementor. On 8 October 2013 10:03, Hardy Ferentschik wrote: > > See the comments I made on the pull requests. I also don't think the deletion > case should return a set. Either one knows the shard it and returns it or one > does not know and you have > to return null. In the latter case it is up to us to apply the default > strategy. IMO it is wrong to say to the user "if you don't know just return > all shard ids". It takes away the option from us > to distinguish between these two cases. > In the mail I've sent yesterday I've described an interesting use case to return an empty set for nightly index rebuilds. That should highlight that depending on the scenario there is a full range of options, from the empty set to the universe set. We're clearly talking about sets. BTW It was your proposal initially to use Set, that was a very good one. I don't see the need for us to "distinguish between these two cases", for one because there are more than two cases, but especially as we have no use of making this distinction. Sanne ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] [Search] Dynamic sharding configuration
On 8 Jan 2013, at 11:56 AM, Sanne Grinovero wrote: > Having said that, let's try to find the best proposal possible by > lunch time, as one of the approaches needs to be merged: I think we very well could go w/ the current state right now and evolve in future versions. I would like to explore Gunnar's idea for example. And of course the whole idea of determining the shard id when the document is build. Given that the old approach is just deprecated nothing is lost for the user yet. We would just be more conservative on what we offer right now. > The parameter object approach got my interest but I'll need an > example, including the user code implementing the methods, as I > suspect a terrible result. What would be terrible about it? In fact if we do the shard id determination earlier you might in some cases even have an entity in the deletion case. > I'm pretty sure I would implement this API with small and short > methods. As an implementor of such an interface, the first thing I > would do is to figure if I'm in the delete case or not - since it's > special - and branch off between two methods, after writing the > condition check. So I already have 3 methods, and a condition which I > could copy paste from the book. > End result, I still have to implement the two methods I'm proposing > (in addition!), but first I need to understand that "some parameters > might be missing" at runtime. That's an absolutely bad idea: using > null as a canary token in a method is awful, especially if it's our > code calling into user code. We just have to disagree on this one. I find the API as proposed confusing. Simple string returns via sets and confusing names. I would rather have a context object with a cleat semantic. > The method has to be a clear contract even without reading the > javadoc, providing such an explanation in English prose is by far not > as clear; What is the difference to what you are proposing? String getShardIdentifier(Class entityType, Serializable id, String idAsString, Document document); Set getShardIdentifier(Class entityType, Serializable id, String idAsString); So you are saying one can determine the use of each of the method just by its contract? > defining a contract of two different contexts - even maybe > just overloading the same method - is a much clearer set of > instructions to the implementor. Not always. --Hardy ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] [Search] Dynamic sharding configuration
2013/10/8 Sanne Grinovero > We'll need to time cap this discussion as we're way too late, of > course this will need to be tagged @experimental. > Having said that, let's try to find the best proposal possible by > lunch time, as one of the approaches needs to be merged: it's very > clear that there is big practical value for the user to narrow down > deletions. > > The parameter object approach got my interest but I'll need an > example, including the user code implementing the methods, as I > suspect a terrible result. > > I'm pretty sure I would implement this API with small and short > methods. As an implementor of such an interface, the first thing I > would do is to figure if I'm in the delete case or not - since it's > special - and branch off between two methods, after writing the > condition check. So I already have 3 methods, and a condition which I > could copy paste from the book. > End result, I still have to implement the two methods I'm proposing > (in addition!), but first I need to understand that "some parameters > might be missing" at runtime. That's an absolutely bad idea: using > null as a canary token in a method is awful, especially if it's our > code calling into user code. > > The method has to be a clear contract even without reading the > javadoc, providing such an explanation in English prose is by far not > as clear; defining a contract of two different contexts - even maybe > just overloading the same method - is a much clearer set of > instructions to the implementor. > Yes, the implementation would likely look as you describe. Personally I think users would be able to deal with an optional property of a parameter object and act accordingly; I can see your concern, though. When setting out with getShardIdentifier()/getShardIdentifierForDeletion(), there might be a proliferation of methods when supporting new types of input such as the actual entity; we'd then have to add getShardIdentifierByEntity() and getShardIdentifierByEntityForDeletion() and so on. I guess both approaches would work, in the end its not that a huge difference. On 8 October 2013 10:03, Hardy Ferentschik wrote: > > > > See the comments I made on the pull requests. I also don't think the > deletion case should return a set. Either one knows the shard it and > returns it or one does not know and you have > > to return null. In the latter case it is up to us to apply the default > strategy. IMO it is wrong to say to the user "if you don't know just return > all shard ids". It takes away the option from us > > to distinguish between these two cases. > > > > In the mail I've sent yesterday I've described an interesting use case > to return an empty set for nightly index rebuilds. That should > highlight that depending on the scenario there is a full range of > options, from the empty set to the universe set. We're clearly talking > about sets. > BTW It was your proposal initially to use Set, that was a very good > one. I don't see the need for us to "distinguish between these two > cases", for one because there are more than two cases, but especially > as we have no use of making this distinction. > Here I'm still missing a piece I guess. So far we spoke about returning null, the *empty* set or a set with *all* ids. Is there an example where one would return *some* ids (the logging example returns exactly one id)? In other words, I can't imagine a case where one would conclude *some* shard ids from a given entity which IMO would really justify using a set. If it is only about returning the info "XYZ is the exact shard id" vs. "can't determine a shard id, so consider all", a defined constant may be returned in the latter case, signaling that getAllShardIds() is to be called after that. Regarding your example of returning the empty set to skip index updates on deletions, how would that work in a non-sharded environment? Is using the shard id provider really the right approach for this problem? --Gunnar Sanne > ___ > hibernate-dev mailing list > hibernate-dev@lists.jboss.org > https://lists.jboss.org/mailman/listinfo/hibernate-dev > ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] [Search] Dynamic sharding configuration
Guys let's put this into perspective. These arguments I'm hearing against adding a method in a power-user oriented SPI are way outbalancing the harm they do to the project in terms of release delays and our very own time, there are definitely more interesting issues to dedicate our time on. I appreciate the tech discussions, but ultimately here we're talking about an experimental interface which most users won't care about. Some other users will have very specific high end requirements, and those are our target: I don't appreciate how we spend more than 30 minutes arguing how these smart guys might get confused by a method name. We're not changing the Session contract or anything big like that, we're providing a damn useful feature but really the method name or signature is not so relevant, but it's important that we address the right problem: - sane (no null parameters) - fulfill the requirements of flexibilty that we expect from a user extension point (be able to return a Set) - make sure it's not a performance bottleneck (implementable without too many object allocations) Given this, I'd prefer you to merge my PR from branch HSEARCH-1429 as it fullfills all the requirements. (that's pull https://github.com/hibernate/hibernate-search/pull/502 ) and move on, unless you have some really good argument against it, putting the time & features into perspective. Alternatively for the sake of moving forward, I'll craft a pull which just adds the @Experimental and some docs warnings, but I think we're failing to deliver a good feature which is ready to be delivered today -> very sad. Sanne ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] [Search] Dynamic sharding configuration
On 8 Jan 2013, at 1:38 PM, Sanne Grinovero wrote: > Guys let's put this into perspective. > These arguments I'm hearing against adding a method in a power-user > oriented SPI are way outbalancing the harm they do to the project in > terms of release delays and our very own time, there are definitely > more interesting issues to dedicate our time on. The discussion is about more than adding a method a single method. We have a general disagreement on how to evolve an API. > I appreciate the tech discussions, but ultimately here we're talking > about an experimental interface which most users won't care about. I dislike the argument that just because I mark an interface as experimental I can add whatever I want to it, abandoning other sounds design principles. > - sane (no null parameters) I'd rather deal with a potential null parameter in a context object in exchange in for simple and easy to grasp interface. > - fulfill the requirements of flexibilty that we expect from a user > extension point (be able to return a Set) IMO returning a set is pure guess work. Either you know your shard it or not. In which case would you return multiple? > Alternatively for the sake of moving forward, I'll craft a pull which > just adds the @Experimental and some docs warnings, but I think we're > failing to deliver a good feature which is ready to be delivered today > -> very sad. Still my preferred choice for now in order to get the release out of the door. --Hardy ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
Re: [hibernate-dev] [Search] Dynamic sharding configuration
I don't have much stake in the specialized method vs context object debate as indeed the interface is very specialized and prone to changes. But as Sanne mentioned, there are memory pressure consequences if this call is in the hot path. It is correct that the current use of ForDeletion requires to use a non remote non async backend at the moment. That's something I discussed with Sanne back when I implemented it. It's not hard to imagine how we could transport such information in a later version but that would require additional contracts. The use case I designed dynamic sharding for is to: - create one index per user (think login) - query only by a specific index - apply mutation and deletion on a single index - support 100s users (ie shards) per VM instance I also had Bloom filters in mind when I designed the original sharding strategy. For these scenarios, a smart ForDeletion is necessary as you don't want to open / query hundreds of indexes unnecessarily. That Hardy thinks the use case is wrong is beyond me but if that's the general feeling, that's fine, I'll fork Hibernate Search and make it useful for me. For the record, I handed over a working solution 6 months short of 4 days... I am sure it was not perfect, but certainly not 6 months away from it. I know you guys wanted support to inject a Session to resolve shards which has put some significant constraints on the life cycle. But still. Conclusion Draw your own, I'm out of it. Emmanuel On Mon 2013-10-07 16:03, Sanne Grinovero wrote: > On 2 October 2013 14:34, Emmanuel Bernard wrote: > > On Tue 2013-09-24 14:30, Sanne Grinovero wrote: > >> On 24 September 2013 14:12, Hardy Ferentschik wrote: > >> > 2) remove 'String[] getShardIdentifiers(Class entity, Serializable > >> > id, String idInString)' from ShardIdentifierProvider > >> > >> +1 we're automatically assuming a deletion needs to be routed to all > >> identifiers. > > > > Bad idea as I explained in my previous email. Plus we could already make > > use of that if we reuse Hibernate ORM's tenantid value. > > I've tried hard to find an agreement on this, but it seems we're > wasting time without making progress. > I'm not happy in ignoring a strong recommendation from any of you, > very hard choice :-( > > Hardy are you going to reconnect later? Could you reply to this email > of Emmanuel? > > I'm inclined to add the method back, so that it's the users choice to > pick his battle. As mentioned below, I don't think we should take > options away from them. > Of course our template implementation could provide a sensible default > method, so all users looking for simplicity don't need to bother too > much about the extra method. > > @Emmanuel the last conversation we had on the subject is below: > > --Sanne > > [15:05] hardy: on the dynamic shard id concerns rised > by emmanuel, I thought you where going to propose a pair of method > names that would suite you? > [15:05] I still think we need to restore the missing method. > [15:05] I don't think so > [15:06] emmanuel had quite a strong opinion about it, > don't think it's fair to ignore that. > [15:06] I thought more about it and I think the use case is > not even imlementable > [15:06] you have a point there. > [15:06] well, I think he is wrong > [15:06] take his use case > [15:06] he wants to use some sort of customer id or ORM shard > identifier > [15:07] sounds great, but there is no such context to get it from > [15:07] so what can you do in this case? > [15:07] a ThreadLocal > [15:07] and that's exactly the problem > [15:07] if the shards id were determined at the document built > time (as we want to do it ), it might be possible > [15:07] amazing how often "context" is problematic :) > [15:08] he he > [15:08] no. as far as I understood,he was planning to > get a reference to the Strategy, and then invoke setters on it to > "program" the thing. > [15:08] but now the shard identifiers are "generated" when the > changes are getting applied to the index > [15:09] that's happening on a different thread > [15:09] hardy: we're going in circles with this > debate on abstract hypothesis. Emmanuel said he has a use case for it, > and implemented it. that should be good enough for us? > [15:09] no > [15:09] I honestly would like to see the code first > [15:09] :-) let me try a proposal > [15:10] and how does it work in a clustered environment > [15:10] or JMS > [15:10] I seriously doubt we can implement this in a decent way atm > [15:10] WDYT of this plan: we re-introduce the > method, and provide the abstract base class I've made; then the > deletion method has a default implementation. > [15:11] one beauty of the new interface is, is that it is simpler > [15:11] and imo it removes something which was conceptually > not working anyways > [15:11] then in future we can deprecate this method. > [15:11] I don't see a point of re-introducing it unless > someone can actually provide a working example > [15:1
[hibernate-dev] Logging dependency cleanup
I thought I remember someone (Brett? Strong?) going through an cleaning up references to logging libraries we no longer use. But I still see entries in libraries.gradle in master for: slf4j_api: "org.slf4j:slf4j-api:${slf4jVersion}", slf4j_log4j12: "org.slf4j:slf4j-log4j12:${slf4jVersion}", jcl_slf4j: "org.slf4j:jcl-over-slf4j:${slf4jVersion}", jcl_api: 'commons-logging:commons-logging-api:99.0-does-not-exist', jcl: 'commons-logging:commons-logging:99.0-does-not-exist', Did I just dream this? ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev
[hibernate-dev] 4.3.0.Beta6 release date
I tentatively scheduled 4.3.0.Beta6 (which is looking likely to be the first CR) for November 6th which is the 4 week timebox. Trouble is, that this is the day I am scheduled to fly back from Rome. So all in all, this release is likely to happen early or late. ___ hibernate-dev mailing list hibernate-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/hibernate-dev