Re: [hibernate-dev] [Search] Dynamic sharding configuration

2013-10-08 Thread Gunnar Morling
Sanne,

As you say adding yet another interface makes things even more difficult to
grok; So I'd vote for adding the method for the deletion use case to SIP
directly.

I'm not sure whether it has been considered before, but maybe we could
unify the methods and work with a parameter object as a middle ground:

public interface ShardIdentifierProvider {

void initialize(Properties properties, BuildContext buildContext);
Set
getShardIdentifiersForEntity(EntityShardIdentifierRetrievalContext
retrievalContext);
Set
getShardIdentifiersForQuery(QueryShardIdentifierRetrievalContext
retrievalContext);
Set getAllShardIdentifiers();
}

EntityShardIdentifierRetrievalContext would provide all parameters usable
for shard determination, clearly stating that "document" is not available
in cases such as deleting. Such a parameter object would allow for adding
more options in the future in a compatible manner, and also the method
names read quite nicely and symmetrically (I share Hardy's concerns about
the asymmetry of getShardIdentifier() vs. getShardIdentifiersForDeletion()).

The disadvantage of this scheme is that a set needs to be returned also for
the case of returning a single identifier during insert/update, which might
render the approach not feasible. OTOH I'm wondering why a set needs to be
returned for the delete case, your example also returns exactly one
identifier?

--Gunnar




2013/10/8 Sanne Grinovero 

> I've included an example which represents a good reason to provide the
> controversial method.
> Technically the test is crafted as a static sharding approach but is
> using the new API; you can easily figure the same case for a dynamic
> sharding case; also considering we're deprecating the older static
> sharding API, this should also be able to replace whatever was
> possible before.
>
> Could you have a look, reject one and review/merge the other:
>  - https://github.com/hibernate/hibernate-search/pull/501
>  - https://github.com/hibernate/hibernate-search/pull/502
>
> I personally have a mild preference for "proposal A", as I think the
> additional interface introduced in "proposal B" doesn't simplify but
> adds an additional mark on the list of things the user needs to learn
> about, but I'm happy with both solutions as long as we move on with
> one.
>
> We'll update documentation in a second PR, depending which way we go :-)
>
> Cheers,
> Sanne
> ___
> hibernate-dev mailing list
> hibernate-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] [Search] Dynamic sharding configuration

2013-10-08 Thread Hardy Ferentschik

On 8 Jan 2013, at 10:08 AM, Gunnar Morling  wrote:

> I'm not sure whether it has been considered before, but maybe we could unify 
> the methods and work with a parameter object as a middle ground:

It has been. I suggested before to combine the methods. I think it is a good 
approach, but Sanne thought it is bad that he user has to deal with 
null values. 

> public interface ShardIdentifierProvider {
> 
> void initialize(Properties properties, BuildContext buildContext);
> Set 
> getShardIdentifiersForEntity(EntityShardIdentifierRetrievalContext 
> retrievalContext);
> Set 
> getShardIdentifiersForQuery(QueryShardIdentifierRetrievalContext 
> retrievalContext);
> Set getAllShardIdentifiers();
> }

It is a reasonable approach. Better then what we have. I am drawn between this 
one and the additional interface. This approach has the advantage
that as you say we can modify the internas of the context instances. 

> EntityShardIdentifierRetrievalContext would provide all parameters usable for 
> shard determination, clearly stating that "document" is not available in 
> cases such as deleting.
> Such a parameter object would allow for adding more options in the future in 
> a compatible manner, and also the method names read quite nicely and 
> symmetrically
> (I share Hardy's concerns about the asymmetry of getShardIdentifier() vs. 
> getShardIdentifiersForDeletion()).

All correct imo.

> The disadvantage of this scheme is that a set needs to be returned also for 
> the case of returning a single identifier during insert/update, which might 
> render the approach not feasible.
> OTOH I'm wondering why a set needs to be returned for the delete case, your 
> example also returns exactly one identifier?

See the comments I made on the pull requests. I also don't think the deletion 
case should return a set. Either one knows the shard it and returns it or one 
does not know and you have
to return null. In the latter case it is up to us to apply the default 
strategy. IMO it is wrong to say to the user "if you don't know just return all 
shard ids". It takes away the option from us
to distinguish between these two cases.

--Hardy
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] [Search] Dynamic sharding configuration

2013-10-08 Thread Sanne Grinovero
We'll need to time cap this discussion as we're way too late, of
course this will need to be tagged @experimental.
Having said that, let's try to find the best proposal possible by
lunch time, as one of the approaches needs to be merged: it's very
clear that there is big practical value for the user to narrow down
deletions.

The parameter object approach got my interest but I'll need an
example, including the user code implementing the methods, as I
suspect a terrible result.

I'm pretty sure I would implement this API with small and short
methods. As an implementor of such an interface, the first thing I
would do is to figure if I'm in the delete case or not - since it's
special - and branch off between two methods, after writing the
condition check. So I already have 3 methods, and a condition which I
could copy paste from the book.
End result, I still have to implement the two methods I'm proposing
(in addition!), but first I need to understand that "some parameters
might be missing" at runtime. That's an absolutely bad idea: using
null as a canary token in a method is awful, especially if it's our
code calling into user code.

The method has to be a clear contract even without reading the
javadoc, providing such an explanation in English prose is by far not
as clear; defining a contract of two different contexts - even maybe
just overloading the same method - is a much clearer set of
instructions to the implementor.

On 8 October 2013 10:03, Hardy Ferentschik  wrote:
>
> See the comments I made on the pull requests. I also don't think the deletion 
> case should return a set. Either one knows the shard it and returns it or one 
> does not know and you have
> to return null. In the latter case it is up to us to apply the default 
> strategy. IMO it is wrong to say to the user "if you don't know just return 
> all shard ids". It takes away the option from us
> to distinguish between these two cases.
>

In the mail I've sent yesterday I've described an interesting use case
to return an empty set for nightly index rebuilds. That should
highlight that depending on the scenario there is a full range of
options, from the empty set to the universe set. We're clearly talking
about sets.
BTW It was your proposal initially to use Set, that was a very good
one. I don't see the need for us to "distinguish between these two
cases", for one because there are more than two cases, but especially
as we have no use of making this distinction.

Sanne
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] [Search] Dynamic sharding configuration

2013-10-08 Thread Hardy Ferentschik

On 8 Jan 2013, at 11:56 AM, Sanne Grinovero  wrote:

> Having said that, let's try to find the best proposal possible by
> lunch time, as one of the approaches needs to be merged: 

I think we very well could go w/ the current state right now and evolve in 
future versions.
I would like to explore Gunnar's idea for example. And of course the whole idea 
of 
determining the shard id when the document is build.

Given that the old approach is just deprecated nothing is lost for the user 
yet. 
We would just be more conservative on what we offer right now.

> The parameter object approach got my interest but I'll need an
> example, including the user code implementing the methods, as I
> suspect a terrible result.

What would be terrible about it? In fact if we do the shard id determination 
earlier 
you might in some cases even have an entity in the deletion case.

> I'm pretty sure I would implement this API with small and short
> methods. As an implementor of such an interface, the first thing I
> would do is to figure if I'm in the delete case or not - since it's
> special - and branch off between two methods, after writing the
> condition check. So I already have 3 methods, and a condition which I
> could copy paste from the book.
> End result, I still have to implement the two methods I'm proposing
> (in addition!), but first I need to understand that "some parameters
> might be missing" at runtime. That's an absolutely bad idea: using
> null as a canary token in a method is awful, especially if it's our
> code calling into user code.

We just have to disagree on this one. I find the API as proposed confusing.
Simple string returns via sets and confusing names. I would rather have a 
context object with a cleat semantic.

> The method has to be a clear contract even without reading the
> javadoc, providing such an explanation in English prose is by far not
> as clear;

What is the difference to what you are proposing?

String getShardIdentifier(Class entityType, Serializable id, String 
idAsString, Document document);
Set getShardIdentifier(Class entityType, Serializable id, String 
idAsString);

So you are saying one can determine the use of each of the method just by its 
contract?

> defining a contract of two different contexts - even maybe
> just overloading the same method - is a much clearer set of
> instructions to the implementor.

Not always.

--Hardy
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] [Search] Dynamic sharding configuration

2013-10-08 Thread Gunnar Morling
2013/10/8 Sanne Grinovero 

> We'll need to time cap this discussion as we're way too late, of
> course this will need to be tagged @experimental.
> Having said that, let's try to find the best proposal possible by
> lunch time, as one of the approaches needs to be merged: it's very
> clear that there is big practical value for the user to narrow down
> deletions.
>
> The parameter object approach got my interest but I'll need an
> example, including the user code implementing the methods, as I
> suspect a terrible result.
>
> I'm pretty sure I would implement this API with small and short
> methods. As an implementor of such an interface, the first thing I
> would do is to figure if I'm in the delete case or not - since it's
> special - and branch off between two methods, after writing the
> condition check. So I already have 3 methods, and a condition which I
> could copy paste from the book.
> End result, I still have to implement the two methods I'm proposing
> (in addition!), but first I need to understand that "some parameters
> might be missing" at runtime. That's an absolutely bad idea: using
> null as a canary token in a method is awful, especially if it's our
> code calling into user code.
>
> The method has to be a clear contract even without reading the
> javadoc, providing such an explanation in English prose is by far not
> as clear; defining a contract of two different contexts - even maybe
> just overloading the same method - is a much clearer set of
> instructions to the implementor.
>

Yes, the implementation would likely look as you describe. Personally I
think users would be able to deal with an optional property of a parameter
object and act accordingly; I can see your concern, though.

When setting out with getShardIdentifier()/getShardIdentifierForDeletion(),
there might be a proliferation of methods when supporting new types of
input such as the actual entity; we'd then have to add
getShardIdentifierByEntity() and getShardIdentifierByEntityForDeletion()
and so on.

I guess both approaches would work, in the end its not that a huge
difference.

On 8 October 2013 10:03, Hardy Ferentschik  wrote:
> >
> > See the comments I made on the pull requests. I also don't think the
> deletion case should return a set. Either one knows the shard it and
> returns it or one does not know and you have
> > to return null. In the latter case it is up to us to apply the default
> strategy. IMO it is wrong to say to the user "if you don't know just return
> all shard ids". It takes away the option from us
> > to distinguish between these two cases.
> >
>
> In the mail I've sent yesterday I've described an interesting use case
> to return an empty set for nightly index rebuilds. That should
> highlight that depending on the scenario there is a full range of
> options, from the empty set to the universe set. We're clearly talking
> about sets.
> BTW It was your proposal initially to use Set, that was a very good
> one. I don't see the need for us to "distinguish between these two
> cases", for one because there are more than two cases, but especially
> as we have no use of making this distinction.
>

Here I'm still missing a piece I guess. So far we spoke about returning
null, the *empty* set or a set with *all* ids. Is there an example where
one would return *some* ids (the logging example returns exactly one id)?
In other words, I can't imagine a case where one would conclude *some*
shard ids from a given entity which IMO would really justify using a set.

If it is only about returning the info "XYZ is the exact shard id" vs.
"can't determine a shard id, so consider all", a defined constant may be
returned in the latter case, signaling that getAllShardIds() is to be
called after that.

Regarding your example of returning the empty set to skip index updates on
deletions, how would that work in a non-sharded environment? Is using the
shard id provider really the right approach for this problem?

--Gunnar

Sanne
> ___
> hibernate-dev mailing list
> hibernate-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/hibernate-dev
>
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] [Search] Dynamic sharding configuration

2013-10-08 Thread Sanne Grinovero
Guys let's put this into perspective.
These arguments I'm hearing against adding a method in a power-user
oriented SPI are way outbalancing the harm they do to the project in
terms of release delays and our very own time, there are definitely
more interesting issues to dedicate our time on.

I appreciate the tech discussions, but ultimately here we're talking
about an experimental interface which most users won't care about.
Some other users will have very specific high end requirements, and
those are our target: I don't appreciate how we spend more than 30
minutes arguing how these smart guys might get confused by a method
name.
We're not changing the Session contract or anything big like that,
we're providing a damn useful feature but really the method name or
signature is not so relevant, but it's important that we address the
right problem:
 - sane (no null parameters)
 - fulfill the requirements of flexibilty that we expect from a user
extension point (be able to return a Set)
 - make sure it's not a performance bottleneck (implementable without
too many object allocations)

Given this, I'd prefer you to merge my PR from branch HSEARCH-1429 as
it fullfills all the requirements.
(that's pull https://github.com/hibernate/hibernate-search/pull/502 )
and move on, unless you have some really good argument against it,
putting the time & features into perspective.

Alternatively for the sake of moving forward, I'll craft a pull which
just adds the @Experimental and some docs warnings, but I think we're
failing to deliver a good feature which is ready to be delivered today
-> very sad.

Sanne
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] [Search] Dynamic sharding configuration

2013-10-08 Thread Hardy Ferentschik

On 8 Jan 2013, at 1:38 PM, Sanne Grinovero  wrote:

> Guys let's put this into perspective.
> These arguments I'm hearing against adding a method in a power-user
> oriented SPI are way outbalancing the harm they do to the project in
> terms of release delays and our very own time, there are definitely
> more interesting issues to dedicate our time on.

The discussion is about more than adding a method a single method.
We have a general disagreement on how to evolve an API.

> I appreciate the tech discussions, but ultimately here we're talking
> about an experimental interface which most users won't care about.

I dislike the argument that just because I mark an interface as experimental I
can add whatever I want to it, abandoning other sounds design principles. 

> - sane (no null parameters)

I'd rather deal with a potential null parameter in a context object in exchange 
in 
for simple and easy to grasp interface. 

> - fulfill the requirements of flexibilty that we expect from a user
> extension point (be able to return a Set)

IMO returning a set is pure guess work. Either you know your shard it or not. 
In which case would you return multiple?

> Alternatively for the sake of moving forward, I'll craft a pull which
> just adds the @Experimental and some docs warnings, but I think we're
> failing to deliver a good feature which is ready to be delivered today
> -> very sad.

Still my preferred choice for now in order to get the release out of the door.

--Hardy


___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


Re: [hibernate-dev] [Search] Dynamic sharding configuration

2013-10-08 Thread Emmanuel Bernard
I don't have much stake in the specialized method vs context object
debate as indeed the interface is very specialized and prone to changes.
But as Sanne mentioned, there are memory pressure consequences if this
call is in the hot path.

It is correct that the current use of ForDeletion requires to use a non
remote non async backend at the moment. That's something I discussed
with Sanne back when I implemented it.
It's not hard to imagine how we could transport such information in a
later version but that would require additional contracts.

The use case I designed dynamic sharding for is to:

- create one index per user (think login)
- query only by a specific index
- apply mutation and deletion on a single index
- support 100s users (ie shards) per VM instance

I also had Bloom filters in mind when I designed the original sharding
strategy.

For these scenarios, a smart ForDeletion is necessary as you don't want to
open / query hundreds of indexes unnecessarily.

That Hardy thinks the use case is wrong is beyond me but if that's the
general feeling, that's fine, I'll fork Hibernate Search and make it
useful for me.
For the record, I handed over a working solution 6 months short of 4
days... I am sure it was not perfect, but certainly not 6 months
away from it. I know you guys wanted support to inject a Session to
resolve shards which has put some significant constraints on the life
cycle. But still.

Conclusion

Draw your own, I'm out of it.

Emmanuel

On Mon 2013-10-07 16:03, Sanne Grinovero wrote:
> On 2 October 2013 14:34, Emmanuel Bernard  wrote:
> > On Tue 2013-09-24 14:30, Sanne Grinovero wrote:
> >> On 24 September 2013 14:12, Hardy Ferentschik  wrote:
> >> > 2) remove 'String[] getShardIdentifiers(Class entity, Serializable 
> >> > id, String idInString)' from ShardIdentifierProvider
> >>
> >> +1 we're automatically assuming a deletion needs to be routed to all
> >> identifiers.
> >
> > Bad idea as I explained in my previous email. Plus we could already make
> > use of that if we reuse Hibernate ORM's tenantid value.
> 
> I've tried hard to find an agreement on this, but it seems we're
> wasting time without making progress.
> I'm not happy in ignoring a strong recommendation from any of you,
> very hard choice :-(
> 
> Hardy are you going to reconnect later? Could you reply to this email
> of Emmanuel?
> 
> I'm inclined to add the method back, so that it's the users choice to
> pick his battle. As mentioned below, I don't think we should take
> options away from them.
> Of course our template implementation could provide a sensible default
> method, so all users looking for simplicity don't need to bother too
> much about the extra method.
> 
> @Emmanuel the last conversation we had on the subject is below:
> 
> --Sanne
> 
> [15:05]  hardy: on the dynamic shard id concerns rised
> by emmanuel, I thought you where going to propose a pair of method
> names that would suite you?
> [15:05]  I still think we need to restore the missing method.
> [15:05]  I don't think so
> [15:06]  emmanuel had quite a strong opinion about it,
> don't think it's fair to ignore that.
> [15:06]  I thought more about it and I think the use case is
> not even imlementable
> [15:06]  you have a point there.
> [15:06]  well, I think he is wrong
> [15:06]  take his use case
> [15:06]  he wants to use some sort of customer id or ORM shard 
> identifier
> [15:07]  sounds great, but there is no such context to get it from
> [15:07]  so what can you do in this case?
> [15:07]  a ThreadLocal
> [15:07]  and that's exactly the problem
> [15:07]  if the shards id were determined at the document built
> time (as we want to do it ), it might be possible
> [15:07]  amazing how often "context" is problematic :)
> [15:08]  he he
> [15:08]  no. as far as I understood,he was planning to
> get a reference to the Strategy, and then invoke setters on it to
> "program" the thing.
> [15:08]  but now the shard identifiers are "generated" when the
> changes are getting applied to the index
> [15:09]  that's happening on a different thread
> [15:09]  hardy: we're going in circles with this
> debate on abstract hypothesis. Emmanuel said he has a use case for it,
> and implemented it. that should be good enough for us?
> [15:09]  no
> [15:09]  I honestly would like to see the code first
> [15:09]  :-) let me try a proposal
> [15:10]  and how does it work in a clustered environment
> [15:10]  or JMS
> [15:10]  I seriously doubt we can implement this in a decent way atm
> [15:10]  WDYT of this plan: we re-introduce the
> method, and provide the abstract base class I've made; then the
> deletion method has a default implementation.
> [15:11]  one beauty of the new interface is, is that it is simpler
> [15:11]  and imo it removes something which was conceptually
> not working anyways
> [15:11]  then in future we can deprecate this method.
> [15:11]  I don't see a point of re-introducing it unless
> someone can actually provide a working example
> [15:1

[hibernate-dev] Logging dependency cleanup

2013-10-08 Thread Steve Ebersole
I thought I remember someone (Brett?  Strong?) going through an cleaning 
up references to logging libraries we no longer use.  But I still see 
entries in libraries.gradle in master for:

 slf4j_api: "org.slf4j:slf4j-api:${slf4jVersion}",
 slf4j_log4j12: "org.slf4j:slf4j-log4j12:${slf4jVersion}",
 jcl_slf4j: "org.slf4j:jcl-over-slf4j:${slf4jVersion}",
 jcl_api: 
'commons-logging:commons-logging-api:99.0-does-not-exist',
 jcl: 'commons-logging:commons-logging:99.0-does-not-exist',

Did I just dream this?

___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev


[hibernate-dev] 4.3.0.Beta6 release date

2013-10-08 Thread Steve Ebersole
I tentatively scheduled 4.3.0.Beta6 (which is looking likely to be the 
first CR) for November 6th which is the 4 week timebox.  Trouble is, 
that this is the day I am scheduled to fly back from Rome.

So all in all, this release is likely to happen early or late.
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev