[hibernate-dev] [SEARCH] Translating analyzer definitions from HSearch to Elasticsearch

2016-12-13 Thread Yoann Rodiere
Hello everyone,

I'm currently working on HSEARCH-2219, "Define analyzers via the REST API",
whose purpose is to automatically translate @AnalyzerDefs in Hibernate
Search to settings in Elasticsearch, removing the need for users to
configure analyzers separately in their Elasticsearch instance.

The thing is, the structure of our configuration in Hibernate Search is
different from the one in Elasticsearch. In particular, we can't name
instances of token filters, char filters, etc., while in Elasticsearch one
*has* to name them in order to provide parameters.

See for instance:

@AnalyzerDef(
  name = "myAnalyzer",
  tokenizer = @TokenizerDef(
    factory = StandardTokenizerFactory.class,
    params = @Parameter(name = "maxTokenLength", value = "900")
  )
)

compared to the Elasticsearch way:

index :
  analysis :
    analyzer :
      myAnalyzer :
        type : custom
        tokenizer : myTokenizer1
    tokenizer :
      myTokenizer1 :
        type : standard
        max_token_length : 900

The analyzer name is there on both sides, @TokenizerDef.factory would give
me the tokenizer type, and parameters are pretty obvious too. But
"myTokenizer1", the tokenizer name, has absolutely no equivalent in
Hibernate Search.
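
As a rough sketch of that mapping (not the actual HSEARCH-2219 code), the
settings tree above can be built as nested maps. The class and method names
are hypothetical, and the sketch assumes the caller has already translated
the factory class to "standard" and "maxTokenLength" to "max_token_length".
Note that the tokenizer name has to be passed in explicitly, since nothing
in the annotation provides it:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Hibernate Search
// or of any Elasticsearch client library.
final class AnalysisSettingsSketch {

    /** Builds the "analysis" subtree for one custom analyzer with one tokenizer. */
    static Map<String, Object> analysis(String analyzerName, String tokenizerName,
            String tokenizerType, Map<String, String> tokenizerParams) {
        // analyzer.<analyzerName> only *references* the tokenizer by name
        Map<String, Object> analyzer = new LinkedHashMap<>();
        analyzer.put("type", "custom");
        analyzer.put("tokenizer", tokenizerName);

        // tokenizer.<tokenizerName> carries the type and the parameters
        Map<String, Object> tokenizer = new LinkedHashMap<>();
        tokenizer.put("type", tokenizerType);
        tokenizer.putAll(tokenizerParams);

        Map<String, Object> analysis = new LinkedHashMap<>();
        analysis.put("analyzer", Map.of(analyzerName, analyzer));
        analysis.put("tokenizer", Map.of(tokenizerName, tokenizer));
        return analysis;
    }

    public static void main(String[] args) {
        // "myTokenizer1" has no source in the annotations; it is supplied by hand here.
        System.out.println(analysis("myAnalyzer", "myTokenizer1", "standard",
                Map.of("max_token_length", "900")));
    }
}
```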

I could try to generate names automatically, but those would need to be
more or less stable across multiple executions in order for schema
validation to work properly. And there's nothing we could really use as an
identifier in our annotations, at least not reliably.

To fill the gap, I'd like to add a "name" attribute to the TokenizerDef,
CharFilterDef and TokenFilterDef annotations. This attribute would be
optional and the documentation would mention that it's useless for embedded
Lucene.
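
To make the proposal concrete, here is a self-contained sketch of what such
an optional attribute could look like. The annotations below are stand-ins
declared only for this example (the real ones live in
org.hibernate.search.annotations and have no "name" on @TokenizerDef), and
the factory class is a placeholder:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Usage as it might look with the proposed attribute.
@SketchAnalyzerDef(
    name = "myAnalyzer",
    tokenizer = @SketchTokenizerDef(
        name = "myTokenizer1", // would become the tokenizer key on the Elasticsearch side
        factory = StandardTokenizerFactoryStandIn.class
    )
)
final class NamedTokenizerSketch {

    // Reads the tokenizer name back through reflection, as a translator could.
    static String tokenizerName(Class<?> annotated) {
        return annotated.getAnnotation(SketchAnalyzerDef.class).tokenizer().name();
    }

    public static void main(String[] args) {
        System.out.println(tokenizerName(NamedTokenizerSketch.class));
    }
}

// Stand-in for @TokenizerDef, for illustration only.
@Retention(RetentionPolicy.RUNTIME)
@interface SketchTokenizerDef {
    // Proposed attribute: optional, so existing Lucene-only mappings keep compiling.
    String name() default "";
    Class<?> factory();
}

// Stand-in for @AnalyzerDef, for illustration only.
@Retention(RetentionPolicy.RUNTIME)
@interface SketchAnalyzerDef {
    String name();
    SketchTokenizerDef tokenizer();
}

// Placeholder for Lucene's StandardTokenizerFactory.
final class StandardTokenizerFactoryStandIn {}
```

The default of "" is just one way to model "optional" in an annotation,
since annotation members cannot default to null.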

Another solution would be to have a "magic" @Parameter, named after a
constant (ElasticsearchParameters.TOKENIZER_NAME for instance), and detect
that parameter automatically, but it feels wrong... mainly because
@AnalyzerDef already has its own "name" attribute, so why wouldn't
@TokenizerDef?

And finally, we could bring our annotations closer to the Elasticsearch
way, by providing a way to define tokenizers/char filters/token filters and
a separate way to reference those definitions, but I don't think that's 5.6
material, since we'd likely have to break things or lose consistency.

WDYT?

Yoann Rodière 
Hibernate NoORM Team
___
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev

Re: [hibernate-dev] [SEARCH] Translating analyzer definitions from HSearch to Elasticsearch

2016-12-13 Thread Sanne Grinovero
Adding a "name" attribute to the @TokenizerDef annotation seems like a
good idea.

Make it an optional attribute, of course: we can throw an exception if
it's missing and ES is being used, while maintaining compatibility
with existing apps using Lucene.
Perhaps you could be slightly forgiving in certain situations - I
guess you could use the fully qualified classname, for example, when
the factory is used only once - but it's your call whether that small
benefit is worth the implementation effort.
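
One possible reading of that fallback, as a sketch only: use the explicit
name when present, fall back to the fully qualified class name when the
factory appears exactly once, and fail otherwise. All names below are
hypothetical, and the nested classes stand in for Lucene factories:

```java
import java.util.List;

final class ComponentNameSketch {

    // Hypothetical stand-ins for Lucene factory classes.
    static final class FactoryA {}
    static final class FactoryB {}

    /**
     * @param explicitName the (proposed) "name" attribute, empty if not set
     * @param factory the factory class of the component being named
     * @param factoriesInUse factory classes of all components being translated
     */
    static String resolveName(String explicitName, Class<?> factory,
            List<Class<?>> factoriesInUse) {
        if (explicitName != null && !explicitName.isEmpty()) {
            return explicitName;
        }
        // Forgiving case: the class name is unambiguous if the factory is used once.
        long uses = factoriesInUse.stream().filter(f -> f == factory).count();
        if (uses == 1) {
            return factory.getName();
        }
        throw new IllegalStateException(
                "A name is required for " + factory.getName() + " when targeting Elasticsearch");
    }

    public static void main(String[] args) {
        List<Class<?>> inUse = List.of(FactoryA.class, FactoryB.class);
        System.out.println(resolveName("", FactoryA.class, inUse));
        System.out.println(resolveName("myFilter", FactoryB.class, inUse));
    }
}
```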

Rather than documenting that this is useless for Lucene, we might even
take advantage of it (eventually) for diagnostic messages /
tooling / debugging?
Not suggesting you do that now, just justifying that the "name"
attribute isn't entirely out of scope even for the Lucene embedded
case.

+1 to defer separating the filter chains into named, reusable
components: that can wait.

Thanks,
Sanne


On 13 December 2016 at 08:26, Yoann Rodiere  wrote:
> [...]

[hibernate-dev] Testing DB lock timeouts

2016-12-13 Thread Radim Vansa
Hi,

since the hibernate-infinispan testsuite has been enabled by default,
I've recently set out to reduce its execution time, which is several
minutes because of various sleeps and timeouts.

Many of the tests exercise concurrency issues, and that often involves
issuing two writes to a single table/row in the DB. In H2, this results
in waiting 10 seconds (the configured default lock timeout), and since
the tests are executed sequentially, the testsuite takes much longer
than it should.

The obvious workaround is reducing this timeout to, say, 100 ms, but this
could a) lead to false positives and b) still be slow: those 100 ms add
up, and with over a thousand tests (for various configurations), the
total could be minutes anyway.

Q: is there any infrastructure in the testsuite to hook into the DB,
assert that it's waiting on a lock, and let the thread time out if
everything is as expected?

Radim

-- 
Radim Vansa 
JBoss Performance Team


Re: [hibernate-dev] Testing DB lock timeouts

2016-12-13 Thread Steve Ebersole
How about a ConnectionProvider that mocks the Connection?

To help with that, there is org.hibernate.testing.jdbc.JdbcMocks
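
A minimal sketch of the idea, assuming plain JDK dynamic proxies rather
than JdbcMocks (whose API is not shown here): a mocked Connection can fail
immediately on a chosen call instead of waiting for a real lock timeout.
The class and method names are hypothetical, and wiring the mock into a
ConnectionProvider is left out:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLTimeoutException;

final class LockTimeoutMockSketch {

    /** Returns a Connection whose given method fails at once, as if a lock timed out. */
    static Connection connectionFailingOn(String methodName) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals(methodName)) {
                // No real waiting: the "timeout" is reported immediately.
                throw new SQLTimeoutException("Simulated lock timeout in " + methodName);
            }
            // Every other call is a no-op returning a neutral default.
            Class<?> returnType = method.getReturnType();
            if (returnType == boolean.class) {
                return false;
            }
            if (returnType == int.class) {
                return 0;
            }
            if (returnType == long.class) {
                return 0L;
            }
            return null;
        };
        return (Connection) Proxy.newProxyInstance(
                LockTimeoutMockSketch.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                handler);
    }

    public static void main(String[] args) throws Exception {
        Connection connection = connectionFailingOn("commit");
        try {
            connection.commit();
        } catch (SQLTimeoutException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

This only simulates the failure path; it does not verify that the real
database would actually block on the lock.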


On Tue, Dec 13, 2016 at 11:23 AM Radim Vansa  wrote:

> [...]