Re: [hibernate-dev] HSearch: Using sharding and avoiding query on multiple shards

Emmanuel Bernard Sun, 03 Aug 2008 10:10:38 -0700


--
Emmanuel Bernard

http://in.relation.to/Bloggers/Emmanuel | http://blog.emmanuelbernard.com| http://twitter.com/emmanuelbernard

Hibernate Search in Action (http://is.gd/Dl1)


On  Aug 3, 2008, at 09:15, Sanne Grinovero wrote:

2008/8/1 Emmanuel Bernard <[EMAIL PROTECTED]>:
On  Aug 1, 2008, at 13:42, Sanne Grinovero wrote:
Hello Emmanuel,

2008/7/31 Emmanuel Bernard <[EMAIL PROTECTED]>:
On  Jul 31, 2008, at 09:22, Sanne Grinovero wrote:
about the API, wouldn't it make more sense to have it look like a
filter?
can you give more details?
I was just thinking about the name "fullTextQuery.setShardHint("Sony");": I wouldn't call it a "hint", but a filter, as it could affect the results.
A "hint" sounds like you are trying to improve the performance in
a way that shouldn't change the result, so:

fullTextQuery.enableFullTextFilter("Sony")
and it could differ from a normal FullTextFilter only by its concrete
implementation.
Just my 2cents, as I think the effect is the same.
Interesting concept and much more transparent. Not sure how easy it is to do
that though. A typical filter is cached per IndexReader. We cannot do that
for the "special" filter as opening the index defeats the purpose. Lucene
filters are applied per IndexReader so too late in the game.
You don't need to cache this, as it doesn't really contain the
filtered data, so we can just
avoid that. When opening the readers we could look at enabled filters,
and if there's one
of this type we just affect the selection of indexes to really open
(delegate to the sharding impl
to make the right choice); no need to apply a real Lucene filter afterwards.
(it should perform as a cached filter which survives even an index
reopening, nice!)
We could look at the filtertypes by name, and put them in separate
containers at startup to
avoid the type-checking at runtime.

It's worth trying a prototype. We should open a JIRA issue to capture that.
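
A rough sketch of what such a prototype could look like, under stated assumptions: when a shard-selecting filter is enabled, the set of indexes to open is narrowed before any reader is created, so no Lucene filter has to run afterwards. The class and method names (ShardSelectionSketch, addShard, shardsToOpen) and the index locations are hypothetical and are not part of the Hibernate Search API. Shards are identified by name rather than by array position, as suggested elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the Hibernate Search API.
final class ShardSelectionSketch {
    // Shard name -> index location, kept in registration order.
    private final Map<String, String> shards = new LinkedHashMap<>();

    public void addShard(String name, String location) {
        shards.put(name, location);
    }

    // Returns the index locations that would be opened for a query.
    // With no shard filter enabled, every shard is searched; otherwise
    // only the shards named by the enabled filters are selected.
    public List<String> shardsToOpen(Set<String> enabledShardFilters) {
        if (enabledShardFilters.isEmpty()) {
            return new ArrayList<>(shards.values());
        }
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> entry : shards.entrySet()) {
            if (enabledShardFilters.contains(entry.getKey())) {
                selected.add(entry.getValue());
            }
        }
        return selected;
    }
}
```

Because the selection depends only on shard names and never on reader contents, there is nothing to cache per IndexReader, which is why it would survive an index reopening.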

the feature looks great, but in my case I would need the ability in
the ShardingStrategy to create new
indexes; what do you think about that? I mean the size of the arrays
could need to grow.
Yes that's a feature I thought about but it means we will run into a lot of
concurrency issues (the HSearch config is all done at init time today). If
we do that this needs to be well thought out and I am not sure how feasible it
is.
Yes that's why I think we should move away from identifying the shards with
an index number, but give them "names" or some other way to identify them.
Nothing stops your default sharding strategies from expecting names such as "1" and "2",
but other implementations could prefer a different naming scheme,
and it could be more readable in the configuration files to select
different indexing parameters per shard.

I don't see how a different naming scheme helps solve the concurrency issues.


Basically all my content is "clustered" in some macrocategories, and
usually the search is done after having selected the category: so it
would be perfect to have actually different indexes per cat.,
but eventually someone could need to add a new category, the
shardingStrategy would need to write
a new empty index.
I would like also the possibility to move away from array-indexes to
some other identifier for the shards;

I am not sure what you gain from that. In any case, your ShardingStrategy
can do the conversion from your cat name to the shard index.
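
A minimal sketch of that conversion, assuming the set of categories is fixed at initialization time: a lookup from category name to array position that a custom sharding strategy could consult. CategoryShardResolver and shardIndexFor are hypothetical names used for illustration only and do not belong to Hibernate Search.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hibernate Search.
final class CategoryShardResolver {
    private final Map<String, Integer> indexByCategory = new HashMap<>();

    // Each category gets the shard index matching its position in the list.
    public CategoryShardResolver(List<String> categories) {
        for (int i = 0; i < categories.size(); i++) {
            indexByCategory.put(categories.get(i), i);
        }
    }

    // Converts a category name to its shard index; unknown names are
    // rejected because no shard exists for them in this fixed mapping.
    public int shardIndexFor(String category) {
        Integer index = indexByCategory.get(category);
        if (index == null) {
            throw new IllegalArgumentException("No shard for category: " + category);
        }
        return index;
    }
}
```

The mapping is built once and never grows, so it does not cover the case of a category added at runtime, which is where the concurrency concerns raised in this thread come in.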


in my specific case I would love to use something like the PK of the
category: this could enable
an easy filter selection (category could be the parameter of the
filter) and enable something like
"Cascade delete the index" on category removal.

This could become a special implementation of ShardingStrategy, to be
mandatory when using this kind of filtering?

btw, I've committed some more fixes for HSEARCH-241

kind regards,
Sanne


_______________________________________________
hibernate-dev mailing list
hibernate-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/hibernate-dev
