Hoss,
> I honestly don't know what you mean by "those implementations" and "both
> implementations" ... impls of what?
>
I mean the implementation of distributed search in Solr: the classes that
are responsible for the search logic. Somewhere, the searcher (or whatever
it is) must get its knowledge of which shards exist, which of them to
query, and what their addresses are.
I want to learn more about the class that manages this logic. Unfortunately,
I don't know which class it is.
By "those implementations" I mean "MultiSearcher" and "Solr's
implementation of distributed search".
Were my words not clear enough, or did I use the wrong vocabulary?
> Let's use a concrete example ... imagine you are only dealing with
> QueryComponent and FacetComponent, and imagine we have a single
> "coordinatorX" server that we query, and it distributes to two distinct
> shard servers ("shardA" and "shardB")
>
> the first thing QueryComponent on the coordinatorX server cares about is
> asking shardA and shardB for the docIds of the docs they have that match.
> the first thing FacetComponent on coordinatorX cares about is knowing the
> top facet constraints for the matching docs from shardA and shardB -- both
> of those pieces of information can be computed in a single request to each
> shard, in which the shard computes both pieces of information (its top
> scoring documents and its facet constraints with the highest counts) in a
> single pass. When coordinatorX gets those responses back, its
> QueryComponent can sort the "score,docId,shard" tuples to decide which
> shards it needs to ask for the stored fields of which docIds in order to
> build the final list of matching docs; and coordinatorX's FacetComponent
> can sort the "constraint,sum(shardCounts)" to decide which constraints
> should be in the final response, but since a constraint that made that
> list because of a high count from shardB might not have been in the
> initial list from shardA, it needs to ask shardA for the final count.
>
> These subsequent pieces of info for both the QueryComponent and the
> FacetComponent can be fetched from each shard in another single request,
> and although they may not be computed in a single pass, we still only have
> the overhead of one network request instead of two or more.
>
> On the other hand, if coordinatorX just deals with shardA and shardB using
> an abstraction at the Searcher level using something like MultiSearcher,
> then things like distributed faceting would require a *huge* amount of
> network IO as things like using the TermEnums and TermDocs on coordinatorX
> would result in all of that data being streamed from the individual
> (remote) searchers for each shard so the coordinator could execute the
> necessary counting logic.
>
I honestly thought that MultiSearcher would do exactly what you
described here. What a misunderstanding on my part.
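
To check that I have understood the two-step flow, here is a rough sketch
of the merge work on coordinatorX as I read your description. It is plain
Java, not actual Solr code: the class name, the method names and the data
layout are all made up by me for illustration, and I assume each shard has
already answered the first request with its top (score, docId) pairs and
its top facet counts.

```java
import java.util.*;

// Hypothetical sketch, not Solr's real classes: coordinator-side merging only.
class CoordinatorMergeSketch {

    // One hit from the first request to a shard: score and docId, no stored fields yet.
    static final class ShardHit {
        final String shard; final int docId; final float score;
        ShardHit(String shard, int docId, float score) {
            this.shard = shard; this.docId = docId; this.score = score;
        }
    }

    // Sort the (score, docId, shard) tuples by descending score, keep the top `rows`,
    // and group the winning docIds per shard for the second (stored fields) request.
    static Map<String, List<Integer>> docsToFetch(List<ShardHit> hits, int rows) {
        List<ShardHit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        Map<String, List<Integer>> perShard = new TreeMap<>();
        for (ShardHit h : sorted.subList(0, Math.min(rows, sorted.size()))) {
            perShard.computeIfAbsent(h.shard, k -> new ArrayList<>()).add(h.docId);
        }
        return perShard;
    }

    // Sum the per-shard counts of each constraint and return the `limit` largest.
    static List<String> topConstraints(Map<String, Map<String, Integer>> countsByShard, int limit) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map<String, Integer> counts : countsByShard.values()) {
            counts.forEach((constraint, n) -> sums.merge(constraint, n, Integer::sum));
        }
        List<String> sorted = new ArrayList<>(sums.keySet());
        // Stable sort: constraints with equal sums stay in alphabetical order.
        sorted.sort((a, b) -> Integer.compare(sums.get(b), sums.get(a)));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // For every shard, collect the top constraints it did not report in its initial
    // list; these are the counts the coordinator still has to ask that shard for.
    static Map<String, List<String>> constraintsToRefine(
            Map<String, Map<String, Integer>> countsByShard, List<String> top) {
        Map<String, List<String>> refine = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : countsByShard.entrySet()) {
            for (String constraint : top) {
                if (!e.getValue().containsKey(constraint)) {
                    refine.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(constraint);
                }
            }
        }
        return refine;
    }

    public static void main(String[] args) {
        List<ShardHit> hits = List.of(
            new ShardHit("shardA", 1, 0.9f), new ShardHit("shardA", 2, 0.3f),
            new ShardHit("shardB", 7, 0.8f), new ShardHit("shardB", 9, 0.1f));
        System.out.println(docsToFetch(hits, 2));

        Map<String, Map<String, Integer>> facets = Map.of(
            "shardA", Map.of("red", 10, "blue", 5),
            "shardB", Map.of("green", 12, "red", 4));
        List<String> top = topConstraints(facets, 2);
        System.out.println(top + " -> refine " + constraintsToRefine(facets, top));
    }
}
```

With the made-up numbers in main, "green" ends up in the merged top list only
because of shardB, so shardA is the one shard that has to be asked again for
its count. Both follow-up questions (stored fields and missing facet counts)
could then travel in the same second request to each shard, which is the
point I had missed.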
Thanks for the clarification, Hoss.
Kind regards
- Mitch