: Chris, can you tell me where exactly I can find those implementations you
: are talking about?
: I can't find them, probably I am searching in the wrong code-files.
:
: I would really like to compare the sourcecodes of both implementations.
I honestly don't know what you mean by "those implementations" and "both
implementations" ... impls of what?
: On the other hand, maybe my English skills are responsible for my
: misunderstanding of your post.
: Maybe you mean something like "first, ask each search component for the data
: it needs. For example: get the top facets, get their counts, get the normal
: search-results, get the stats".
: And in the second step "now we know what we need, let's ask each node for
: those data and aggregate it. So we can send *one* request instead of 4 or
: more".
Let's use a concrete example ... imagine you are only dealing with
QueryComponent and FacetComponent, and imagine we have a single
"coordinatorX" server that we query, and it distributes to two distinct
shard servers ("shardA" and "shardB").
the first thing QueryComponent on the coordinatorX server cares about is
asking shardA and shardB for the docIds of the docs they have that match.
the first thing FacetComponent on coordinatorX cares about is knowing the
top facet constraints for the matching docs from shardA and shardB -- both
of those pieces of information can be computed in a single request to each
shard, in which the shard computes both pieces of information (its top
scoring documents and its facet constraints with the highest counts) in a
single pass. When coordinatorX gets those responses back, its
QueryComponent can sort the "score,docId,shard" tuples to decide which
shards it needs to ask for the stored fields of which docIds in order to
build the final list of matching docs; and coordinatorX's FacetComponent
can sort the "constraint,sum(shardCounts)" to decide which constraints
should be in the final response, but since a constraint that is in that
list because it had a high count from shardB might not have been in the
initial list from shardA, it needs to ask shardA for its final count.
These subsequent pieces of info for both the QueryComponent and the
FacetComponent can be fetched from each shard in another single request,
and although they may not be computed in a single pass, we still only have
the overhead of one network request instead of two or more.
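
To make the two round trips concrete, here is a minimal sketch in Python.
It is not Solr code: the in-memory SHARDS data, the function names
(shard_phase1, shard_phase2, coordinator) and the REQUEST_LOG counter are
all hypothetical, and each shard is assumed to hold one facet value per
doc. It only illustrates the shape of the exchange: one request per shard
for top docs plus top constraints, then one more per shard for stored
fields plus the counts that shard did not report.

```python
from collections import Counter

# Hypothetical in-memory shards: docId -> (score, facet value, stored fields).
SHARDS = {
    "shardA": {
        "a1": (0.9, "red", {"title": "doc a1"}),
        "a2": (0.5, "red", {"title": "doc a2"}),
        "a3": (0.4, "blue", {"title": "doc a3"}),
    },
    "shardB": {
        "b1": (0.8, "blue", {"title": "doc b1"}),
        "b2": (0.7, "blue", {"title": "doc b2"}),
        "b3": (0.1, "blue", {"title": "doc b3"}),
    },
}
REQUEST_LOG = []  # one entry per simulated network request


def shard_phase1(shard, rows, facet_limit):
    """One request: the shard's top docs AND its top facet constraints."""
    REQUEST_LOG.append((shard, "phase1"))
    docs = SHARDS[shard]
    top_docs = sorted(
        ((score, doc_id) for doc_id, (score, _, _) in docs.items()),
        reverse=True,
    )[:rows]
    counts = Counter(facet for _, facet, _ in docs.values())
    return {"docs": top_docs, "facets": dict(counts.most_common(facet_limit))}


def shard_phase2(shard, doc_ids, constraints):
    """One request: stored fields for doc_ids AND counts for named constraints."""
    REQUEST_LOG.append((shard, "phase2"))
    docs = SHARDS[shard]
    counts = Counter(facet for _, facet, _ in docs.values())
    return {
        "fields": {d: docs[d][2] for d in doc_ids},
        "counts": {c: counts[c] for c in constraints},
    }


def coordinator(rows=3, facet_limit=1):
    phase1 = {s: shard_phase1(s, rows, facet_limit) for s in SHARDS}

    # "QueryComponent" side: merge (score, docId, shard) tuples, keep the top rows.
    merged = sorted(
        ((score, doc_id, s) for s, r in phase1.items() for score, doc_id in r["docs"]),
        reverse=True,
    )[:rows]

    # "FacetComponent" side: every constraint any shard reported is a candidate.
    candidates = {c for r in phase1.values() for c in r["facets"]}

    totals = Counter()
    fields = {}
    for s, r in phase1.items():
        wanted_docs = [d for _, d, shard in merged if shard == s]
        missing = sorted(candidates - set(r["facets"]))  # counts this shard never sent
        reply = shard_phase2(s, wanted_docs, missing)
        fields.update(reply["fields"])
        totals.update(r["facets"])
        totals.update(reply["counts"])

    final_docs = [fields[d] for _, d, _ in merged]
    return final_docs, totals.most_common(facet_limit)
```

Calling coordinator(rows=3, facet_limit=1) on this toy data makes exactly
two requests to each shard, and the "blue" count only comes out right
because shardA is asked for it in the second request.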
On the other hand, if coordinatorX just deals with shardA and shardB using
an abstraction at the Searcher level using something like MultiSearcher,
then things like distributed faceting would require a *huge* amount of
network IO as things like using the TermEnums and TermDocs on coordinatorX
would result in all of that data being streamed from the individual
(remote) searchers for each shard so the coordinator could execute the
necessary counting logic.
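
A rough way to see the difference in network IO is to count how many
items cross the wire in each design. The sketch below is a toy model in
Python, not Lucene or Solr code: POSTINGS, component_level and
searcher_level are hypothetical names, the data is made up, and it
ignores the refinement request. It assumes a remote Searcher has to ship
every (term, docId) posting to the coordinator, while a shard-side
component ships only (term, count) pairs.

```python
from collections import Counter

# Hypothetical postings per shard: facet term -> docIds containing that term.
POSTINGS = {
    "shardA": {"red": [1, 2, 3, 4], "blue": [5, 6], "green": [7]},
    "shardB": {"red": [8], "blue": [9, 10, 11], "green": [12, 13]},
}


def component_level(limit):
    """Each shard counts locally and replies with at most `limit` (term, count) pairs."""
    transferred = 0
    totals = Counter()
    for postings in POSTINGS.values():
        local = Counter({term: len(ids) for term, ids in postings.items()})
        reply = local.most_common(limit)
        transferred += len(reply)  # one item per (term, count) pair
        totals.update(dict(reply))
    return totals, transferred


def searcher_level():
    """The coordinator walks every remote term and posting and counts by itself."""
    transferred = 0
    totals = Counter()
    for postings in POSTINGS.values():
        for term, doc_ids in postings.items():  # stands in for a remote term enumeration
            for _doc_id in doc_ids:  # stands in for remote per-term doc iteration
                transferred += 1  # one item per posting
                totals[term] += 1
    return totals, transferred
```

With this data both functions arrive at the same totals when limit=3, but
the component version moves 6 items and the searcher version moves 13.
The first number grows with the number of distinct terms requested, the
second with the number of postings in the index.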
-Hoss