Anshum wrote:
Hey Andrzej,
Could you tell me what research suggests this, and why it is this way?
My calculation says the average load on each server would go down, as I would
know which server to query for an index term, as opposed to querying all
servers for terms.
I'm looking for a solution wherein I could break up the index based on any
criterion and know which index to query for any input (and not query indexes
that would return zero results).

* Ricardo Baeza-Yates, Carlos Castillo, Flavio Junqueira, Vassilis Plachouras, Fabrizio Silvestri, 2007: Challenges on Distributed Web Retrieval: "The disadvantage of term partitioning is having to build initially the entire global index. This does not scale well, and it is not useful in actual large scale Web search engines. There are, however, some advantages of this approach in the query processing phase. Webber et al. show that term partitioning results in lower utilization of resources [49]. More specifically, it significantly reduces the number of disk accesses and the volume of data exchanged. Document partitioning however is still better in terms of throughput, because of an uneven distribution of work load in term partitioning."

* Claudine Badue, Ricardo Baeza-Yates, 2001: Distributed Query Processing Using Partitioned Inverted Files (note that their conclusion that global partitioning is more efficient than local partitioning is based on a crucial assumption of being able to distribute the load efficiently. Other papers indicate that this is a very complex issue).

* Claudine Badue, Ramurti Barbosa, Paulo Golgher: Distributed Processing of Conjunctive Queries. This paper evaluates the bottlenecks in an engine with local index partitioning.

* Justin Zobel, Alistair Moffat, 2006: Inverted Files for Text Search Engines

* Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri, 2006: Mining Query Logs to Optimize Index Partitioning in Parallel Web Search Engines

* Ronny Lempel, Shlomo Moran, 2002: Optimizing Result Prefetching in Web Search Engines with Segmented Indices

... and quite a few other papers that I don't remember right now. Please search for "distributed IR" on ACM or CiteSeer.
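
To make the trade-off described in the first quote concrete, here is a minimal sketch of the two partitioning schemes. It is a toy model, not taken from any of the papers above: the function names, the modulo document assignment, and the CRC32-based term routing are all assumptions made for illustration. It counts how many requests each server receives, so you can see that term partitioning touches fewer servers per query, but routes every query containing a popular term to the same server.

```python
import zlib
from collections import Counter, defaultdict


def term_server(term, n_servers):
    # Hypothetical routing rule: a stable hash of the term picks the server.
    # (Python's built-in hash() is salted per process, so CRC32 is used.)
    return zlib.crc32(term.encode("utf-8")) % n_servers


def build_doc_partitioned(docs, n_servers):
    """Local index: each server indexes a disjoint subset of the documents."""
    shards = [defaultdict(set) for _ in range(n_servers)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % n_servers]  # assumed assignment rule
        for term in text.split():
            shard[term].add(doc_id)
    return shards


def build_term_partitioned(docs, n_servers):
    """Global index: each server holds full posting lists for some terms."""
    shards = [defaultdict(set) for _ in range(n_servers)]
    for doc_id, text in docs.items():
        for term in text.split():
            shards[term_server(term, n_servers)][term].add(doc_id)
    return shards


def query_doc_partitioned(shards, terms, load):
    """AND query: broadcast to every server, intersect locally, merge."""
    result = set()
    for i, shard in enumerate(shards):
        load[i] += 1  # every server does work for every query
        postings = [shard.get(t, set()) for t in terms]
        result |= set.intersection(*postings)
    return result


def query_term_partitioned(shards, terms, load):
    """AND query: fetch each term's list from its owner, intersect at broker."""
    postings = []
    for t in terms:
        i = term_server(t, len(shards))
        load[i] += 1  # only the owners of the query terms do work
        postings.append(shards[i].get(t, set()))
    return set.intersection(*postings)


if __name__ == "__main__":
    docs = {
        0: "lucene index search",
        1: "lucene search",
        2: "index shard",
        3: "lucene shard search",
    }
    # A skewed query log: "lucene" appears in most queries.
    queries = [["lucene", "search"], ["lucene"], ["lucene", "shard"], ["index"]]
    doc_shards = build_doc_partitioned(docs, 3)
    term_shards = build_term_partitioned(docs, 3)
    doc_load, term_load = Counter(), Counter()
    for q in queries:
        a = query_doc_partitioned(doc_shards, q, doc_load)
        b = query_term_partitioned(term_shards, q, term_load)
        assert a == b  # both schemes must return the same documents
    print("requests per server, document partitioning:", dict(doc_load))
    print("requests per server, term partitioning:    ", dict(term_load))
```

With document partitioning every server gets one request per query, so the load is even by construction. With term partitioning the total number of requests is lower, but whichever server owns the most frequent query term receives a disproportionate share, which is the throughput problem the quote refers to. Note also that the sketch ships whole posting lists to a broker for intersection, which a real system would need to avoid or optimize.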

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

