On 2 Mar 08, at 03:05, 仇寅 wrote:

Hi,

I agree with your point that it is easier to partition the index by document. But the partition-by-keyword approach scales much better than the partition-by-document approach: each query involves communicating with a constant number of nodes, while partition-by-document requires spreading the query across all or many of the nodes. I am actually doing some small research on this. By the way, the documents to be indexed are not necessarily web pages; they are mostly files stored on each node's file system.

Node failures are also handled by replicas: the index for each term will be replicated on multiple nodes whose node IDs are close to each other. This mechanism is handled by the underlying DHT system.
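Concretely, the term-to-node routing could look like this (a minimal sketch assuming a Chord-style DHT where keys and node IDs share one hash space; NODE_COUNT, REPLICAS, and the 16-bit demo key are made-up simplifications, and a real DHT does this lookup itself):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class TermRouter {
    static final int NODE_COUNT = 64;  // assumed size of the node ring
    static final int REPLICAS = 3;     // index copies on adjacent node IDs

    // Hash a term into the DHT's ID space (SHA-1, as in Chord).
    static int keyFor(String term) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1")
                    .digest(term.getBytes(StandardCharsets.UTF_8));
            // Take two bytes as a small demo ID; a real DHT keeps all 160 bits.
            return ((d[0] & 0xFF) << 8 | (d[1] & 0xFF)) % NODE_COUNT;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }

    // The node owning a term's index, plus replicas on nearby node IDs.
    static int[] nodesFor(String term) {
        int[] nodes = new int[REPLICAS];
        int home = keyFor(term);
        for (int i = 0; i < REPLICAS; i++) {
            nodes[i] = (home + i) % NODE_COUNT;  // successor-list replication
        }
        return nodes;
    }
}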

So, any idea how I can partition an index by keyword in Lucene? Thanks.

When you read a file and tokenize it, you dispatch each token to a different index, tagged with a unique document ID.
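For example (a sketch assuming a recent Lucene TokenStream API; sendToNode is a hypothetical placeholder for whatever RPC or DHT put your system uses):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenDispatcher {
    public static void dispatch(String docId, String text) throws Exception {
        try (StandardAnalyzer analyzer = new StandardAnalyzer();
             TokenStream ts = analyzer.tokenStream("contents", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // Route each posting (term -> docId) to the node
                // responsible for that term.
                sendToNode(term.toString(), docId);
            }
            ts.end();
        }
    }

    // Hypothetical transport; replace with your DHT's put/RPC call.
    static void sendToNode(String term, String docId) {
        System.out.println(term + " -> " + docId);
    }
}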

Can you explain more things about the context of your application?

I don't know why you need P2P. Is it for file sharing? If so, the index should stay near the document.
Is it for distributed computing? Then use central data and Hadoop Map/Reduce.

If you want a cluster of Lucene nodes for heavy querying, use the rsync + mv trick of Technorati. If you persist with term dispatching, use it only for caching: each node provides a term index of its own documents. When you search something, the parsed query gives you every Term (I can give you code for that); you first ask which nodes contain each Term, and then you send the query to those nodes.
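Here is that term-extraction code, as a sketch rather than a finished implementation: it parses with the classic QueryParser and only walks TermQuery and BooleanQuery, so phrase, wildcard, and other query types would need their own cases. It assumes a recent Lucene.

import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryTerms {
    public static Set<Term> terms(String userQuery) throws Exception {
        Query q = new QueryParser("contents", new StandardAnalyzer()).parse(userQuery);
        Set<Term> out = new HashSet<>();
        collect(q, out);
        return out;
    }

    private static void collect(Query q, Set<Term> out) {
        if (q instanceof TermQuery) {
            out.add(((TermQuery) q).getTerm());
        } else if (q instanceof BooleanQuery) {
            for (BooleanClause c : ((BooleanQuery) q).clauses()) {
                collect(c.getQuery(), out);
            }
        }
        // Other Query subclasses are ignored in this sketch.
    }
}

With the Set of Terms in hand, you look up each term's node (e.g. with the routing above) and fan the query out only to those nodes.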

M.