use index, big or small?

Yang Fri, 04 May 2012 16:48:39 -0700

I have a Lucene index containing all students, and now I want to run an
index search inside an Apache Hadoop mapper, i.e.


for each (record from mapper input reader) {
    output = lucene.search("name:" + record.name + " OR id:" + record.id);
    emit(output)
}


My question is whether I should shard the index (across terms, not
splitting the postings list of any single term) or simply replicate it.
The index for the entire dataset is not too big, so it can fit on
my local disk, and I can copy it to every node in the cluster and let
the copies sit there all the time, so no copy overhead is incurred.
The only argument in favor of sharding is that a smaller index might
be faster. But since index search is only O(log n) time, maybe the
time saving is very small.
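
A rough way to size that saving, assuming the lookup really does behave
like a binary search over n sorted terms. This is a simplification: it
ignores postings traversal, scoring, and disk caching, and the class name,
method name, term count, and shard count below are made up for illustration.

```java
public class ShardLookupCost {
    // Worst-case probes for a binary search over n sorted entries:
    // floor(log2(n)) + 1, computed by halving n until it reaches zero.
    static int worstCaseProbes(long n) {
        int probes = 0;
        while (n > 0) {
            probes++;
            n /= 2;
        }
        return probes;
    }

    public static void main(String[] args) {
        long terms = 1_000_000L; // hypothetical number of distinct terms
        int shards = 16;         // hypothetical shard count
        int whole = worstCaseProbes(terms);
        int perShard = worstCaseProbes(terms / shards);
        System.out.println("full index: " + whole + " probes");
        System.out.println("one shard:  " + perShard + " probes");
    }
}
```

With these assumed numbers the full index needs 20 probes and a single
shard 16, a difference of log2(16) = 4 probes per lookup, so under this
model the gain from sharding into k pieces is only about log2(k) steps.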

So will sharding be worth the effort?

thanks
yang
