use index, big or small?

2012-05-04 Thread Yang
I have an index containing all students. Now I want to do an index search inside an Apache Hadoop mapper, i.e.

    for each (record from mapper input reader) {
        output = lucene.search("name:" + record.name + " OR " + " id:" + record.id);
        emit(output)
    }

My question is whether I should shard t

Restricting search results to a dynamic slice of documents

2012-05-04 Thread Earl Hood
I require the ability to perform a search on a dynamic slice of documents in an index. For a given event, only a select set of documents should be considered when performing a query. Looking at the API, it appears that I can use a Collector during the search to filter out any documents that do no
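
The filtering idea described in this post can be sketched in plain Java without touching Lucene itself. This is only an illustration under stated assumptions: the "dynamic slice" is assumed to be available as a set of document IDs held in a `java.util.BitSet`, and the names `SliceFilterSketch` and `collectAllowed` are hypothetical, not part of any Lucene API. In Lucene, a comparable membership check would presumably sit inside the Collector's per-document callback.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for a collector-style filter; this is NOT Lucene API.
class SliceFilterSketch {

    // Keep only the hits whose document ID is set in the allowed slice.
    // Hit order is preserved.
    static List<Integer> collectAllowed(int[] hitDocIds, BitSet allowedDocs) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : hitDocIds) {
            if (allowedDocs.get(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed slice for one event: only documents 2 and 5 are eligible.
        BitSet allowed = new BitSet();
        allowed.set(2);
        allowed.set(5);

        // Assumed raw hits for a query, as document IDs.
        int[] hits = {1, 2, 3, 5, 8};
        System.out.println(collectAllowed(hits, allowed));
    }
}
```

A bit set keeps the per-hit check constant-time, which matters when the check runs once for every matching document; rebuilding the set per event is what makes the slice dynamic.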

RE: Similarity coefficient for more exact matching

2012-05-04 Thread Paul Hill
> [use] IndexWriterConfig.setSimilarity() and
> IndexSearcher.setSimilarity(), unless you are clever or like being confused.
>
> SweetSpotSimilarity might also be worth a look.
>
> --
> Ian.

Being even less clever, I just make sure I set: Similarity.setDefault(new MySimilarity()) when crawl