Suppose I want to take my large text data input and create a distributed
inverted index in Spark over each string in the input (imagine an
in-memory Lucene index - not what I'm doing, but it's analogous). It
seems that I could do this with mapPartitions so that each element in a
partition gets added to an index for that partition. I'm making the
simplifying assumption that the individual indexes do not need to
coordinate any global statistics, so that e.g. tf-idf scores need not be
consistent across these indexes. Would it then be possible to take a
string and query each partition's index with it? Or, better yet, to take
a batch of strings and query each string in the batch against each
partition's index?
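
Here is a rough sketch of what I have in mind, in case it helps clarify
the question. InvertedIndex, buildIndexes, and queryAll are just
placeholder names for illustration, and the toy word-based index stands
in for the real (Lucene-like) index:

    import org.apache.spark.rdd.RDD

    // Stand-in for an in-memory index over one partition's strings
    // (analogous to an in-memory Lucene index; not the real thing).
    class InvertedIndex(docs: Array[String]) extends Serializable {
      // term -> ids of the documents that contain it
      private val postings: Map[String, Array[Int]] =
        docs.zipWithIndex
          .flatMap { case (doc, id) => doc.split("\\s+").map(term => (term, id)) }
          .groupBy(_._1)
          .map { case (term, pairs) => (term, pairs.map(_._2)) }

      // return the documents that contain the query term
      def query(term: String): Array[String] =
        postings.getOrElse(term, Array.empty[Int]).map(docs(_))
    }

    // Build one index per partition with mapPartitions.
    def buildIndexes(input: RDD[String]): RDD[InvertedIndex] =
      input.mapPartitions(iter => Iterator(new InvertedIndex(iter.toArray)))

    // Query every partition's index with a batch of strings,
    // returning (query, matching document) pairs from all partitions.
    def queryAll(indexes: RDD[InvertedIndex], queries: Seq[String]): Array[(String, String)] = {
      val bq = indexes.sparkContext.broadcast(queries)
      indexes.flatMap { idx =>
        bq.value.flatMap(q => idx.query(q).map(hit => (q, hit)))
      }.collect()
    }

Presumably I would cache the index RDD (buildIndexes(input).cache()) so
the per-partition indexes are built once and reused across query batches
rather than rebuilt on every query.
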
Thanks,
Philip