Hey,

Some work has started on IndexedRDD (on master, I think). In the meantime, it could be worthwhile to check what has been done in GraphX regarding vertex indexes within partitions.

HTH,
Andy

On 1 Aug 2014 22:50, "Philip Ogren" <philip.og...@oracle.com> wrote:
>
> Suppose I want to take my large text data input and create a distributed
> inverted index in Spark on each string in the input (imagine an in-memory
> Lucene index - not what I'm doing, but it's analogous). It seems that I
> could do this with mapPartitions so that each element in a partition gets
> added to an index for that partition. I'm making the simplifying
> assumption that the individual indexes do not need to coordinate any global
> metrics so that e.g. tf-idf scores are consistent across these indexes.
> Would it then be possible to take a string and query each partition's
> index with it? Or better yet, take a batch of strings and query each
> string in the batch against each partition's index?
>
> Thanks,
> Philip
>
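
For what it's worth, here is a rough sketch of the per-partition idea from the question, in plain Python so it runs without a cluster. The partitions are simulated as lists, and `build_index`, `query_indexes`, and the sample documents are hypothetical names and data made up for illustration. `build_index` takes an iterator and yields a single index, which is the shape of function that `mapPartitions` expects, so I would assume it could be passed to it largely unchanged; this is not how IndexedRDD or GraphX do it.

```python
from collections import defaultdict


def build_index(partition):
    """Build one inverted index (token -> set of doc ids) for a partition.

    Takes an iterator of (doc_id, text) pairs and yields exactly one index,
    so each partition ends up holding a single index object.
    """
    index = defaultdict(set)
    for doc_id, text in partition:
        for token in text.lower().split():
            index[token].add(doc_id)
    yield dict(index)


def query_indexes(indexes, queries):
    """Look up every query string in every partition's index and merge hits."""
    hits = defaultdict(set)
    for index in indexes:
        for q in queries:
            hits[q] |= index.get(q.lower(), set())
    return {q: sorted(ids) for q, ids in hits.items()}


# Stand-in for an RDD with two partitions (assumption: data is already
# keyed by a document id).
partitions = [
    [(0, "spark makes indexes"), (1, "lucene builds indexes")],
    [(2, "spark runs on partitions")],
]

# In Spark this step would presumably be something like
# docs.mapPartitions(build_index).cache(), with the batch of queries
# shipped to the workers as a broadcast variable.
indexes = [idx for part in partitions for idx in build_index(iter(part))]

results = query_indexes(indexes, ["spark", "indexes", "missing"])
print(results)
```

Since the indexes never coordinate, a batch query is just every query against every index, with the per-partition hits merged at the end. Scores such as tf-idf would only be comparable within a partition, as the question already assumes.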