Hey,
There is some work that started on IndexedRDD (on master I think).
Meanwhile, checking what has been done in GraphX regarding vertex index in
partitions could be worthwhile I guess
Hth
Andy
Le 1 août 2014 22:50, "Philip Ogren" <philip.og...@oracle.com> a écrit :

>
> Suppose I want to take my large text data input and create a distributed
> inverted index in Spark on each string in the input (imagine an in-memory
> lucene index - not want I'm doing but it's analogous).  It seems that I
> could do this with mapPartition so that each element in a partition gets
> added to an index for that partition.  I'm making the simplifying
> assumption that the individual indexes do not need to coordinate any global
> metrics so that e.g. tf-idf scores are consistent across these indexes.
>  Would it then be possible to take a string and query each partition's
> index with it?  Or better yet, take a batch of strings and query each
> string in the batch against each partition's index?
>
> Thanks,
> Philip
>
>

Reply via email to