Is there an easy way to do a semi join in Spark Streaming? Briefly, my problem is this: I have a DStream that generates a set of values, and I would like to check whether the elements of other DStreams exist in that set.
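To make the intended semantics concrete, here is a minimal sketch of what I mean by a semi join on a single batch. It is plain Python rather than Spark code, and the names `semi_join`, `key_batch`, and `event_batch` are hypothetical, used only for illustration. It assumes the set of values is small enough to hold in memory.

```python
def semi_join(records, key_set, key_of=lambda r: r[0]):
    """Left semi join: keep the records whose key appears in key_set.

    Only records from the left side are returned; nothing from
    key_set is attached to them.
    """
    keys = frozenset(key_set)  # constant-time membership checks
    return [r for r in records if key_of(r) in keys]


# One hypothetical micro-batch from the stream that produces the set of values.
key_batch = ["a", "c"]

# One hypothetical micro-batch from another stream, as (key, payload) pairs.
event_batch = [("a", 1), ("b", 2), ("c", 3), ("a", 4)]

matched = semi_join(event_batch, key_batch)
print(matched)
```

In other words, for each batch I want to keep only the records from the other streams whose key is present in the set, without pulling in any extra columns from the set itself.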
Is there an easy, standard way to model this problem? If not, can I write a Spark Streaming job that loads the set of values from disk and caches it on each worker?
-- Chen Song