Hi,
can somebody please explain how fullOuterJoin works in Spark? Does each intersection (i.e. all records sharing a join key) get fully loaded into memory?

My problem is as follows. I have two large data-sets:

* a list of web pages,
* a list of domain names with specific rules for processing pages from that domain.

I am joining the web pages with the processing rules, keyed by domain. For certain domains there are millions of web pages. Judging by the memory demands of the join, it looks like the whole intersection (i.e. a domain plus all of its corresponding pages) is kept in memory while being processed. What I really need in this case, though, is to hold just the domain and iterate over all corresponding pages, one at a time.

What would be the best way to do this in Spark?

Thank you,
Dusan Rychnovsky
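P.S. For concreteness, here is a minimal sketch of the kind of job I mean. The Rules case class, the toy data, and all names are made up for illustration; only the shape of the fullOuterJoin on the domain key matches my real job.

    import org.apache.spark.{SparkConf, SparkContext}

    object JoinSketch {
      // illustrative stand-in for my per-domain processing rules
      case class Rules(pattern: String)

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("join-sketch").setMaster("local[*]"))

        // (domain, pageUrl); in reality millions of pages per popular domain
        val pages = sc.parallelize(Seq(
          ("example.com", "http://example.com/a"),
          ("example.com", "http://example.com/b"),
          ("other.org",   "http://other.org/x")))

        // (domain, rules); one small record per domain
        val rules = sc.parallelize(Seq(
          ("example.com", Rules("strip-ads")),
          ("unused.net",  Rules("noop"))))

        // the join in question: RDD[(String, (Option[String], Option[Rules]))];
        // this is the step whose per-domain memory use I am asking about
        val joined = pages.fullOuterJoin(rules)
        joined.collect().foreach(println)

        sc.stop()
      }
    }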