Hi, I want to use PySpark with YARN, but the documentation doesn't give me a full picture of what is going on, and I don't fully understand the code. So:
1) How is Python shipped to the cluster? Should the machines in the cluster already have Python installed?

2) What happens when I write some Python code in a "map" function? Is it shipped to the cluster and just executed there? How does Spark figure out all the dependencies my code needs and ship them? If I use the math module in my "map", does that mean the module gets shipped from my machine, or is the math module of the Python already installed on the cluster used?

3) I have compiled C++ code. Can I ship this executable with "addPyFile" and just run it with Python's "exec"? Would that work? (A rough sketch of what I mean is in the P.S. below.)

--
Sincerely yours,
Egor Pakhomov
Scala Developer, Yandex
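P.S. Here is a rough sketch of what I have in mind, so the questions are concrete. The binary name and paths are just placeholders, and for (3) I use subprocess plus SparkContext.addFile / SparkFiles.get rather than addPyFile and exec, since the file is not Python code; whether this is the right way to do it is exactly my question.

```python
import math
import subprocess

from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="pyspark-yarn-question")  # launched with --master yarn

# Question 2: a map function that uses the standard "math" module.
# Does this use the math module of the Python installed on the worker nodes,
# or does something get shipped from the driver?
squares = sc.parallelize(range(10)).map(lambda x: math.sqrt(x)).collect()
print(squares)

# Question 3: ship a compiled C++ executable to the executors and call it
# from inside a map. "mybinary" and its path are placeholders.
sc.addFile("/local/path/to/mybinary")

def run_binary(x):
    # SparkFiles.get resolves to the local copy of the shipped file on the worker.
    path = SparkFiles.get("mybinary")
    out = subprocess.check_output([path, str(x)])
    return out.decode().strip()

results = sc.parallelize(range(10)).map(run_binary).collect()
print(results)
```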