Hi, I want to do some processing with PySpark and store the result in a tuple that is shared among the executors for further processing. It is a text-mining job and I want to use the Vector Space Model, so I want to compute the vector of all words, which must be accessible to all executors, and save it in a tuple. Is this possible in Spark, or should I use external storage such as a database or files?
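For reference, Spark's broadcast variables are meant for this kind of read-only value shared with every executor, so external storage may not be needed as long as the word list fits in memory on the driver and on each executor. Below is a minimal sketch under that assumption. The function names `to_term_vector` and `vectorize_with_spark`, whitespace tokenization, and raw term counts as the vector weights are illustrative choices, not something the question specifies.

```python
def to_term_vector(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one tokenized document.

    The result is a list aligned with `vocabulary`; unknown tokens are ignored.
    The index is rebuilt on every call for clarity, not for speed.
    """
    index = {word: i for i, word in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for token in tokens:
        if token in index:
            vector[index[token]] += 1
    return vector


def vectorize_with_spark(docs):
    """Build the vocabulary tuple with Spark and broadcast it to the executors.

    `docs` is assumed to be a list of plain-text strings (hypothetical input).
    """
    # Imported here so that to_term_vector can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    tokenized = sc.parallelize(docs).map(lambda line: line.split()).cache()

    # Distinct words are computed on the executors, then collected to the driver.
    vocabulary = tuple(sorted(tokenized.flatMap(lambda tokens: tokens).distinct().collect()))

    # Ship the tuple once per executor instead of once per task.
    vocab_bc = sc.broadcast(vocabulary)

    # Executors read the shared tuple through vocab_bc.value.
    vectors = tokenized.map(lambda tokens: to_term_vector(tokens, vocab_bc.value)).collect()
    return vocabulary, vectors
```

Calling `vectorize_with_spark(["spark is fast", "spark shares data"])` would return the vocabulary tuple and one count vector per document. The broadcast value cannot be modified by the executors; if the word list changes, it has to be rebuilt on the driver and broadcast again.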
- Use Shared Variable in PySpark Executors Soheil Pourbafrani