I am running into an inconvenience during development that I think could be fixed by extending addPyFile. I am working on a PySpark project that has a primary entry point plus several modules. As a result, unless the module code is copied to a location on the cluster's PYTHONPATH, or zipped and shipped with addPyFile, the job fails with an error.
This would be easy if the project were a single file or a prebuilt zip passed to addPyFile, but during development it is inconvenient because the zip has to be rebuilt every time a change is made. Would it be a good idea to let addPyFile accept a local directory? The implementation would have Python zip the given directory into a temporary directory, then ship that archive to the cluster.

--
Pedro Rodriguez
CU Boulder PhD Student
UCBerkeley 2014 | Computer Science
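
P.S. To make the proposal concrete, here is a rough user-side sketch of the zip-then-ship idea. It is not a patch to addPyFile itself, and I have not tried it against a real cluster. The helper names zip_directory and add_py_dir are hypothetical, and sc is assumed to be an existing SparkContext. The sketch also assumes the directory is a Python package (it contains an __init__.py), so that its folder name becomes the importable package name inside the archive.

```python
import os
import shutil
import tempfile


def zip_directory(src_dir, dest_dir=None):
    """Zip src_dir into a temporary directory and return the archive path.

    The archive keeps the top-level folder name, so a directory named
    "mypkg" is stored as "mypkg/..." inside "mypkg.zip".
    """
    src_dir = os.path.abspath(src_dir)  # also drops any trailing slash
    parent, name = os.path.split(src_dir)
    if dest_dir is None:
        dest_dir = tempfile.mkdtemp(prefix="pyspark_dir_")
    base = os.path.join(dest_dir, name)
    # make_archive appends ".zip" to base and returns the full archive path.
    return shutil.make_archive(base, "zip", root_dir=parent, base_dir=name)


def add_py_dir(sc, src_dir):
    """Hypothetical helper: zip a local directory, then ship it with addPyFile."""
    archive = zip_directory(src_dir)
    sc.addPyFile(archive)  # existing SparkContext method; accepts .py and .zip files
    return archive
```

With this in the driver script, calling add_py_dir(sc, "path/to/mypkg") before running any jobs would rebuild and ship the archive on every run, which is the step I currently do by hand.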