Hi, I am creating a SparkContext against a Spark standalone cluster, as described here: http://spark.apache.org/docs/latest/spark-standalone.html, using the following code:
--------------------------------------------------------------------------------------------------------------------------
import multiprocessing

from pyspark import SparkConf, SparkContext

sc.stop()   # stop the context created by the shell/notebook

conf = SparkConf() \
    .set('spark.driver.allowMultipleContexts', 'false') \
    .setMaster("spark://hostname:7077") \
    .set('spark.shuffle.service.enabled', 'true') \
    .set('spark.dynamicAllocation.enabled', 'true') \
    .set('spark.executor.memory', '20g') \
    .set('spark.driver.memory', '4g') \
    .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)

conf.getAll()
sc = SparkContext(conf=conf)
--------------------------------------------------------------------------------------------------------------------------

(We should definitely be able to optimise the configuration further, but that is not the point here.)

With this method I am not able to use any of the packages listed at http://spark-packages.org, whereas if I use the standard "pyspark --packages" option the packages load just fine. I will be grateful if someone could kindly let me know how to load packages when creating the SparkContext against the cluster as above.

Regards,
Gourav Sengupta
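P.S. To be concrete, what I am hoping for is something along the lines of the sketch below, done entirely from Python before the SparkContext is created. The spark.jars.packages key and the com.databricks:spark-csv_2.10:1.3.0 coordinate are only illustrative guesses on my part, not something I have verified to work in this setup:

--------------------------------------------------------------------------------------------------------------------------
from pyspark import SparkConf, SparkContext

# Hypothetical sketch: pass a spark-packages.org coordinate through the conf
# instead of using "pyspark --packages". The key and the coordinate below are
# only examples; I have not confirmed that this works once the shell has
# already started the JVM.
conf = SparkConf() \
    .setMaster("spark://hostname:7077") \
    .set('spark.jars.packages', 'com.databricks:spark-csv_2.10:1.3.0')

sc = SparkContext(conf=conf)
--------------------------------------------------------------------------------------------------------------------------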