Hi,

How do we include the following package: https://github.com/databricks/spark-csv while starting a SPARK standalone cluster as mentioned here: http://spark.apache.org/docs/latest/spark-standalone.html ?
Thanks and Regards,
Gourav Sengupta

On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R <ramanatha...@gmail.com> wrote:
> Hi Gourav,
>
> If your question is how to distribute Python package dependencies across
> the Spark cluster programmatically, here is an example -
>
> $ export PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
>
> And in code:
>
> sc.addPyFile('/path/to/thrift.zip')
> sc.addPyFile('/path/to/happybase.zip')
>
> Regards,
> Ram
>
>
> On 15 February 2016 at 15:16, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>> Hi,
>>
>> So far no one has understood my question at all. I know what it takes to
>> load packages via the SPARK shell or SPARK submit.
>>
>> How do I load packages when starting a SPARK cluster, as mentioned here:
>> http://spark.apache.org/docs/latest/spark-standalone.html ?
>>
>> Regards,
>> Gourav Sengupta
>>
>>
>> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>>> With the conf option:
>>>
>>> spark-submit --conf 'key=value'
>>>
>>> Hope that helps you.
>>>
>>> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com> wrote:
>>>> Hi Gourav,
>>>> You can use the following to load packages at the start of the spark shell:
>>>>
>>>> spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
>>>>
>>>> On 14 February 2016 at 03:34, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I was interested in knowing how to load packages into a SPARK cluster
>>>>> started locally. Can someone point me to the documentation for setting
>>>>> the conf file so that the packages can be loaded?
>>>>>
>>>>> Regards,
>>>>> Gourav
>>>>>
>>>>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>>>>> Hello Gourav,
>>>>>>
>>>>>> The packages need to be loaded BEFORE you start the JVM, therefore
>>>>>> you won't be able to add packages dynamically in code. You should use
>>>>>> --packages with pyspark before you start your application.
>>>>>> One option is to add a `conf` that will load some packages if you are
>>>>>> constantly going to use them.
>>>>>>
>>>>>> Best,
>>>>>> Burak
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am creating a SparkContext in a SPARK standalone cluster as
>>>>>>> mentioned here:
>>>>>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>> using the following code:
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> sc.stop()
>>>>>>> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>>>>>>>                   .setMaster("spark://hostname:7077") \
>>>>>>>                   .set('spark.shuffle.service.enabled', True) \
>>>>>>>                   .set('spark.dynamicAllocation.enabled', 'true') \
>>>>>>>                   .set('spark.executor.memory', '20g') \
>>>>>>>                   .set('spark.driver.memory', '4g') \
>>>>>>>                   .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
>>>>>>> conf.getAll()
>>>>>>> sc = SparkContext(conf=conf)
>>>>>>> ----- (we should definitely be able to optimise the configuration, but
>>>>>>> that is not the point here) -----
>>>>>>>
>>>>>>> I am not able to use packages (a list of which is available at
>>>>>>> http://spark-packages.org) using this method.
>>>>>>>
>>>>>>> Whereas if I use the standard "pyspark --packages" option, the
>>>>>>> packages load just fine.
>>>>>>>
>>>>>>> I will be grateful if someone could kindly let me know how to load
>>>>>>> packages when starting a cluster as mentioned above.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gourav Sengupta
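
A minimal sketch of the conf-file route Burak mentions, for the standalone setup described in the thread. It assumes a Spark version whose spark-submit reads spark.jars.packages from conf/spark-defaults.conf; neither the property name nor the file location is confirmed anywhere above, so verify both against the configuration docs for the version in use:

    # conf/spark-defaults.conf on the machine that launches the driver
    spark.master            spark://hostname:7077
    spark.jars.packages     com.databricks:spark-csv_2.10:1.1.0

With this in place, anything started through spark-submit, spark-shell or pyspark should resolve the spark-csv package at launch without repeating --packages on every invocation.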
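
For the programmatic SparkContext shown in the quoted code, the packages still have to be known before the JVM starts, as Burak points out. One commonly used sketch (again an assumption to verify, not something confirmed in the thread) is to set PYSPARK_SUBMIT_ARGS in the environment before the Python process creates the context; the trailing pyspark-shell token is assumed to be required in this form:

    $ export PYSPARK_SUBMIT_ARGS='--packages com.databricks:spark-csv_2.10:1.1.0 pyspark-shell'

    # the existing Python code can then stay essentially as it is
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster('spark://hostname:7077') \
                      .set('spark.executor.memory', '20g')
    sc = SparkContext(conf=conf)  # the JVM launched here picks up the --packages argument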