Hi,

How do we include the following package: https://github.com/databricks/spark-csv while starting a SPARK standalone cluster as mentioned here: http://spark.apache.org/docs/latest/spark-standalone.html ?
Thanks and Regards,
Gourav Sengupta

On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R <ramanatha...@gmail.com> wrote:
> Hi Gourav,
>
> If your question is how to distribute Python package dependencies across
> the Spark cluster programmatically, here is an example -
>
> $ export PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
>
> And in code:
>
> sc.addPyFile('/path/to/thrift.zip')
> sc.addPyFile('/path/to/happybase.zip')
>
> Regards,
> Ram
>
>
> On 15 February 2016 at 15:16, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>> Hi,
>>
>> So far no one has understood my question at all. I know what it takes to
>> load packages via the SPARK shell or SPARK submit.
>>
>> How do I load packages when starting a SPARK cluster, as mentioned here:
>> http://spark.apache.org/docs/latest/spark-standalone.html ?
>>
>> Regards,
>> Gourav Sengupta
>>
>>
>> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>>> With the conf option:
>>>
>>> spark-submit --conf 'key=value'
>>>
>>> Hope that helps you.
>>>
>>> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com> wrote:
>>>> Hi Gourav,
>>>> You can use the following to load packages at the start of the spark shell:
>>>>
>>>> spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
>>>>
>>>> On 14 February 2016 at 03:34, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I was interested in knowing how to load packages into a SPARK cluster
>>>>> started locally. Can someone point me to the documentation for setting
>>>>> the conf file so that the packages can be loaded?
>>>>>
>>>>> Regards,
>>>>> Gourav
>>>>>
>>>>> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com> wrote:
>>>>>> Hello Gourav,
>>>>>>
>>>>>> The packages need to be loaded BEFORE you start the JVM, therefore
>>>>>> you won't be able to add packages dynamically in code. You should use
>>>>>> --packages with pyspark before you start your application.
>>>>>> One option is to add a `conf` that will load some packages if you are
>>>>>> constantly going to use them.
>>>>>>
>>>>>> Best,
>>>>>> Burak
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am creating a SparkContext in a SPARK standalone cluster as
>>>>>>> mentioned here:
>>>>>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>> using the following code:
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> sc.stop()
>>>>>>> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>>>>>>>                   .setMaster("spark://hostname:7077") \
>>>>>>>                   .set('spark.shuffle.service.enabled', True) \
>>>>>>>                   .set('spark.dynamicAllocation.enabled', 'true') \
>>>>>>>                   .set('spark.executor.memory', '20g') \
>>>>>>>                   .set('spark.driver.memory', '4g') \
>>>>>>>                   .set('spark.default.parallelism', (multiprocessing.cpu_count() - 1))
>>>>>>> conf.getAll()
>>>>>>> sc = SparkContext(conf=conf)
>>>>>>> ----- (we should definitely be able to optimise the configuration, but
>>>>>>> that is not the point here) -----
>>>>>>>
>>>>>>> I am not able to use packages (a list of which is available at
>>>>>>> http://spark-packages.org) using this method.
>>>>>>>
>>>>>>> Whereas if I use the standard "pyspark --packages" option, the
>>>>>>> packages load just fine.
>>>>>>>
>>>>>>> I will be grateful if someone could kindly let me know how to load
>>>>>>> packages when starting a cluster as mentioned above.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gourav Sengupta
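
A minimal sketch of the conf-file route Burak mentions, for the standalone setup described in the thread. It assumes a Spark version whose spark-submit reads spark.jars.packages from conf/spark-defaults.conf; neither the property name nor the file location is confirmed anywhere above, so verify both against the configuration docs for the version in use:

    # conf/spark-defaults.conf on the machine that launches the driver
    spark.master            spark://hostname:7077
    spark.jars.packages     com.databricks:spark-csv_2.10:1.1.0

With this in place, anything started through spark-submit, spark-shell or pyspark should resolve the spark-csv package at launch without repeating --packages on every invocation.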
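
For the programmatic SparkContext shown in the quoted code, the packages still have to be known before the JVM starts, as Burak points out. One commonly used sketch (again an assumption to verify, not something confirmed in the thread) is to set PYSPARK_SUBMIT_ARGS in the environment before the Python process creates the context; the trailing pyspark-shell token is assumed to be required in this form:

    $ export PYSPARK_SUBMIT_ARGS='--packages com.databricks:spark-csv_2.10:1.1.0 pyspark-shell'

    # the existing Python code can then stay essentially as it is
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster('spark://hostname:7077') \
                      .set('spark.executor.memory', '20g')
    sc = SparkContext(conf=conf)  # the JVM launched here picks up the --packages argument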