Hi:
Curious... is there any reason not to use one of the pyspark options
below (--py-files or --files)? Assuming each file is, say, 10 KB in size,
are 50 files too many? Does that run into some practical limitation?
(A rough sketch of the --py-files route follows the listing.)
Usage: ./bin/pyspark [options]
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Where to run the driver program: either "client" to run
                              on the local machine, or "cluster" to run inside the cluster.
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the
                              driver and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
[ ... snip ... ]
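For concreteness, here is a minimal sketch of the --py-files route for a
project of roughly 50 small .py files; the project name, layout, and entry
script below are made up for illustration, not taken from the thread:

    # Hypothetical layout: ./myproject/ holds the ~50 .py files (~10 KB each)
    # plus an __init__.py so it imports as a package. Zip it once, then ship
    # the archive to the cluster with --py-files.
    zip -r myproject.zip myproject/

    ./bin/spark-submit --master yarn --num-executors 3 \
        --driver-memory 4g --executor-memory 2g --executor-cores 1 \
        --py-files myproject.zip \
        myproject/main.py 1000

At ~10 KB apiece, 50 files add up to well under a megabyte, so that count
alone shouldn't be near any practical limit for --py-files.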
On 09/05/2014 12:00 PM, Davies Liu wrote:
> Hi Oleg,
>
> In order to simplify packaging and distributing your code, you could
> deploy shared storage (such as NFS), put your project on it, and mount
> it on all the slaves as "/projects".
>
> In the Spark job scripts, you can then access your project by putting
> that path on sys.path, like this:
>
> import sys
> sys.path.append("/projects")
> import myproject
>
> Davies
>
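(Inline note: if the NFS route is taken, a job script could look roughly
like the sketch below; "myproject" and "transform" are placeholder names,
not anything from Davies' mail.)

    # Minimal sketch, assuming every node mounts the shared project at /projects.
    import sys
    sys.path.append("/projects")   # make the NFS-mounted code importable

    import myproject               # hypothetical package living under /projects

    from pyspark import SparkContext

    sc = SparkContext(appName="myproject-on-nfs")

    # Code from myproject can now be used on the driver. For functions that run
    # inside executor tasks, /projects may also need to be on the executors'
    # PYTHONPATH (e.g. via spark.executorEnv.PYTHONPATH), since each worker
    # process imports the module independently.
    result = sc.parallelize(range(100)).map(myproject.transform).collect()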
> On Fri, Sep 5, 2014 at 1:28 AM, Oleg Ruchovets <oruchov...@gmail.com>
> wrote:
>> Hi, we are evaluating PySpark and have successfully executed the
>> PySpark examples on YARN.
>>
>> The next step we want to take: we have a Python project (a bunch of
>> Python scripts using Anaconda packages). Question: what is the way
>> to execute PySpark on YARN with a lot of Python files (~50)?
>> Should they be packaged into an archive? What would the command to
>> run PySpark on YARN with a lot of files look like? Currently the
>> command looks like:
>>
>> ./bin/spark-submit --master yarn --num-executors 3
>> --driver-memory 4g --executor-memory 2g --executor-cores 1
>> examples/src/main/python/wordcount.py 1000
>>
>> Thanks, Oleg.
>