Hi:
Curious... is there any reason not to use one of the pyspark options
below (--py-files or --files)? Assuming each file is, say, 10 KB in size,
are 50 files too many? Does that run into some practical limitation?
(A rough sketch of the --py-files route follows the listing.)
Usage: ./bin/pyspark [options]
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Where to run the driver program: either "client" to run
                              on the local machine, or "cluster" to run inside the cluster.
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the
                              driver and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
[ ... snip ... ]
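For concreteness, here is a minimal sketch of the --py-files route for a
project of roughly 50 small .py files; the project name, layout, and entry
script below are made up for illustration, not taken from the thread:

    # Hypothetical layout: ./myproject/ holds the ~50 .py files (~10 KB each)
    # plus an __init__.py so it imports as a package. Zip it once, then ship
    # the archive to the cluster with --py-files.
    zip -r myproject.zip myproject/

    ./bin/spark-submit --master yarn --num-executors 3 \
        --driver-memory 4g --executor-memory 2g --executor-cores 1 \
        --py-files myproject.zip \
        myproject/main.py 1000

At ~10 KB apiece, 50 files add up to well under a megabyte, so that count
alone shouldn't be near any practical limit for --py-files.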
On 09/05/2014 12:00 PM, Davies Liu wrote:
> Hi Oleg,
>
> In order to simplify packaging and distributing your code, you could
> deploy shared storage (such as NFS), put your project on it, and mount
> it on all the slaves as "/projects".
>
> In the Spark job scripts, you can then access your project by putting
> that path on sys.path, like this:
>
> import sys
> sys.path.append("/projects")
> import myproject
>
> Davies
>
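(Inline note: if the NFS route is taken, a job script could look roughly
like the sketch below; "myproject" and "transform" are placeholder names,
not anything from Davies' mail.)

    # Minimal sketch, assuming every node mounts the shared project at /projects.
    import sys
    sys.path.append("/projects")   # make the NFS-mounted code importable

    import myproject               # hypothetical package living under /projects

    from pyspark import SparkContext

    sc = SparkContext(appName="myproject-on-nfs")

    # Code from myproject can now be used on the driver. For functions that run
    # inside executor tasks, /projects may also need to be on the executors'
    # PYTHONPATH (e.g. via spark.executorEnv.PYTHONPATH), since each worker
    # process imports the module independently.
    result = sc.parallelize(range(100)).map(myproject.transform).collect()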
> On Fri, Sep 5, 2014 at 1:28 AM, Oleg Ruchovets <oruchov...@gmail.com>
> wrote:
>> Hi, we are evaluating PySpark and have successfully executed the
>> PySpark examples on YARN.
>>
>> The next step we want to take: we have a Python project (a bunch of
>> Python scripts using Anaconda packages). Question: what is the way
>> to execute PySpark on YARN with a lot of Python files (~50)?
>> Should they be packaged into an archive? What would the command to
>> run PySpark on YARN with a lot of files look like? Currently the
>> command looks like:
>>
>> ./bin/spark-submit --master yarn --num-executors 3
>> --driver-memory 4g --executor-memory 2g --executor-cores 1
>> examples/src/main/python/wordcount.py 1000
>>
>> Thanks, Oleg.
>