GitHub user berngp reopened a pull request: https://github.com/apache/spark/pull/84
[SPARK-1186] : Enrich the Spark Shell to support additional arguments.

Enrich the Spark Shell functionality to support the following options.

```
Usage: spark-shell [OPTIONS]

OPTIONS:

    basic:
    -h  --help             : print this help information.
    -c  --executor-cores   : the maximum number of cores to be used by the spark shell.
    -em --executor-memory  : num[m|g], the memory used by each executor of spark shell.
    -dm --driver-memory    : num[m|g], the memory used by the spark shell and driver.

    soon to be deprecated:
    --cores                : please use -c/--executor-cores
    --drivermem            : please use -dm/--driver-memory

    other options:
    -mip --master-ip       : Spark Master IP/Host Address
    -mp  --master-port     : num, Spark Master Port
    -m   --master          : full string that describes the Spark Master.
    -ld  --local-dir       : absolute path to a local directory that will be used for "scratch" space in Spark.
    -dh  --driver-host     : hostname or IP address for the driver to listen on.
    -dp  --driver-port     : num, port for the driver to listen on.
    -uip --ui-port         : num, port for your application's dashboard, which shows memory and workload data.
    --parallelism          : num, default number of tasks to use across the cluster for distributed shuffle operations when not set by user.
    --locality-wait        : num, number of milliseconds to wait to launch a data-local task before giving up.
    --schedule-fair        : flag, enables FAIR scheduling between jobs submitted to the same SparkContext.
    --max-failures         : num, number of individual task failures before giving up on the job.
    --log-conf             : flag, log the supplied SparkConf as INFO at start of spark context.

e.g.
    spark-shell -m 127.0.0.1 -ld /tmp -dh 127.0.0.1 -dp 4001 -uip 4010 --parallelism 10 --locality-wait 500 --schedule-fair --max-failures 100
```

[ticket: SPARK-1186] : Enrich the Spark Shell to support additional arguments.
https://spark-project.atlassian.net/browse/SPARK-1186

Author   : bernardo.gomezpal...@gmail.com
Reviewer : ?
Testing  : ?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/berngp/spark feature/enrich-spark-shell

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/84.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #84

----
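For readers mapping these flags onto Spark's configuration system, here is a minimal Scala sketch of roughly equivalent SparkConf settings for the example invocation. The property names are standard Spark configuration keys, but the flag-to-property correspondence is inferred from the option descriptions above, not taken from the patch itself.

```
import org.apache.spark.{SparkConf, SparkContext}

// Roughly equivalent SparkConf settings for the example invocation above.
// Property names are standard Spark keys; the mapping is an assumption
// inferred from the option descriptions, not code from the patch.
val conf = new SparkConf()
  .setAppName("spark-shell")
  .setMaster("spark://127.0.0.1:7077")      // -m   / --master
  .set("spark.executor.memory", "2g")       // -em  / --executor-memory
  .set("spark.local.dir", "/tmp")           // -ld  / --local-dir
  .set("spark.driver.host", "127.0.0.1")    // -dh  / --driver-host
  .set("spark.driver.port", "4001")         // -dp  / --driver-port
  .set("spark.ui.port", "4010")             // -uip / --ui-port
  .set("spark.default.parallelism", "10")   // --parallelism
  .set("spark.locality.wait", "500")        // --locality-wait (milliseconds)
  .set("spark.scheduler.mode", "FAIR")      // --schedule-fair
  .set("spark.task.maxFailures", "100")     // --max-failures
  .set("spark.logConf", "true")             // --log-conf

val sc = new SparkContext(conf)
```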
commit c7ac8ebe0740d9ea7347253b251c5b6b90706b2f
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date: 2014-03-05T23:37:30Z

[SPARK-1186] : Enrich the Spark Shell to support additional arguments.

Enrich the Spark Shell functionality to support the following options.

```
Usage: spark-shell [OPTIONS]

OPTIONS:

    basic:
    -h  --help             : print this help information.
    -c  --executor-cores   : the maximum number of cores to be used by the spark shell.
    -em --executor-memory  : num[m|g], the memory used by each executor of spark shell.
    -dm --driver-memory    : num[m|g], the memory used by the spark shell and driver.

    soon to be deprecated:
    --cores                : please use -c/--executor-cores
    --drivermem            : please use -dm/--driver-memory

    other options:
    -mip --master-ip       : Spark Master IP/Host Address
    -mp  --master-port     : num, Spark Master Port
    -m   --master          : full string that describes the Spark Master.
    -ld  --local-dir       : absolute path to a local directory that will be used for "scratch" space in Spark.
    -dh  --driver-host     : hostname or IP address for the driver to listen on.
    -dp  --driver-port     : num, port for the driver to listen on.
    -uip --ui-port         : num, port for your application's dashboard, which shows memory and workload data.
    --parallelism          : num, default number of tasks to use across the cluster for distributed shuffle operations when not set by user.
    --locality-wait        : num, number of milliseconds to wait to launch a data-local task before giving up.
    --schedule-fair        : flag, enables FAIR scheduling between jobs submitted to the same SparkContext.
    --max-failures         : num, number of individual task failures before giving up on the job.
    --log-conf             : flag, log the supplied SparkConf as INFO at start of spark context.

e.g.
    spark-shell -m 127.0.0.1 -ld /tmp -dh 127.0.0.1 -dp 4001 -uip 4010 --parallelism 10 --locality-wait 500 --schedule-fair --max-failures 100
```

[ticket: SPARK-1186] : Enrich the Spark Shell to support additional arguments.
https://spark-project.atlassian.net/browse/SPARK-1186

Author   : bernardo.gomezpal...@gmail.com
Reviewer : ?
Testing  : ?

commit 51ca7bd7038dd5f66327d5b15692a1ccaab42129
Author: liguoqiang <liguoqi...@rd.tuan800.com>
Date: 2014-03-06T00:38:43Z

Improve the building-with-maven docs: change

    mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test

to

    mvn -Dhadoop.version=... -Dsuites=org.apache.spark.repl.ReplSuite test

Author: liguoqiang <liguoqi...@rd.tuan800.com>

Closes #70 from witgo/building_with_maven and squashes the following commits:

6ec8a54 [liguoqiang] spark.repl.ReplSuite to org.apache.spark.repl.ReplSuite

commit cda381f88cc03340fdf7b2d681699babbae2a56e
Author: Mark Grover <m...@apache.org>
Date: 2014-03-06T00:52:58Z

SPARK-1184: Update the distribution tar.gz to include the spark-assembly jar

See JIRA for details.

Author: Mark Grover <m...@apache.org>

Closes #78 from markgrover/SPARK-1184 and squashes the following commits:

12b78e6 [Mark Grover] SPARK-1184: Update the distribution tar.gz to include spark-assembly jar

commit 3eb009f362993dbe43028419c2d48011111a200d
Author: CodingCat <zhunans...@gmail.com>
Date: 2014-03-06T05:47:34Z

SPARK-1156: allow user to login into a cluster without slaves

Reported in https://spark-project.atlassian.net/browse/SPARK-1156

The current spark-ec2 script doesn't allow the user to log in to a cluster without slaves. One of the issues brought by this behaviour is that when all the workers have died, the user cannot even log in to the cluster for debugging, etc.

Author: CodingCat <zhunans...@gmail.com>

Closes #58 from CodingCat/SPARK-1156 and squashes the following commits:

104af07 [CodingCat] output ERROR to stderr
9a71769 [CodingCat] do not allow user to start 0-slave cluster
24a7c79 [CodingCat] allow user to login into a cluster without slaves
commit 3d3acef0474b6dc21f1b470ea96079a491e58b75
Author: Prabin Banka <prabin.ba...@imaginea.com>
Date: 2014-03-06T20:45:27Z

SPARK-1187, Added missing Python APIs

The following Python APIs are added:

    RDD.id()
    SparkContext.setJobGroup()
    SparkContext.setLocalProperty()
    SparkContext.getLocalProperty()
    SparkContext.sparkUser()

These were raised earlier as a part of apache/incubator-spark#486.

Author: Prabin Banka <prabin.ba...@imaginea.com>

Closes #75 from prabinb/python-api-backup and squashes the following commits:

cc3c6cd [Prabin Banka] Added missing Python APIs
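The Scala API already exposes counterparts of these newly added Python methods; a minimal usage sketch follows, with the local master and all names chosen for illustration.

```
import org.apache.spark.{SparkConf, SparkContext}

// Scala counterparts of the Python APIs added above; the master URL,
// group name, and pool name are illustrative.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("api-demo"))

sc.setJobGroup("nightly-etl", "nightly ETL run")        // tag subsequent jobs
sc.setLocalProperty("spark.scheduler.pool", "production")
val pool = sc.getLocalProperty("spark.scheduler.pool")  // -> "production"
val user = sc.sparkUser                                 // user running this context

val rdd = sc.parallelize(1 to 10)
println(rdd.id)                                         // unique id of this RDD within the context

sc.stop()
```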
commit 40566e10aae4b21ffc71ea72702b8df118ac5c8e
Author: Kyle Ellrott <kellr...@gmail.com>
Date: 2014-03-06T22:51:00Z

SPARK-942: Do not materialize partitions when DISK_ONLY storage level is used

This is a port of a pull request originally targeted at incubator-spark: https://github.com/apache/incubator-spark/pull/180

Essentially, if a user returns a generative iterator (from a flatMap operation), when trying to persist the data, Spark would first unroll the iterator into an ArrayBuffer and then try to figure out if it could store the data. In cases where the user provided an iterator that generated more data than available memory, this would cause a crash. With this patch, if the user requests a persist with StorageLevel.DISK_ONLY, the iterator will be unrolled as it is fed into the serializer.

To do this, two changes were made:

1) The type of the 'values' argument in the putValues method of the BlockStore interface was changed from ArrayBuffer to Iterator (and all code interfacing with this method was modified to connect correctly).
2) The JavaSerializer now calls the ObjectOutputStream 'reset' method every 1000 objects. This was done because the ObjectOutputStream caches objects (thus preventing them from being GC'd) to write more compact serialization. If reset is never called, eventually the memory fills up; if it is called too often, the serialization streams become much larger because of redundant class descriptions.

Author: Kyle Ellrott <kellr...@gmail.com>

Closes #50 from kellrott/iterator-to-disk and squashes the following commits:

9ef7cb8 [Kyle Ellrott] Fixing formatting issues.
60e0c57 [Kyle Ellrott] Fixing issues (formatting, variable names, etc.) from review comments
8aa31cd [Kyle Ellrott] Merge ../incubator-spark into iterator-to-disk
33ac390 [Kyle Ellrott] Merge branch 'iterator-to-disk' of github.com:kellrott/incubator-spark into iterator-to-disk
2f684ea [Kyle Ellrott] Refactoring the BlockManager to replace the Either[Either[A,B]] usage. Now using trait 'Values'. Also modified BlockStore.putBytes call to return PutResult, so that it behaves like putValues.
f70d069 [Kyle Ellrott] Adding docs for spark.serializer.objectStreamReset configuration
7ccc74b [Kyle Ellrott] Moving the 'LargeIteratorSuite' to simply test persistence of iterators. It doesn't try to invoke an OOM error any more
16a4cea [Kyle Ellrott] Streamlined the LargeIteratorSuite unit test. It should now run in ~25 seconds. Confirmed that it still crashes an unpatched copy of Spark.
c2fb430 [Kyle Ellrott] Removing more un-needed array-buffer to iterator conversions
627a8b7 [Kyle Ellrott] Wrapping a few long lines
0f28ec7 [Kyle Ellrott] Adding second putValues to BlockStore interface that accepts an ArrayBuffer (rather than an Iterator). This will allow BlockStores to have slightly different behaviors dependent on whether they get an Iterator or ArrayBuffer. In the case of the MemoryStore, it needs to duplicate and cache an Iterator into an ArrayBuffer, but if handed an ArrayBuffer, it can skip the duplication.
656c33e [Kyle Ellrott] Fixing the JavaSerializer to read from the SparkConf rather than the System property.
8644ee8 [Kyle Ellrott] Merge branch 'master' into iterator-to-disk
00c98e0 [Kyle Ellrott] Making the Java ObjectStreamSerializer reset rate configurable by the system variable 'spark.serializer.objectStreamReset', default is now 10000.
40fe1d7 [Kyle Ellrott] Removing rogue space
31fe08e [Kyle Ellrott] Removing un-needed semi-colons
9df0276 [Kyle Ellrott] Added check to make sure that streamed-to-disk RDD actually returns good data in the LargeIteratorSuite
a6424ba [Kyle Ellrott] Wrapping long line
2eeda75 [Kyle Ellrott] Fixing dumb mistake ("||" instead of "&&")
0e6f808 [Kyle Ellrott] Deleting temp output directory when done
95c7f67 [Kyle Ellrott] Simplifying StorageLevel checks
56f71cd [Kyle Ellrott] Merge branch 'master' into iterator-to-disk
44ec35a [Kyle Ellrott] Adding some comments.
5eb2b7e [Kyle Ellrott] Changing the JavaSerializer reset to occur every 1000 objects.
f403826 [Kyle Ellrott] Merge branch 'master' into iterator-to-disk
81d670c [Kyle Ellrott] Adding unit test for straight to disk iterator methods.
d32992f [Kyle Ellrott] Merge remote-tracking branch 'origin/master' into iterator-to-disk
cac1fad [Kyle Ellrott] Fixing MemoryStore, so that it converts incoming iterators to ArrayBuffer objects. This was previously done higher up the stack.
efe1102 [Kyle Ellrott] Changing CacheManager and BlockManager to pass iterators directly to the serializer when a 'DISK_ONLY' persist is called. This is in response to SPARK-942.
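A minimal sketch of the reset trick described in change (2): java.io.ObjectOutputStream keeps a handle table of every object it has written so it can emit compact back-references, and that table also pins the objects in memory. Periodically calling reset() releases them at the cost of repeating class descriptions. The threshold of 1000 mirrors the description above; the in-memory stream and helper function are illustrative, not Spark's serializer code.

```
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// ObjectOutputStream caches written objects (preventing GC); reset()
// drops that cache. Threshold mirrors the commit description; the
// stream and writeAll helper are illustrative.
val out = new ObjectOutputStream(new ByteArrayOutputStream())
val resetInterval = 1000
var written = 0

def writeAll(objects: Iterator[AnyRef]): Unit = {
  for (obj <- objects) {
    out.writeObject(obj)
    written += 1
    if (written % resetInterval == 0) {
      out.reset()   // clear the handle table so written objects can be GC'd
    }
  }
}

writeAll(Iterator.fill(5000)("payload"))
```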
commit 7edbea41b43e0dc11a2de156be220db8b7952d01
Author: Thomas Graves <tgra...@apache.org>
Date: 2014-03-07T00:27:50Z

SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets

Resubmitted pull request; was https://github.com/apache/incubator-spark/pull/332.

Author: Thomas Graves <tgra...@apache.org>

Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits:

dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser
05eebed [Thomas Graves] Fix dependency lost in upmerge
d1040ec [Thomas Graves] Fix up various imports
05ff5e0 [Thomas Graves] Fix up imports after upmerging to master
ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase
13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config wherever possible. Added ConnectionManagerSuite unit tests.
4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets
2f77147 [Thomas Graves] Rework from comments
50dd9f2 [Thomas Graves] fix header in SecurityManager
ecbfb65 [Thomas Graves] Fix spacing and formatting
b514bec [Thomas Graves] Fix reference to config
ed3d1c1 [Thomas Graves] Add security.md
6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments
2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework
5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils
f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets

commit 328c73d037c17440c2a91a6c88b4258fbefa0c08
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-03-07T01:12:58Z

SPARK-1197. Change yarn-standalone to yarn-cluster and fix up running on YARN docs

This patch changes "yarn-standalone" to "yarn-cluster" (but still supports the former). It also cleans up the Running on YARN docs and adds a section on how to view logs.

Author: Sandy Ryza <sa...@cloudera.com>

Closes #95 from sryza/sandy-spark-1197 and squashes the following commits:

563ef3a [Sandy Ryza] Review feedback
6ad06d4 [Sandy Ryza] Change yarn-standalone to yarn-cluster and fix up running on YARN docs

commit 9ae919c02f7b7d069215e8dc6cafef0ec79c9d5f
Author: anitatailor <tailor.an...@gmail.com>
Date: 2014-03-07T01:46:43Z

Example for Cassandra CQL read/write from Spark

Cassandra read/write using CqlPagingInputFormat/CqlOutputFormat.

Author: anitatailor <tailor.an...@gmail.com>

Closes #87 from anitatailor/master and squashes the following commits:

3493f81 [anitatailor] Fixed scala style as per review
19480b7 [anitatailor] Example for cassandra CQL read/write from spark

commit 33baf14b04bcb5cb8dc39ae0773b9e0ef79ef9cf
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2014-03-07T01:57:31Z

Small clean-up to flatmap tests

commit dabeb6f160f7ad7df1c54b1b8b069700dd4b74dd
Author: Aaron Davidson <aa...@databricks.com>
Date: 2014-03-07T18:22:27Z

SPARK-1136: Fix FaultToleranceTest for Docker 0.8.1

This patch allows the FaultToleranceTest to work in newer versions of Docker. See https://spark-project.atlassian.net/browse/SPARK-1136 for more details.

Besides changing the Docker and FaultToleranceTest internals, this patch also changes the behavior of Master to accept new Workers which share an address with a Worker that we are currently trying to recover. This can only happen when the Worker itself was restarted and got the same IP address/port at the same time as a Master recovery occurs.

Finally, this adds a good bit of ASCII art to the test to make failures, successes, and actions more apparent. This is very much needed.

Author: Aaron Davidson <aa...@databricks.com>

Closes #5 from aarondav/zookeeper and squashes the following commits:

5d7a72a [Aaron Davidson] SPARK-1136: Fix FaultToleranceTest for Docker 0.8.1

commit b7cd9e992cbc2e649534a2cdf9b8bde2c1ee26bd
Author: Thomas Graves <tgra...@apache.org>
Date: 2014-03-07T18:36:55Z

SPARK-1195: set map_input_file environment variable in PipedRDD

Hadoop uses the config mapreduce.map.input.file to indicate the input filename to the map when the input split is of type FileSplit. Some of the Hadoop input and output formats set or use this config. This config can also be used by user code. PipedRDD runs an external process, and the configs aren't available to that process. Hadoop Streaming does something very similar, and the way it makes configs available is by exporting them into the environment, replacing '.' with '_'. Spark should also export this variable when launching the pipe command so the user code has access to that config.

Note that the config mapreduce.map.input.file is the new one; the old one, which is deprecated but not yet removed, is map.input.file. So we should handle both.

Perhaps it would be better to abstract this out somehow so it goes into the HadoopPartition code?

Author: Thomas Graves <tgra...@apache.org>

Closes #94 from tgravescs/map_input_file and squashes the following commits:

cc97a6a [Thomas Graves] Update test to check for existence of command, add a getPipeEnvVars function to HadoopRDD
e3401dc [Thomas Graves] Merge remote-tracking branch 'upstream/master' into map_input_file
2ba805e [Thomas Graves] set map_input_file environment variable in PipedRDD
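A minimal sketch of the export convention described in that commit (not the actual PipedRDD change): config keys become environment variable names by replacing '.' with '_', the same convention Hadoop Streaming uses, and the external process reads them from its environment. The input path and shell command are illustrative.

```
import scala.sys.process._

// '.' becomes '_' in the variable name, as Hadoop Streaming does.
// The path and echo command are placeholders; the real change lives
// in PipedRDD/HadoopRDD.
val splitConfs = Map(
  "mapreduce.map.input.file" -> "hdfs://namenode/data/part-00000", // new config name
  "map.input.file"           -> "hdfs://namenode/data/part-00000"  // deprecated name, still handled
)
val envVars = splitConfs.map { case (k, v) => k.replace('.', '_') -> v }

// The piped process can now read the config from its environment.
val exitCode = Process(Seq("sh", "-c", "echo $mapreduce_map_input_file"), None, envVars.toSeq: _*).!
```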
commit 6e730edcde7ca6cbb5727dff7a42f7284b368528
Author: Prashant Sharma <prashan...@imaginea.com>
Date: 2014-03-08T02:48:07Z

Spark 1165 rdd.intersection in python and java

Author: Prashant Sharma <prashan...@imaginea.com>
Author: Prashant Sharma <scrapco...@gmail.com>

Closes #80 from ScrapCodes/SPARK-1165/RDD.intersection and squashes the following commits:

9b015e9 [Prashant Sharma] Added a note, shuffle is required for intersection.
1fea813 [Prashant Sharma] correct the lines wrapping
d0c71f3 [Prashant Sharma] SPARK-1165 RDD.intersection in java
d6effee [Prashant Sharma] SPARK-1165 Implemented RDD.intersection in python.
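A usage sketch of RDD.intersection, which this change brings to the Java and Python APIs to match Scala; per the squashed commits, note that a shuffle is required, and duplicates do not survive in the result. The sample data and master URL are illustrative.

```
import org.apache.spark.{SparkConf, SparkContext}

// intersection() shuffles and deduplicates; only elements present in
// both RDDs survive. Sample data is illustrative.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("intersection-demo"))

val left  = sc.parallelize(Seq(1, 2, 3, 4, 4))
val right = sc.parallelize(Seq(3, 4, 4, 5))

println(left.intersection(right).collect().sorted.mkString(", "))  // prints: 3, 4

sc.stop()
```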
commit a99fb3747a0bc9498cb1d19ae5b5bb0163e6f52b
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-03-08T07:10:35Z

SPARK-1193. Fix indentation in pom.xmls

Author: Sandy Ryza <sa...@cloudera.com>

Closes #91 from sryza/sandy-spark-1193 and squashes the following commits:

a878124 [Sandy Ryza] SPARK-1193. Fix indentation in pom.xmls

commit 8ad486add941c9686dfb39309adaf5b7ca66345d
Author: Reynold Xin <r...@apache.org>
Date: 2014-03-08T07:23:59Z

Allow sbt to use more than 1G of heap.

There was a mistake in the sbt build file (introduced by 012bd5fbc97dc40bb61e0e2b9cc97ed0083f37f6) in which we set the default to 2048 and then immediately reset it to 1024. Without this, building Spark can run out of permgen space on my machine.

Author: Reynold Xin <r...@apache.org>

Closes #103 from rxin/sbt and squashes the following commits:

8829c34 [Reynold Xin] Allow sbt to use more than 1G of heap.

commit 0b7b7fd45cd9037d23cb090e62be3ff075214fe7
Author: Cheng Lian <lian.cs....@gmail.com>
Date: 2014-03-08T07:26:46Z

[SPARK-1194] Fix the same-RDD rule for cache replacement

SPARK-1194: https://spark-project.atlassian.net/browse/SPARK-1194

In the current implementation, when selecting candidate blocks to be swapped out, once we find a block from the same RDD that the block to be stored belongs to, cache eviction fails and aborts. In this PR, we keep selecting blocks *not* from the RDD that the block to be stored belongs to until either enough free space can be ensured (cache eviction succeeds) or all such blocks are checked (cache eviction fails).

Author: Cheng Lian <lian.cs....@gmail.com>

Closes #96 from liancheng/fix-spark-1194 and squashes the following commits:

2524ab9 [Cheng Lian] Added regression test case for SPARK-1194
6e40c22 [Cheng Lian] Remove redundant comments
40cdcb2 [Cheng Lian] Bug fix, and addressed PR comments from @mridulm
62c92ac [Cheng Lian] Fixed SPARK-1194 https://spark-project.atlassian.net/browse/SPARK-1194
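A minimal sketch of the fixed selection rule, not the actual BlockManager code: blocks belonging to the same RDD as the incoming block are skipped rather than aborting eviction, and scanning stops once enough space is freed or the candidate list is exhausted. The BlockId type and sizes here are illustrative.

```
import scala.collection.mutable.ArrayBuffer

// Same-RDD blocks are skipped, not fatal; scan until enough space is
// freed or candidates run out. Types and sizes are illustrative.
case class BlockId(rddId: Int, name: String)

def selectBlocksToEvict(candidates: Seq[(BlockId, Long)],  // (block, size in bytes)
                        incomingRddId: Int,
                        spaceNeeded: Long): Option[Seq[BlockId]] = {
  val selected = ArrayBuffer.empty[BlockId]
  var freed = 0L
  val it = candidates.iterator
  while (freed < spaceNeeded && it.hasNext) {
    val (block, size) = it.next()
    if (block.rddId != incomingRddId) {  // skip same-RDD blocks instead of aborting
      selected += block
      freed += size
    }
  }
  if (freed >= spaceNeeded) Some(selected.toSeq) else None  // None: eviction fails
}
```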
commit c2834ec081df392ca501a75b5af06efaa5448509
Author: Reynold Xin <r...@apache.org>
Date: 2014-03-08T20:40:26Z

Update junitxml plugin to the latest version to avoid recompilation in every SBT command.

Author: Reynold Xin <r...@apache.org>

Closes #104 from rxin/junitxml and squashes the following commits:

67ef7bf [Reynold Xin] Update junitxml plugin to the latest version to avoid recompilation in every SBT command.

commit e59a3b6c415b95e8137f5a154716b12653a8aed0
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2014-03-09T00:02:42Z

SPARK-1190: Do not initialize log4j if slf4j log4j backend is not being used

Author: Patrick Wendell <pwend...@gmail.com>

Closes #107 from pwendell/logging and squashes the following commits:

be21c11 [Patrick Wendell] Logging fix

commit 52834d761b059264214dfc6a1f9c70b8bc7ec089
Author: Aaron Davidson <aa...@databricks.com>
Date: 2014-03-09T18:08:39Z

SPARK-929: Fully deprecate usage of SPARK_MEM

(Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)

This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables: SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY.

The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.

SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.

SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory used by jobs launched by spark-class, without possibly affecting executor memory.

Other memory considerations:
- The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEMORY.
- run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).

This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.

Author: Aaron Davidson <aa...@databricks.com>

Closes #99 from aarondav/sparkmem and squashes the following commits:

9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM

commit f6f9d02e85d17da2f742ed0062f1648a9293e73c
Author: Jiacheng Guo <guoj...@gmail.com>
Date: 2014-03-09T18:37:44Z

Add timeout for fetch file

Currently, when fetching a file, the connection's connect timeout and read timeout are based on the default JVM settings; with this change, they use spark.worker.timeout instead. This can be useful when the connection status between workers is not perfect, and it prevents prematurely removing a task set.

Author: Jiacheng Guo <guoj...@gmail.com>

Closes #98 from guojc/master and squashes the following commits:

abfe698 [Jiacheng Guo] add space according request
2a37c34 [Jiacheng Guo] Add timeout for fetch file
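A minimal sketch of the timeout change just described: set explicit connect/read timeouts when fetching a file instead of relying on JVM defaults. Reading spark.worker.timeout (seconds, defaulting to 60) from system properties is an illustration, as is the URL; the real change applies these timeouts inside Spark's file-fetching code path.

```
import java.net.URL

// Explicit timeouts instead of JVM defaults; the property lookup,
// 60-second fallback, and URL are illustrative.
val timeoutMs = sys.props.getOrElse("spark.worker.timeout", "60").toInt * 1000

val conn = new URL("http://master:8080/some-file").openConnection()
conn.setConnectTimeout(timeoutMs)  // fail fast when the peer is unreachable
conn.setReadTimeout(timeoutMs)     // fail fast when the transfer stalls
val in = conn.getInputStream()
```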
commit faf4cad1debb76148facc008e0a3308ac96eee7a
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2014-03-09T18:57:06Z

Fix markup errors introduced in #33 (SPARK-1189)

These were causing errors on the configuration page.

Author: Patrick Wendell <pwend...@gmail.com>

Closes #111 from pwendell/master and squashes the following commits:

8467a86 [Patrick Wendell] Fix markup errors introduced in #33 (SPARK-1189)

commit b9be160951b9e7a7e801009e9d6ee6c2b5d2d47e
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2014-03-09T20:17:07Z

SPARK-782 Clean up for ASM dependency.

This makes two changes.

1) Spark uses the shaded version of asm that is (conveniently) published with Kryo.
2) Existing exclude rules around asm are updated to reflect the new groupId of `org.ow2.asm`. This made all of the old rules not work with newer Hadoop versions that pull in new asm versions.

Author: Patrick Wendell <pwend...@gmail.com>

Closes #100 from pwendell/asm and squashes the following commits:

9235f3f [Patrick Wendell] SPARK-782 Clean up for ASM dependency.

commit 5d98cfc1c8fb17fbbeacc7192ac21c0b038cbd16
Author: Chen Chao <crazy...@gmail.com>
Date: 2014-03-10T05:42:12Z

maintain arbitrary state data for each key

RT

Author: Chen Chao <crazy...@gmail.com>

Closes #114 from CrazyJvm/patch-1 and squashes the following commits:

dcb0df5 [Chen Chao] maintain arbitrary state data for each key

commit 79a0a8c3d76704e2002800f20a4ba2ca03409cd6
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date: 2014-03-05T23:37:30Z

[SPARK-1186] : Enrich the Spark Shell to support additional arguments.

Enrich the Spark Shell functionality to support the following options.

```
Usage: spark-shell [OPTIONS]

OPTIONS:

    basic:
    -h  --help             : print this help information.
    -c  --executor-cores   : the maximum number of cores to be used by the spark shell.
    -em --executor-memory  : num[m|g], the memory used by each executor of spark shell.
    -dm --driver-memory    : num[m|g], the memory used by the spark shell and driver.

    soon to be deprecated:
    --cores                : please use -c/--executor-cores
    --drivermem            : please use -dm/--driver-memory

    other options:
    -mip --master-ip       : Spark Master IP/Host Address
    -mp  --master-port     : num, Spark Master Port
    -m   --master          : full string that describes the Spark Master.
    -ld  --local-dir       : absolute path to a local directory that will be used for "scratch" space in Spark.
    -dh  --driver-host     : hostname or IP address for the driver to listen on.
    -dp  --driver-port     : num, port for the driver to listen on.
    -uip --ui-port         : num, port for your application's dashboard, which shows memory and workload data.
    --parallelism          : num, default number of tasks to use across the cluster for distributed shuffle operations when not set by user.
    --locality-wait        : num, number of milliseconds to wait to launch a data-local task before giving up.
    --schedule-fair        : flag, enables FAIR scheduling between jobs submitted to the same SparkContext.
    --max-failures         : num, number of individual task failures before giving up on the job.
    --log-conf             : flag, log the supplied SparkConf as INFO at start of spark context.

e.g.
    spark-shell -m 127.0.0.1 -ld /tmp -dh 127.0.0.1 -dp 4001 -uip 4010 --parallelism 10 --locality-wait 500 --schedule-fair --max-failures 100
```

[ticket: SPARK-1186] : Enrich the Spark Shell to support additional arguments.
https://spark-project.atlassian.net/browse/SPARK-1186

Author   : bernardo.gomezpal...@gmail.com
Reviewer : ?
Testing  : ?

commit 19db38fa7e05f726d48ea783a8d51ba24d5097b2
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date: 2014-03-10T17:23:03Z

Option `--drivermem` isn't deprecated anymore.

commit 5c18b45c36f22c5516179bb2c3f93827e84df3af
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date: 2014-03-10T17:37:40Z

Merge remote-tracking branch 'origin/feature/enrich-spark-shell' into feature/enrich-spark-shell

Conflicts:
    bin/spark-shell

----
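Commit 5d98cfc above touches the documentation for updateStateByKey, Spark Streaming's mechanism for maintaining arbitrary state data for each key. A minimal running word-count sketch, with the socket source, batch interval, and checkpoint directory chosen for illustration:

```
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Per-key counts that survive across batches; source, interval, and
// checkpoint directory are illustrative.
val conf = new SparkConf().setMaster("local[2]").setAppName("state-demo")
val ssc = new StreamingContext(conf, Seconds(1))
ssc.checkpoint("/tmp/state-demo")  // stateful operations require checkpointing

val updateCount = (newValues: Seq[Int], state: Option[Int]) =>
  Some(newValues.sum + state.getOrElse(0))

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .updateStateByKey[Int](updateCount)

counts.print()
ssc.start()
ssc.awaitTermination()
```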