GitHub user berngp reopened a pull request: https://github.com/apache/spark/pull/84
[SPARK-1186] : Enrich the Spark Shell to support additional arguments.

Enrich the Spark Shell functionality to support the following options.

```
Usage: spark-shell [OPTIONS]

OPTIONS:

    basic:
    -h  --help             : print this help information.
    -c  --executor-cores   : the maximum number of cores to be used by the spark shell.
    -em --executor-memory  : num[m|g], the memory used by each executor of spark shell.
    -dm --driver-memory    : num[m|g], the memory used by the spark shell and driver.

    soon to be deprecated:
    --cores                : please use -c/--executor-cores
    --drivermem            : please use -dm/--driver-memory

    other options:
    -mip --master-ip       : Spark Master IP/Host Address
    -mp  --master-port     : num, Spark Master Port
    -m   --master          : full string that describes the Spark Master.
    -ld  --local-dir       : absolute path to a local directory that will be used for "scratch" space in Spark.
    -dh  --driver-host     : hostname or IP address for the driver to listen on.
    -dp  --driver-port     : num, port for the driver to listen on.
    -uip --ui-port         : num, port for your application's dashboard, which shows memory and workload data.
    --parallelism          : num, default number of tasks to use across the cluster for distributed shuffle operations when not set by user.
    --locality-wait        : num, number of milliseconds to wait to launch a data-local task before giving up.
    --schedule-fair        : flag, enables FAIR scheduling between jobs submitted to the same SparkContext.
    --max-failures         : num, number of individual task failures before giving up on the job.
    --log-conf             : flag, log the supplied SparkConf as INFO at start of spark context.

e.g.
    spark-shell -m 127.0.0.1 -ld /tmp -dh 127.0.0.1 -dp 4001 -uip 4010 --parallelism 10 --locality-wait 500 --schedule-fair --max-failures 100
```

[ticket: SPARK-1186] : Enrich the Spark Shell to support additional arguments.
https://spark-project.atlassian.net/browse/SPARK-1186

Author   : bernardo.gomezpal...@gmail.com
Reviewer : ?
Testing  : ?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/berngp/spark feature/enrich-spark-shell

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/84.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #84

----
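For readers mapping these flags onto Spark's configuration system, here is a minimal Scala sketch of roughly equivalent SparkConf settings for the example invocation. The property names are standard Spark configuration keys, but the flag-to-property correspondence is inferred from the option descriptions above, not taken from the patch itself.

```
import org.apache.spark.{SparkConf, SparkContext}

// Roughly equivalent SparkConf settings for the example invocation above.
// Property names are standard Spark keys; the mapping is an assumption
// inferred from the option descriptions, not code from the patch.
val conf = new SparkConf()
  .setAppName("spark-shell")
  .setMaster("spark://127.0.0.1:7077")      // -m   / --master
  .set("spark.executor.memory", "2g")       // -em  / --executor-memory
  .set("spark.local.dir", "/tmp")           // -ld  / --local-dir
  .set("spark.driver.host", "127.0.0.1")    // -dh  / --driver-host
  .set("spark.driver.port", "4001")         // -dp  / --driver-port
  .set("spark.ui.port", "4010")             // -uip / --ui-port
  .set("spark.default.parallelism", "10")   // --parallelism
  .set("spark.locality.wait", "500")        // --locality-wait (milliseconds)
  .set("spark.scheduler.mode", "FAIR")      // --schedule-fair
  .set("spark.task.maxFailures", "100")     // --max-failures
  .set("spark.logConf", "true")             // --log-conf

val sc = new SparkContext(conf)
```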
commit c7ac8ebe0740d9ea7347253b251c5b6b90706b2f
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date: 2014-03-05T23:37:30Z

[SPARK-1186] : Enrich the Spark Shell to support additional arguments.

Enrich the Spark Shell functionality to support the following options.

```
Usage: spark-shell [OPTIONS]

OPTIONS:

    basic:
    -h  --help             : print this help information.
    -c  --executor-cores   : the maximum number of cores to be used by the spark shell.
    -em --executor-memory  : num[m|g], the memory used by each executor of spark shell.
    -dm --driver-memory    : num[m|g], the memory used by the spark shell and driver.

    soon to be deprecated:
    --cores                : please use -c/--executor-cores
    --drivermem            : please use -dm/--driver-memory

    other options:
    -mip --master-ip       : Spark Master IP/Host Address
    -mp  --master-port     : num, Spark Master Port
    -m   --master          : full string that describes the Spark Master.
    -ld  --local-dir       : absolute path to a local directory that will be used for "scratch" space in Spark.
    -dh  --driver-host     : hostname or IP address for the driver to listen on.
    -dp  --driver-port     : num, port for the driver to listen on.
    -uip --ui-port         : num, port for your application's dashboard, which shows memory and workload data.
    --parallelism          : num, default number of tasks to use across the cluster for distributed shuffle operations when not set by user.
    --locality-wait        : num, number of milliseconds to wait to launch a data-local task before giving up.
    --schedule-fair        : flag, enables FAIR scheduling between jobs submitted to the same SparkContext.
    --max-failures         : num, number of individual task failures before giving up on the job.
    --log-conf             : flag, log the supplied SparkConf as INFO at start of spark context.

e.g.
    spark-shell -m 127.0.0.1 -ld /tmp -dh 127.0.0.1 -dp 4001 -uip 4010 --parallelism 10 --locality-wait 500 --schedule-fair --max-failures 100
```

[ticket: SPARK-1186] : Enrich the Spark Shell to support additional arguments.
https://spark-project.atlassian.net/browse/SPARK-1186

Author   : bernardo.gomezpal...@gmail.com
Reviewer : ?
Testing  : ?

commit 51ca7bd7038dd5f66327d5b15692a1ccaab42129
Author: liguoqiang <liguoqi...@rd.tuan800.com>
Date: 2014-03-06T00:38:43Z

Improve the building-with-maven docs: change

    mvn -Dhadoop.version=... -Dsuites=spark.repl.ReplSuite test

to

    mvn -Dhadoop.version=... -Dsuites=org.apache.spark.repl.ReplSuite test

Author: liguoqiang <liguoqi...@rd.tuan800.com>

Closes #70 from witgo/building_with_maven and squashes the following commits:

6ec8a54 [liguoqiang] spark.repl.ReplSuite to org.apache.spark.repl.ReplSuite

commit cda381f88cc03340fdf7b2d681699babbae2a56e
Author: Mark Grover <m...@apache.org>
Date: 2014-03-06T00:52:58Z

SPARK-1184: Update the distribution tar.gz to include the spark-assembly jar

See JIRA for details.

Author: Mark Grover <m...@apache.org>

Closes #78 from markgrover/SPARK-1184 and squashes the following commits:

12b78e6 [Mark Grover] SPARK-1184: Update the distribution tar.gz to include spark-assembly jar

commit 3eb009f362993dbe43028419c2d48011111a200d
Author: CodingCat <zhunans...@gmail.com>
Date: 2014-03-06T05:47:34Z

SPARK-1156: allow user to login into a cluster without slaves

Reported in https://spark-project.atlassian.net/browse/SPARK-1156

The current spark-ec2 script doesn't allow the user to log in to a cluster without slaves. One of the issues brought by this behaviour is that when all the workers have died, the user cannot even log in to the cluster for debugging, etc.

Author: CodingCat <zhunans...@gmail.com>

Closes #58 from CodingCat/SPARK-1156 and squashes the following commits:

104af07 [CodingCat] output ERROR to stderr
9a71769 [CodingCat] do not allow user to start 0-slave cluster
24a7c79 [CodingCat] allow user to login into a cluster without slaves
commit 3d3acef0474b6dc21f1b470ea96079a491e58b75
Author: Prabin Banka <prabin.ba...@imaginea.com>
Date: 2014-03-06T20:45:27Z

SPARK-1187, Added missing Python APIs

The following Python APIs are added:

    RDD.id()
    SparkContext.setJobGroup()
    SparkContext.setLocalProperty()
    SparkContext.getLocalProperty()
    SparkContext.sparkUser()

These were raised earlier as a part of apache/incubator-spark#486.

Author: Prabin Banka <prabin.ba...@imaginea.com>

Closes #75 from prabinb/python-api-backup and squashes the following commits:

cc3c6cd [Prabin Banka] Added missing Python APIs
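The Scala API already exposes counterparts of these newly added Python methods; a minimal usage sketch follows, with the local master and all names chosen for illustration.

```
import org.apache.spark.{SparkConf, SparkContext}

// Scala counterparts of the Python APIs added above; the master URL,
// group name, and pool name are illustrative.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("api-demo"))

sc.setJobGroup("nightly-etl", "nightly ETL run")        // tag subsequent jobs
sc.setLocalProperty("spark.scheduler.pool", "production")
val pool = sc.getLocalProperty("spark.scheduler.pool")  // -> "production"
val user = sc.sparkUser                                 // user running this context

val rdd = sc.parallelize(1 to 10)
println(rdd.id)                                         // unique id of this RDD within the context

sc.stop()
```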
commit 40566e10aae4b21ffc71ea72702b8df118ac5c8e
Author: Kyle Ellrott <kellr...@gmail.com>
Date: 2014-03-06T22:51:00Z

SPARK-942: Do not materialize partitions when DISK_ONLY storage level is used

This is a port of a pull request originally targeted at incubator-spark: https://github.com/apache/incubator-spark/pull/180

Essentially, if a user returns a generative iterator (from a flatMap operation), when trying to persist the data, Spark would first unroll the iterator into an ArrayBuffer and then try to figure out if it could store the data. In cases where the user provided an iterator that generated more data than available memory, this would cause a crash. With this patch, if the user requests a persist with StorageLevel.DISK_ONLY, the iterator will be unrolled as it is fed into the serializer.

To do this, two changes were made:

1) The type of the 'values' argument in the putValues method of the BlockStore interface was changed from ArrayBuffer to Iterator (and all code interfacing with this method was modified to connect correctly).
2) The JavaSerializer now calls the ObjectOutputStream 'reset' method every 1000 objects. This was done because the ObjectOutputStream caches objects (thus preventing them from being GC'd) to write more compact serialization. If reset is never called, eventually the memory fills up; if it is called too often, the serialization streams become much larger because of redundant class descriptions.

Author: Kyle Ellrott <kellr...@gmail.com>

Closes #50 from kellrott/iterator-to-disk and squashes the following commits:

9ef7cb8 [Kyle Ellrott] Fixing formatting issues.
60e0c57 [Kyle Ellrott] Fixing issues (formatting, variable names, etc.) from review comments
8aa31cd [Kyle Ellrott] Merge ../incubator-spark into iterator-to-disk
33ac390 [Kyle Ellrott] Merge branch 'iterator-to-disk' of github.com:kellrott/incubator-spark into iterator-to-disk
2f684ea [Kyle Ellrott] Refactoring the BlockManager to replace the Either[Either[A,B]] usage. Now using trait 'Values'. Also modified BlockStore.putBytes call to return PutResult, so that it behaves like putValues.
f70d069 [Kyle Ellrott] Adding docs for spark.serializer.objectStreamReset configuration
7ccc74b [Kyle Ellrott] Moving the 'LargeIteratorSuite' to simply test persistence of iterators. It doesn't try to invoke an OOM error any more
16a4cea [Kyle Ellrott] Streamlined the LargeIteratorSuite unit test. It should now run in ~25 seconds. Confirmed that it still crashes an unpatched copy of Spark.
c2fb430 [Kyle Ellrott] Removing more un-needed array-buffer to iterator conversions
627a8b7 [Kyle Ellrott] Wrapping a few long lines
0f28ec7 [Kyle Ellrott] Adding second putValues to BlockStore interface that accepts an ArrayBuffer (rather than an Iterator). This will allow BlockStores to have slightly different behaviors dependent on whether they get an Iterator or ArrayBuffer. In the case of the MemoryStore, it needs to duplicate and cache an Iterator into an ArrayBuffer, but if handed an ArrayBuffer, it can skip the duplication.
656c33e [Kyle Ellrott] Fixing the JavaSerializer to read from the SparkConf rather than the System property.
8644ee8 [Kyle Ellrott] Merge branch 'master' into iterator-to-disk
00c98e0 [Kyle Ellrott] Making the Java ObjectStreamSerializer reset rate configurable by the system variable 'spark.serializer.objectStreamReset', default is now 10000.
40fe1d7 [Kyle Ellrott] Removing rogue space
31fe08e [Kyle Ellrott] Removing un-needed semi-colons
9df0276 [Kyle Ellrott] Added check to make sure that streamed-to-disk RDD actually returns good data in the LargeIteratorSuite
a6424ba [Kyle Ellrott] Wrapping long line
2eeda75 [Kyle Ellrott] Fixing dumb mistake ("||" instead of "&&")
0e6f808 [Kyle Ellrott] Deleting temp output directory when done
95c7f67 [Kyle Ellrott] Simplifying StorageLevel checks
56f71cd [Kyle Ellrott] Merge branch 'master' into iterator-to-disk
44ec35a [Kyle Ellrott] Adding some comments.
5eb2b7e [Kyle Ellrott] Changing the JavaSerializer reset to occur every 1000 objects.
f403826 [Kyle Ellrott] Merge branch 'master' into iterator-to-disk
81d670c [Kyle Ellrott] Adding unit test for straight to disk iterator methods.
d32992f [Kyle Ellrott] Merge remote-tracking branch 'origin/master' into iterator-to-disk
cac1fad [Kyle Ellrott] Fixing MemoryStore, so that it converts incoming iterators to ArrayBuffer objects. This was previously done higher up the stack.
efe1102 [Kyle Ellrott] Changing CacheManager and BlockManager to pass iterators directly to the serializer when a 'DISK_ONLY' persist is called. This is in response to SPARK-942.
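A minimal sketch of the reset trick described in change (2): java.io.ObjectOutputStream keeps a handle table of every object it has written so it can emit compact back-references, and that table also pins the objects in memory. Periodically calling reset() releases them at the cost of repeating class descriptions. The threshold of 1000 mirrors the description above; the in-memory stream and helper function are illustrative, not Spark's serializer code.

```
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// ObjectOutputStream caches written objects (preventing GC); reset()
// drops that cache. Threshold mirrors the commit description; the
// stream and writeAll helper are illustrative.
val out = new ObjectOutputStream(new ByteArrayOutputStream())
val resetInterval = 1000
var written = 0

def writeAll(objects: Iterator[AnyRef]): Unit = {
  for (obj <- objects) {
    out.writeObject(obj)
    written += 1
    if (written % resetInterval == 0) {
      out.reset()   // clear the handle table so written objects can be GC'd
    }
  }
}

writeAll(Iterator.fill(5000)("payload"))
```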
commit 7edbea41b43e0dc11a2de156be220db8b7952d01
Author: Thomas Graves <tgra...@apache.org>
Date: 2014-03-07T00:27:50Z

SPARK-1189: Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets

Resubmitted pull request; was https://github.com/apache/incubator-spark/pull/332.

Author: Thomas Graves <tgra...@apache.org>

Closes #33 from tgravescs/security-branch-0.9-with-client-rebase and squashes the following commits:

dfe3918 [Thomas Graves] Fix merge conflict since startUserClass now using runAsUser
05eebed [Thomas Graves] Fix dependency lost in upmerge
d1040ec [Thomas Graves] Fix up various imports
05ff5e0 [Thomas Graves] Fix up imports after upmerging to master
ac046b3 [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase
13733e1 [Thomas Graves] Pass securityManager and SparkConf around where we can. Switch to use sparkConf for reading config wherever possible. Added ConnectionManagerSuite unit tests.
4a57acc [Thomas Graves] Change UI createHandler routines to createServlet since they now return servlets
2f77147 [Thomas Graves] Rework from comments
50dd9f2 [Thomas Graves] fix header in SecurityManager
ecbfb65 [Thomas Graves] Fix spacing and formatting
b514bec [Thomas Graves] Fix reference to config
ed3d1c1 [Thomas Graves] Add security.md
6f7ddf3 [Thomas Graves] Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments
2d9e23e [Thomas Graves] Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework
5721c5a [Thomas Graves] update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils
f351763 [Thomas Graves] Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets

commit 328c73d037c17440c2a91a6c88b4258fbefa0c08
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-03-07T01:12:58Z

SPARK-1197. Change yarn-standalone to yarn-cluster and fix up running on YARN docs

This patch changes "yarn-standalone" to "yarn-cluster" (but still supports the former). It also cleans up the Running on YARN docs and adds a section on how to view logs.

Author: Sandy Ryza <sa...@cloudera.com>

Closes #95 from sryza/sandy-spark-1197 and squashes the following commits:

563ef3a [Sandy Ryza] Review feedback
6ad06d4 [Sandy Ryza] Change yarn-standalone to yarn-cluster and fix up running on YARN docs

commit 9ae919c02f7b7d069215e8dc6cafef0ec79c9d5f
Author: anitatailor <tailor.an...@gmail.com>
Date: 2014-03-07T01:46:43Z

Example for Cassandra CQL read/write from Spark

Cassandra read/write using CqlPagingInputFormat/CqlOutputFormat.

Author: anitatailor <tailor.an...@gmail.com>

Closes #87 from anitatailor/master and squashes the following commits:

3493f81 [anitatailor] Fixed scala style as per review
19480b7 [anitatailor] Example for cassandra CQL read/write from spark

commit 33baf14b04bcb5cb8dc39ae0773b9e0ef79ef9cf
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2014-03-07T01:57:31Z

Small clean-up to flatmap tests

commit dabeb6f160f7ad7df1c54b1b8b069700dd4b74dd
Author: Aaron Davidson <aa...@databricks.com>
Date: 2014-03-07T18:22:27Z

SPARK-1136: Fix FaultToleranceTest for Docker 0.8.1

This patch allows the FaultToleranceTest to work in newer versions of Docker. See https://spark-project.atlassian.net/browse/SPARK-1136 for more details.

Besides changing the Docker and FaultToleranceTest internals, this patch also changes the behavior of Master to accept new Workers which share an address with a Worker that we are currently trying to recover. This can only happen when the Worker itself was restarted and got the same IP address/port at the same time as a Master recovery occurs.

Finally, this adds a good bit of ASCII art to the test to make failures, successes, and actions more apparent. This is very much needed.

Author: Aaron Davidson <aa...@databricks.com>

Closes #5 from aarondav/zookeeper and squashes the following commits:

5d7a72a [Aaron Davidson] SPARK-1136: Fix FaultToleranceTest for Docker 0.8.1

commit b7cd9e992cbc2e649534a2cdf9b8bde2c1ee26bd
Author: Thomas Graves <tgra...@apache.org>
Date: 2014-03-07T18:36:55Z

SPARK-1195: set map_input_file environment variable in PipedRDD

Hadoop uses the config mapreduce.map.input.file to indicate the input filename to the map when the input split is of type FileSplit. Some of the Hadoop input and output formats set or use this config. This config can also be used by user code. PipedRDD runs an external process, and the configs aren't available to that process. Hadoop Streaming does something very similar, and the way it makes configs available is by exporting them into the environment, replacing '.' with '_'. Spark should also export this variable when launching the pipe command so the user code has access to that config.

Note that the config mapreduce.map.input.file is the new one; the old one, which is deprecated but not yet removed, is map.input.file. So we should handle both.

Perhaps it would be better to abstract this out somehow so it goes into the HadoopPartition code?

Author: Thomas Graves <tgra...@apache.org>

Closes #94 from tgravescs/map_input_file and squashes the following commits:

cc97a6a [Thomas Graves] Update test to check for existence of command, add a getPipeEnvVars function to HadoopRDD
e3401dc [Thomas Graves] Merge remote-tracking branch 'upstream/master' into map_input_file
2ba805e [Thomas Graves] set map_input_file environment variable in PipedRDD
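A minimal sketch of the export convention described in that commit (not the actual PipedRDD change): config keys become environment variable names by replacing '.' with '_', the same convention Hadoop Streaming uses, and the external process reads them from its environment. The input path and shell command are illustrative.

```
import scala.sys.process._

// '.' becomes '_' in the variable name, as Hadoop Streaming does.
// The path and echo command are placeholders; the real change lives
// in PipedRDD/HadoopRDD.
val splitConfs = Map(
  "mapreduce.map.input.file" -> "hdfs://namenode/data/part-00000", // new config name
  "map.input.file"           -> "hdfs://namenode/data/part-00000"  // deprecated name, still handled
)
val envVars = splitConfs.map { case (k, v) => k.replace('.', '_') -> v }

// The piped process can now read the config from its environment.
val exitCode = Process(Seq("sh", "-c", "echo $mapreduce_map_input_file"), None, envVars.toSeq: _*).!
```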
commit 6e730edcde7ca6cbb5727dff7a42f7284b368528
Author: Prashant Sharma <prashan...@imaginea.com>
Date: 2014-03-08T02:48:07Z

Spark 1165 rdd.intersection in python and java

Author: Prashant Sharma <prashan...@imaginea.com>
Author: Prashant Sharma <scrapco...@gmail.com>

Closes #80 from ScrapCodes/SPARK-1165/RDD.intersection and squashes the following commits:

9b015e9 [Prashant Sharma] Added a note, shuffle is required for intersection.
1fea813 [Prashant Sharma] correct the lines wrapping
d0c71f3 [Prashant Sharma] SPARK-1165 RDD.intersection in java
d6effee [Prashant Sharma] SPARK-1165 Implemented RDD.intersection in python.
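A usage sketch of RDD.intersection, which this change brings to the Java and Python APIs to match Scala; per the squashed commits, note that a shuffle is required, and duplicates do not survive in the result. The sample data and master URL are illustrative.

```
import org.apache.spark.{SparkConf, SparkContext}

// intersection() shuffles and deduplicates; only elements present in
// both RDDs survive. Sample data is illustrative.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("intersection-demo"))

val left  = sc.parallelize(Seq(1, 2, 3, 4, 4))
val right = sc.parallelize(Seq(3, 4, 4, 5))

println(left.intersection(right).collect().sorted.mkString(", "))  // prints: 3, 4

sc.stop()
```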
commit a99fb3747a0bc9498cb1d19ae5b5bb0163e6f52b
Author: Sandy Ryza <sa...@cloudera.com>
Date: 2014-03-08T07:10:35Z

SPARK-1193. Fix indentation in pom.xmls

Author: Sandy Ryza <sa...@cloudera.com>

Closes #91 from sryza/sandy-spark-1193 and squashes the following commits:

a878124 [Sandy Ryza] SPARK-1193. Fix indentation in pom.xmls

commit 8ad486add941c9686dfb39309adaf5b7ca66345d
Author: Reynold Xin <r...@apache.org>
Date: 2014-03-08T07:23:59Z

Allow sbt to use more than 1G of heap.

There was a mistake in the sbt build file (introduced by 012bd5fbc97dc40bb61e0e2b9cc97ed0083f37f6) in which we set the default to 2048 and then immediately reset it to 1024. Without this, building Spark can run out of permgen space on my machine.

Author: Reynold Xin <r...@apache.org>

Closes #103 from rxin/sbt and squashes the following commits:

8829c34 [Reynold Xin] Allow sbt to use more than 1G of heap.

commit 0b7b7fd45cd9037d23cb090e62be3ff075214fe7
Author: Cheng Lian <lian.cs....@gmail.com>
Date: 2014-03-08T07:26:46Z

[SPARK-1194] Fix the same-RDD rule for cache replacement

SPARK-1194: https://spark-project.atlassian.net/browse/SPARK-1194

In the current implementation, when selecting candidate blocks to be swapped out, once we find a block from the same RDD that the block to be stored belongs to, cache eviction fails and aborts. In this PR, we keep selecting blocks *not* from the RDD that the block to be stored belongs to until either enough free space can be ensured (cache eviction succeeds) or all such blocks are checked (cache eviction fails).

Author: Cheng Lian <lian.cs....@gmail.com>

Closes #96 from liancheng/fix-spark-1194 and squashes the following commits:

2524ab9 [Cheng Lian] Added regression test case for SPARK-1194
6e40c22 [Cheng Lian] Remove redundant comments
40cdcb2 [Cheng Lian] Bug fix, and addressed PR comments from @mridulm
62c92ac [Cheng Lian] Fixed SPARK-1194 https://spark-project.atlassian.net/browse/SPARK-1194
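A minimal sketch of the fixed selection rule, not the actual BlockManager code: blocks belonging to the same RDD as the incoming block are skipped rather than aborting eviction, and scanning stops once enough space is freed or the candidate list is exhausted. The BlockId type and sizes here are illustrative.

```
import scala.collection.mutable.ArrayBuffer

// Same-RDD blocks are skipped, not fatal; scan until enough space is
// freed or candidates run out. Types and sizes are illustrative.
case class BlockId(rddId: Int, name: String)

def selectBlocksToEvict(candidates: Seq[(BlockId, Long)],  // (block, size in bytes)
                        incomingRddId: Int,
                        spaceNeeded: Long): Option[Seq[BlockId]] = {
  val selected = ArrayBuffer.empty[BlockId]
  var freed = 0L
  val it = candidates.iterator
  while (freed < spaceNeeded && it.hasNext) {
    val (block, size) = it.next()
    if (block.rddId != incomingRddId) {  // skip same-RDD blocks instead of aborting
      selected += block
      freed += size
    }
  }
  if (freed >= spaceNeeded) Some(selected.toSeq) else None  // None: eviction fails
}
```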
commit c2834ec081df392ca501a75b5af06efaa5448509
Author: Reynold Xin <r...@apache.org>
Date: 2014-03-08T20:40:26Z

Update junitxml plugin to the latest version to avoid recompilation in every SBT command.

Author: Reynold Xin <r...@apache.org>

Closes #104 from rxin/junitxml and squashes the following commits:

67ef7bf [Reynold Xin] Update junitxml plugin to the latest version to avoid recompilation in every SBT command.

commit e59a3b6c415b95e8137f5a154716b12653a8aed0
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2014-03-09T00:02:42Z

SPARK-1190: Do not initialize log4j if slf4j log4j backend is not being used

Author: Patrick Wendell <pwend...@gmail.com>

Closes #107 from pwendell/logging and squashes the following commits:

be21c11 [Patrick Wendell] Logging fix

commit 52834d761b059264214dfc6a1f9c70b8bc7ec089
Author: Aaron Davidson <aa...@databricks.com>
Date: 2014-03-09T18:08:39Z

SPARK-929: Fully deprecate usage of SPARK_MEM

(Continued from old repo, prior discussion at https://github.com/apache/incubator-spark/pull/615)

This patch cements our deprecation of the SPARK_MEM environment variable by replacing it with three more specialized variables: SPARK_DAEMON_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_DRIVER_MEMORY.

The creation of the latter two variables means that we can safely set driver/job memory without accidentally setting the executor memory. Neither is public.

SPARK_EXECUTOR_MEMORY is only used by the Mesos scheduler (and set within SparkContext). The proper way of configuring executor memory is through the "spark.executor.memory" property.

SPARK_DRIVER_MEMORY is the new way of specifying the amount of memory used by jobs launched by spark-class, without possibly affecting executor memory.

Other memory considerations:
- The repl's memory can be set through the "--drivermem" command-line option, which really just sets SPARK_DRIVER_MEMORY.
- run-example doesn't use spark-class, so the only way to modify examples' memory is actually an unusual use of SPARK_JAVA_OPTS (which is normally overridden in all cases by spark-class).

This patch also fixes a lurking bug where spark-shell misused spark-class (the first argument is supposed to be the main class name, not java options), as well as a bug in the Windows spark-class2.cmd. I have not yet tested this patch on either Windows or Mesos, however.

Author: Aaron Davidson <aa...@databricks.com>

Closes #99 from aarondav/sparkmem and squashes the following commits:

9df4c68 [Aaron Davidson] SPARK-929: Fully deprecate usage of SPARK_MEM

commit f6f9d02e85d17da2f742ed0062f1648a9293e73c
Author: Jiacheng Guo <guoj...@gmail.com>
Date: 2014-03-09T18:37:44Z

Add timeout for fetch file

Currently, when fetching a file, the connection's connect timeout and read timeout are based on the default JVM settings; with this change, they use spark.worker.timeout instead. This can be useful when the connection status between workers is not perfect, and it prevents prematurely removing a task set.

Author: Jiacheng Guo <guoj...@gmail.com>

Closes #98 from guojc/master and squashes the following commits:

abfe698 [Jiacheng Guo] add space according request
2a37c34 [Jiacheng Guo] Add timeout for fetch file
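A minimal sketch of the timeout change just described: set explicit connect/read timeouts when fetching a file instead of relying on JVM defaults. Reading spark.worker.timeout (seconds, defaulting to 60) from system properties is an illustration, as is the URL; the real change applies these timeouts inside Spark's file-fetching code path.

```
import java.net.URL

// Explicit timeouts instead of JVM defaults; the property lookup,
// 60-second fallback, and URL are illustrative.
val timeoutMs = sys.props.getOrElse("spark.worker.timeout", "60").toInt * 1000

val conn = new URL("http://master:8080/some-file").openConnection()
conn.setConnectTimeout(timeoutMs)  // fail fast when the peer is unreachable
conn.setReadTimeout(timeoutMs)     // fail fast when the transfer stalls
val in = conn.getInputStream()
```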
commit faf4cad1debb76148facc008e0a3308ac96eee7a
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2014-03-09T18:57:06Z

Fix markup errors introduced in #33 (SPARK-1189)

These were causing errors on the configuration page.

Author: Patrick Wendell <pwend...@gmail.com>

Closes #111 from pwendell/master and squashes the following commits:

8467a86 [Patrick Wendell] Fix markup errors introduced in #33 (SPARK-1189)

commit b9be160951b9e7a7e801009e9d6ee6c2b5d2d47e
Author: Patrick Wendell <pwend...@gmail.com>
Date: 2014-03-09T20:17:07Z

SPARK-782 Clean up for ASM dependency.

This makes two changes.

1) Spark uses the shaded version of asm that is (conveniently) published with Kryo.
2) Existing exclude rules around asm are updated to reflect the new groupId of `org.ow2.asm`. This made all of the old rules not work with newer Hadoop versions that pull in new asm versions.

Author: Patrick Wendell <pwend...@gmail.com>

Closes #100 from pwendell/asm and squashes the following commits:

9235f3f [Patrick Wendell] SPARK-782 Clean up for ASM dependency.

commit 5d98cfc1c8fb17fbbeacc7192ac21c0b038cbd16
Author: Chen Chao <crazy...@gmail.com>
Date: 2014-03-10T05:42:12Z

maintain arbitrary state data for each key

RT

Author: Chen Chao <crazy...@gmail.com>

Closes #114 from CrazyJvm/patch-1 and squashes the following commits:

dcb0df5 [Chen Chao] maintain arbitrary state data for each key

commit 79a0a8c3d76704e2002800f20a4ba2ca03409cd6
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date: 2014-03-05T23:37:30Z

[SPARK-1186] : Enrich the Spark Shell to support additional arguments.

Enrich the Spark Shell functionality to support the following options.

```
Usage: spark-shell [OPTIONS]

OPTIONS:

    basic:
    -h  --help             : print this help information.
    -c  --executor-cores   : the maximum number of cores to be used by the spark shell.
    -em --executor-memory  : num[m|g], the memory used by each executor of spark shell.
    -dm --driver-memory    : num[m|g], the memory used by the spark shell and driver.

    soon to be deprecated:
    --cores                : please use -c/--executor-cores
    --drivermem            : please use -dm/--driver-memory

    other options:
    -mip --master-ip       : Spark Master IP/Host Address
    -mp  --master-port     : num, Spark Master Port
    -m   --master          : full string that describes the Spark Master.
    -ld  --local-dir       : absolute path to a local directory that will be used for "scratch" space in Spark.
    -dh  --driver-host     : hostname or IP address for the driver to listen on.
    -dp  --driver-port     : num, port for the driver to listen on.
    -uip --ui-port         : num, port for your application's dashboard, which shows memory and workload data.
    --parallelism          : num, default number of tasks to use across the cluster for distributed shuffle operations when not set by user.
    --locality-wait        : num, number of milliseconds to wait to launch a data-local task before giving up.
    --schedule-fair        : flag, enables FAIR scheduling between jobs submitted to the same SparkContext.
    --max-failures         : num, number of individual task failures before giving up on the job.
    --log-conf             : flag, log the supplied SparkConf as INFO at start of spark context.

e.g.
    spark-shell -m 127.0.0.1 -ld /tmp -dh 127.0.0.1 -dp 4001 -uip 4010 --parallelism 10 --locality-wait 500 --schedule-fair --max-failures 100
```

[ticket: SPARK-1186] : Enrich the Spark Shell to support additional arguments.
https://spark-project.atlassian.net/browse/SPARK-1186

Author   : bernardo.gomezpal...@gmail.com
Reviewer : ?
Testing  : ?

commit 19db38fa7e05f726d48ea783a8d51ba24d5097b2
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date: 2014-03-10T17:23:03Z

Option `--drivermem` isn't deprecated anymore.

commit 5c18b45c36f22c5516179bb2c3f93827e84df3af
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date: 2014-03-10T17:37:40Z

Merge remote-tracking branch 'origin/feature/enrich-spark-shell' into feature/enrich-spark-shell

Conflicts:
    bin/spark-shell

----
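Commit 5d98cfc above touches the documentation for updateStateByKey, Spark Streaming's mechanism for maintaining arbitrary state data for each key. A minimal running word-count sketch, with the socket source, batch interval, and checkpoint directory chosen for illustration:

```
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Per-key counts that survive across batches; source, interval, and
// checkpoint directory are illustrative.
val conf = new SparkConf().setMaster("local[2]").setAppName("state-demo")
val ssc = new StreamingContext(conf, Seconds(1))
ssc.checkpoint("/tmp/state-demo")  // stateful operations require checkpointing

val updateCount = (newValues: Seq[Int], state: Option[Int]) =>
  Some(newValues.sum + state.getOrElse(0))

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .updateStateByKey[Int](updateCount)

counts.print()
ssc.start()
ssc.awaitTermination()
```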