[GitHub] spark pull request: SPARK-1121: Use 'avro' profile in Maven.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/49#issuecomment-36398886

I'll give it a try. Any reason we don't just tie this to the yarn-alpha profile? Or does it not apply to the hadoop 2.0.2 type builds?
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10215150

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.net.{Authenticator, PasswordAuthentication}
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.Credentials
+import org.apache.hadoop.security.UserGroupInformation
+import org.apache.spark.deploy.SparkHadoopUtil
+
+import scala.collection.mutable.ArrayBuffer
+
+/**
+ * Spark class responsible for security.
+ *
+ * In general this class should be instantiated by the SparkEnv and most components
+ * should access it from that. There are some cases where the SparkEnv hasn't been
+ * initialized yet and this class must be instantiated directly.
+ *
+ * Spark currently supports authentication via a shared secret.
+ * Authentication can be configured to be on via the 'spark.authenticate' configuration
+ * parameter. This parameter controls whether the Spark communication protocols do
+ * authentication using the shared secret. This authentication is a basic handshake to
+ * make sure both sides have the same shared secret and are allowed to communicate.
+ * If the shared secret is not identical they will not be allowed to communicate.
+ *
+ * The Spark UI can also be secured by using javax servlet filters. A user may want to
+ * secure the UI if it has data that other users should not be allowed to see. The javax
+ * servlet filter specified by the user can authenticate the user and then once the user
+ * is logged in, Spark can compare that user versus the view acls to make sure they are
+ * authorized to view the UI. The configs 'spark.ui.acls.enable' and 'spark.ui.view.acls'
+ * control the behavior of the acls. Note that the person who started the application
+ * always has view access to the UI.
+ *
+ * Spark does not currently support encryption after authentication.
+ *
+ * At this point spark has multiple communication protocols that need to be secured and
+ * different underlying mechanisms are used depending on the protocol:
+ *
+ * - Akka -> The only option here is to use the Akka Remote secure-cookie functionality.
+ *           Akka remoting allows you to specify a secure cookie that will be exchanged
+ *           and ensured to be identical in the connection handshake between the client
+ *           and the server. If they are not identical then the client will be refused
+ *           to connect to the server. There is no control of the underlying
+ *           authentication mechanism so it's not clear if the password is passed in
+ *           plaintext or uses DIGEST-MD5 or some other mechanism.
+ *           Akka also has an option to turn on SSL; this option is not currently supported
+ *           but we could add a configuration option in the future.
+ *
+ * - HTTP for broadcast and file server (via HttpServer) -> Spark currently uses Jetty
+ *           for the HttpServer. Jetty supports multiple authentication mechanisms -
+ *           Basic, Digest, Form, Spnego, etc. It also supports multiple different login
+ *           services - Hash, JAAS, Spnego, JDBC, etc. Spark currently uses the
+ *           HashLoginService to authenticate using DIGEST-MD5 via a single user and the
+ *           shared secret. Since we are using DIGEST-MD5, the shared secret is not passed
+ *           on the wire in plaintext.
+ *           We currently do not support SSL (https), but Jetty can be configured to use it
+ *           so we could add a configuration option for this in the future.
+ *
+ *           The Spark HttpServer installs the HashLoginService and configures it t
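For concreteness, the Jetty wiring this doc comment describes (a HashLoginService holding a single user and the shared secret, with DIGEST-MD5 required on every path) might look roughly like the following sketch against the Jetty 8 security API. The user name, role name, and secret here are illustrative, not Spark's actual code:

    import org.eclipse.jetty.security.{ConstraintMapping, ConstraintSecurityHandler, HashLoginService}
    import org.eclipse.jetty.security.authentication.DigestAuthenticator
    import org.eclipse.jetty.util.security.{Constraint, Credential}

    def digestSecurityHandler(secret: String): ConstraintSecurityHandler = {
      // Require DIGEST auth for every path, for a single role.
      val constraint = new Constraint()
      constraint.setName(Constraint.__DIGEST_AUTH)
      constraint.setRoles(Array("user"))
      constraint.setAuthenticate(true)

      val mapping = new ConstraintMapping()
      mapping.setConstraint(constraint)
      mapping.setPathSpec("/*")

      // The login service holds the single user and the shared secret;
      // with DIGEST-MD5 the secret never crosses the wire in plaintext.
      val login = new HashLoginService()
      login.putUser("sparkUser", Credential.getCredential(secret), Array("user"))

      val handler = new ConstraintSecurityHandler()
      handler.setAuthenticator(new DigestAuthenticator())
      handler.setLoginService(login)
      handler.setConstraintMappings(Array(mapping))
      handler
    }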
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10215322

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
(quotes the same SecurityManager.scala doc comment as the previous message)
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10216010

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -135,6 +135,8 @@ class SparkContext(
   val isLocal = (master == "local" || master.startsWith("local["))

+  if (master == "yarn-client") System.setProperty("SPARK_YARN_MODE", "true")
--- End diff --

Not sure offhand. I did want to change the confs to use SparkConf. I was going to wait until the SparkConf stuff was working again (reading files, etc.), and my other thought was to possibly do it after this goes in, just so this doesn't drag on. But I can investigate now if you prefer?
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10216556

--- Diff: core/src/main/scala/org/apache/spark/network/Connection.scala ---
@@ -18,25 +18,27 @@ package org.apache.spark.network

 import org.apache.spark._
+import org.apache.spark.SparkSaslServer

 import scala.collection.mutable.{HashMap, Queue, ArrayBuffer}

-import java.io._
 import java.nio._
 import java.nio.channels._
-import java.nio.channels.spi._
 import java.net._

 private[spark] abstract class Connection(val channel: SocketChannel,
     val selector: Selector,
-    val socketRemoteConnectionManagerId: ConnectionManagerId)
+    val socketRemoteConnectionManagerId: ConnectionManagerId, val connectionId: ConnectionId)
   extends Logging {

-  def this(channel_ : SocketChannel, selector_ : Selector) = {
+  var sparkSaslServer : SparkSaslServer = null
--- End diff --

I'll fix.
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10216724

--- Diff: core/src/main/scala/org/apache/spark/network/ConnectionManager.scala ---
@@ -557,7 +754,54 @@ private[spark] class ConnectionManager(port: Int, conf: SparkConf) extends Loggi
     // useful in our test-env ... If we do re-add it, we should consistently use it everywhere I
     // guess ?
     val connection = connectionsById.getOrElseUpdate(connectionManagerId, startNewConnection())
+    if (authEnabled) {
+      checkSendAuthFirst(connectionManagerId, connection)
+    }
     message.senderAddress = id.toSocketAddress()
+    logDebug("Before Sending [" + message + "] to [" + connectionManagerId + "]" + " " +
--- End diff --

It's O(messages sent); changed it to trace.
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10220939

--- Diff: core/src/main/scala/org/apache/spark/network/SecurityMessage.scala ---
@@ -0,0 +1,110 @@
+/* (standard Apache license header, as above) */
+
+package org.apache.spark.network
+
+import java.nio.ByteBuffer
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.StringBuilder
+
+import org.apache.spark._
+import org.apache.spark.network._
+
+private[spark] class SecurityMessage() extends Logging {
--- End diff --

Added a bunch of documentation. Let me know if it's not enough, or if perhaps some of it should be moved.
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10221207

--- Diff: core/src/main/scala/org/apache/spark/network/Connection.scala ---
(same Connection.scala hunk as quoted above)

Note the compiler complains if removing the space from the `this` declaration. Is there some other syntax to fix this?

[ERROR] core/src/main/scala/org/apache/spark/network/Connection.scala:38: ':' expected but identifier found.
[ERROR]   def this(channel_: SocketChannel, selector_ : Selector, id_ : ConnectionId) = {
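For background, the space is needed because a Scala identifier may end in an underscore followed by operator characters, so 'channel_:' lexes as a single identifier and the parser then finds nothing to ascribe the type to. A minimal standalone illustration (not Spark code):

    class Box(val x: Int) {
      // Compiles: the space ends the identifier before the type ascription's ':'.
      def this(x_ : Int, doubled: Boolean) = this(if (doubled) x_ * 2 else x_)

      // Does not compile: "x_:" is lexed as one identifier, so the compiler
      // reports "':' expected but identifier found" at the type name.
      // def this(x_: Int, doubled: Boolean) = this(x_)
    }

So the space (or renaming the parameter so it doesn't end in an underscore) is the fix; there is no alternative punctuation that avoids it.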
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10221836

--- Diff: docs/configuration.md ---
@@ -477,6 +505,21 @@ Apart from these, the following properties are also available, and may be useful
   Whether to overwrite files added through SparkContext.addFile() when the target file exists and its contents do not match those of the source.

+ spark.authenticate (default: false)
+   Whether spark authenticates its internal connections. See SPARK_SECRET if not
+   running on Yarn.
+
+ spark.core.connection.auth.wait.timeout
--- End diff --

I will change it to seconds.
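As a usage sketch of the two properties discussed above (the timeout value is illustrative; per the comment, it is interpreted in seconds after the change):

    import org.apache.spark.SparkConf

    // Enable shared-secret authentication and give connections 30 seconds
    // to complete the authentication handshake.
    val conf = new SparkConf()
      .set("spark.authenticate", "true")
      .set("spark.core.connection.auth.wait.timeout", "30")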
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36542188

Thanks for the detailed review Patrick. I've updated based on the comments, except for renaming the Handlers to Servlet and changing to use SparkConf. I will make those changes shortly.
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r1063

--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
(quotes the same SecurityManager.scala doc comment as above)
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r1084

--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -41,56 +46,103 @@ private[spark] object JettyUtils extends Logging {
   type Responder[T] = HttpServletRequest => T

   // Conversions from various types of Responder's to jetty Handlers
-  implicit def jsonResponderToHandler(responder: Responder[JValue]): Handler =
+  implicit def jsonResponderToHandler(responder: Responder[JValue]): HttpServlet =
--- End diff --

Yep, renamed.
[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36764964

Yes, I haven't made that change yet. I should have a patch up in a few hours with that changed; I'm just doing some final testing on it. I'm changing it to pass the securityManager around (like in the servlets) and I'm changing it to read from SparkConf. I am only changing configs added by this PR.

Note that some configs can't be set in SparkConf as they aren't really shared with the SparkContext. An example of this is the Yarn ApplicationMaster: it can only get the SparkContext after it's been instantiated, which is too late for many things. I think for things like SPARK_YARN_MODE we should file a separate jira to handle, since this really should have been done when SparkConf was added originally - as long as you are ok with that?
[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36781865

Filed https://spark-project.atlassian.net/browse/SPARK-1191
[GitHub] spark pull request: SPARK-1126. spark-app preliminary
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/86#discussion_r10331680

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/* (standard Apache license header, as above) */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+    println("args: " + args.toList)
+    val appArgs = new SparkAppArguments(args)
+
+    if (appArgs.master != null) {
+      if (appArgs.master.startsWith("yarn")) {
--- End diff --

I agree with you that the yarn-standalone name is a bit confusing. It's not used in too many places, so if we can keep it for backwards compatibility but move to the new one, that would be preferable.
[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36825493

Pass securityManager and SparkConf around where we can. Note the changes to JettyUtils and calling functions. Switch to use SparkConf for getting and setting configs where I could. Also added ConnectionManagerSuite unit tests.

I also upmerged to the latest master branch. Unfortunately the upmerge to master has broken passing credentials on yarn. It looks like the changes made in the runAsUser aren't transferring the credentials properly, so I need to look at that further.
[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36826678

@pwendell Ok, this should now be ready for review.
[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/94

SPARK-1195: set map_input_file environment variable in PipedRDD

Hadoop uses the config mapreduce.map.input.file to indicate the input filename to the map when the input split is of type FileSplit. Some of the hadoop input and output formats set or use this config. This config can also be used by user code. PipedRDD runs an external process and the configs aren't available to that process. Hadoop Streaming does something very similar, and the way it makes configs available is by exporting them into the environment, replacing '.' with '_'. Spark should also export this variable when launching the pipe command so the user code has access to that config.

Note that mapreduce.map.input.file is the new config name; the old one, which is deprecated but not yet removed, is map.input.file, so we should handle both.

Perhaps it would be better to abstract this out somehow so it goes into the HadoopPartition code?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark map_input_file

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/94.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #94

commit 2ba805ee7af681dce75d203816222779ccff45f2
Author: Thomas Graves
Date: 2014-03-06T20:12:26Z

    set map_input_file environment variable in PipedRDD
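As a sketch of the export convention the description above lays out (the method and parameter names here are illustrative, not the PR's actual code):

    // Export the Hadoop input-file config into the piped process's environment,
    // replacing '.' with '_' the way Hadoop Streaming does; cover both the new
    // and the deprecated property name.
    def exportInputFile(env: java.util.Map[String, String], inputFile: String) {
      for (prop <- Seq("mapreduce.map.input.file", "map.input.file")) {
        env.put(prop.replaceAll("\\.", "_"), inputFile)
      }
    }

The env map here would be the environment() of the ProcessBuilder that launches the pipe command.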
[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/95#discussion_r10361302

--- Diff: docs/running-on-yarn.md ---
@@ -82,35 +84,30 @@ For example:

     ./bin/spark-class org.apache.spark.deploy.yarn.Client \
       --jar examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
       --class org.apache.spark.examples.SparkPi \
-      --args yarn-standalone \
+      --args yarn-cluster \
       --num-workers 3 \
       --master-memory 4g \
       --worker-memory 2g \
       --worker-cores 1

-# Examine the output (replace $YARN_APP_ID in the following with the "application identifier" output by the previous command)
-# (Note: YARN_APP_LOGS_DIR is usually /tmp/logs or $HADOOP_HOME/logs/userlogs depending on the Hadoop version.)
-$ cat $YARN_APP_LOGS_DIR/$YARN_APP_ID/container*_01/stdout
-Pi is roughly 3.13794

-The above starts a YARN Client programs which start the default Application Master. Then SparkPi will be run as a child thread of Application Master, YARN Client will periodically polls the Application Master for status updates and displays them in the console. The client will exit once your application has finished running.
+The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running.
--- End diff --

It might be nice to state that the output goes to the log files; then they can see the section you added below for more details.
[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/95#issuecomment-36938497

Looks good to me (with the doc fixes commented on).
[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/33#discussion_r10361820

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -135,6 +135,8 @@ class SparkContext(
   val isLocal = (master == "local" || master.startsWith("local["))

+  if (master == "yarn-client") System.setProperty("SPARK_YARN_MODE", "true")
--- End diff --

Normally they shouldn't have to. If the user does certain things in their code (accesses SparkHadoopUtil before creating the SparkContext) and is using yarn-client mode, they might have to. I pointed that out in the jira I filed. I think we should see if we can fix that.
[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/94#discussion_r10365444

--- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala ---
@@ -89,4 +97,37 @@ class PipedRDDSuite extends FunSuite with SharedSparkContext {
     }
   }

+  test("test pipe exports map_input_file") {
+    testExportInputFile("map_input_file")
+  }
+
+  test("test pipe exports mapreduce_map_input_file") {
+    testExportInputFile("mapreduce_map_input_file")
+  }
+
+  def testExportInputFile(varName: String) {
+    val nums = new HadoopRDD(sc, new JobConf(), classOf[TextInputFormat], classOf[LongWritable],
--- End diff --

No reason; I was just trying to avoid a temporary file.
[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/94#issuecomment-36948463

Adding a routine to HadoopPartition sounds good.
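Such a routine might look roughly like this sketch (assumed shape, not the PR's final code; it assumes the partition can reach its underlying InputSplit, and only FileSplits carry a filename):

    import org.apache.hadoop.mapred.{FileSplit, InputSplit}

    // Hypothetical helper for HadoopPartition: the environment variables to
    // export for the pipe command, empty when the split is not file-based.
    def getPipeEnvVars(split: InputSplit): Map[String, String] = split match {
      case fs: FileSplit =>
        Map("map_input_file" -> fs.getPath.toString,
            "mapreduce_map_input_file" -> fs.getPath.toString)
      case _ => Map()
    }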
[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/94#discussion_r10368475

--- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala ---
@@ -89,4 +97,37 @@ class PipedRDDSuite extends FunSuite with SharedSparkContext {
     }
   }

+  test("test pipe exports map_input_file") {
+    testExportInputFile("map_input_file")
+  }
+
+  test("test pipe exports mapreduce_map_input_file") {
+    testExportInputFile("mapreduce_map_input_file")
+  }
+
+  def testExportInputFile(varName: String) {
+    val nums = new HadoopRDD(sc, new JobConf(), classOf[TextInputFormat], classOf[LongWritable],
+        classOf[Text], 2) {
+      override def getPartitions: Array[Partition] = Array(generateFakeHadoopPartition())
+      override val getDependencies = List[Dependency[_]]()
+      override def compute(theSplit: Partition, context: TaskContext) = {
+        new InterruptibleIterator[(LongWritable, Text)](context, Iterator((new LongWritable(1),
+          new Text("b"))))
+      }
+    }
+    val hadoopPart1 = generateFakeHadoopPartition()
+    val pipedRdd = new PipedRDD(nums, "printenv " + varName)
--- End diff --

Is there a convention or perhaps utilities for doing this already? I didn't see one on a quick look, but I might have missed it. Note I copied the printenv command from a test above, so I'll change both.
[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36955086

Nope, nothing else to address in this. I'll merge it shortly.
[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36955323

I committed this, thanks for all the reviews Patrick!
[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/94#discussion_r10384191

--- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala ---
(same PipedRDDSuite hunk as quoted above)

I'm definitely fine with fixing it; I was just wondering if we had some generic utilities that perhaps handled it for various platforms, or perhaps a class that handled calling the correct function depending on OS.
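If such a utility were added, one possible shape (illustrative only; checking 'os.name' is the usual JVM-level switch):

    // Pick a platform-appropriate shell command that prints one environment
    // variable, so tests don't hardcode the Unix-only printenv.
    def printEnvCommand(varName: String): String = {
      if (System.getProperty("os.name").startsWith("Windows")) {
        "cmd /C echo %" + varName + "%"
      } else {
        "printenv " + varName
      }
    }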
[GitHub] spark pull request: Upgrade Jetty to 9.1.3.v20140225.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37183820

Is there a particular reason for the upgrade (bug, feature, etc.)? One reason I ask is that changing the version can affect users. Can you create a jira for this also?
[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37230471

I generally use mvn to build. Right now I seem to be having issues building at all due to what appears to be the cloudera repository being down:

Error resolving project artifact: Could not transfer artifact org.eclipse.paho:mqtt-client:pom:0.4.0 from/to cloudera-repo (https://repository.cloudera.com/artifactory/cloudera-repos): Connection to https://repository.cloudera.com refused for project org.eclipse.paho:mqtt-client:jar:0.4.0: Connection timed out
[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37332343

What exactly do you want tried? I built it for hadoop2 with sbt and ran a couple of the examples on yarn - SparkPi and SparkHdfsLR. Those worked fine and the UI still worked.
[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/113#issuecomment-37405632

Yes, the build failed for me with the error I put above. Note that this PR would need to have the maven build updated too.
[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/120#discussion_r10516035

--- Diff: docs/running-on-yarn.md ---
@@ -60,11 +60,11 @@ The command to launch the Spark application on the cluster is as follows:

       --jar \
       --class \
       --args \
-      --num-workers \
-      --master-class
-      --master-memory \
-      --worker-memory \
-      --worker-cores \
+      --num-executors \
+      --am-class
--- End diff --

The master-class/am-class is used by the yarn-client mode. It launches a different AM than yarn-cluster mode. We should just remove the option from the documentation, since end users shouldn't be using it.
[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/120#issuecomment-37411628

I'm getting a compile error building this against hadoop 0.23:

[ERROR] yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala:231: value getnumExecutorsRunning is not a member of org.apache.spark.deploy.yarn.YarnAllocationHandler
[ERROR]   val missingExecutorCount = args.numExecutors - yarnAllocator.getnumExecutorsRunning
[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/120#issuecomment-37415976

I fixed the above compile error and tried to run, but the executors return the following error:

Unknown/unsupported param List(--num-executor, 2)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH         Path to your application's JAR file (required)
  --class CLASS_NAME     Name of your application's main class (required)
  --args ARGS            Arguments to be passed to your application's main class.
                         Mutliple invocations are possible, each will be passed in order.
  --num-executors NUM    Number of executors to start (Default: 2)
  --executor-cores NUM   Number of cores for the executors (Default: 1)
  --executor-memory MEM  Memory per executor (e.g. 1000M, 2G) (Default: 1G)
[GitHub] spark pull request: SPARK-1193. Fix indentation in pom.xmls
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/91#discussion_r10521879

--- Diff: core/pom.xml ---
(XML markup stripped by the archive; the quoted hunk reindents core/pom.xml and removes the yarn-alpha profile containing the org.apache.avro avro dependency)
--- End diff --

@sryza was this intentionally removed? This breaks the hadoop 0.23 build, so we need to add it back in.
[GitHub] spark pull request: [SPARK-1232] Fix the hadoop 0.23 yarn build
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/127

[SPARK-1232] Fix the hadoop 0.23 yarn build

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark SPARK-1232

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/127.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #127

commit c05cfd4ad4d802378ab6d4b1dbaca494fe4ebd0a
Author: Thomas Graves
Date: 2014-03-12T15:41:59Z

    Fix the hadoop 0.23 yarn build
[GitHub] spark pull request: [SPARK-1198] Allow pipes tasks to run in diffe...
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/128

[SPARK-1198] Allow pipes tasks to run in different sub-directories

This works as-is on Linux/Mac/etc. but doesn't cover working on Windows. In here I use ln -sf for symlinks. Putting this up for comments on that. Do we want to create perhaps some classes for doing shell commands - Linux vs Windows? Is there some other way we want to do this? I assume we are still supporting jdk1.6?

Also, should I update the Java API for pipes to allow this parameter?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark SPARK1198

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/128.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #128

commit 1ab49ca90b7cae82efa26e018d9d285c948bf25c
Author: Thomas Graves
Date: 2014-03-12T14:11:46Z

    Add support for running pipe tasks is separate directories

commit 6b783bdb5e09b7c96cbb76111876fbb6c9ca9a6f
Author: Thomas Graves
Date: 2014-03-12T14:47:13Z

    style fixes
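A rough sketch of the Unix-only symlinking approach described above (names are illustrative, not the PR's actual code; Windows support is exactly the open question raised in the description):

    import java.io.File

    // Create a per-task working directory and symlink the files from the
    // current working directory into it before launching the piped command.
    def setupTaskDir(taskDir: File) {
      taskDir.mkdirs()
      for (file <- new File(".").getAbsoluteFile.listFiles()) {
        val link = new File(taskDir, file.getName)
        new ProcessBuilder("ln", "-sf", file.getAbsolutePath, link.getPath)
          .start().waitFor()
      }
    }

Shelling out to ln avoids needing java.nio.file.Files.createSymbolicLink, which only exists from JDK 7 on - relevant if jdk1.6 support is still required.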
[GitHub] spark pull request: [SPARK-1233] Fix running hadoop 0.23 due to ja...
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/129

[SPARK-1233] Fix running hadoop 0.23 due to java.lang.NoSuchFieldException: DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark SPARK-1233

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/129.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #129

commit 85ff5a67aa1bccf0b9915b43960091a20d3aa021
Author: Thomas Graves
Date: 2014-03-12T16:35:51Z

    Fix running hadoop 0.23 due to java.lang.NoSuchFieldException: DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH
[GitHub] spark pull request: [SPARK-1198] Allow pipes tasks to run in diffe...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/128#issuecomment-37459180

The jenkins failure seems unrelated to this change. Can someone kick it again perhaps?
[GitHub] spark pull request: [SPARK-1233] Fix running hadoop 0.23 due to ja...
Github user tgravescs closed the pull request at: https://github.com/apache/spark/pull/129
[GitHub] spark pull request: [SPARK-1198] Allow pipes tasks to run in diffe...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/128#discussion_r10563706

--- Diff: core/pom.xml ---
@@ -184,13 +184,12 @@
       metrics-graphite

-      org.apache.derby
-      derby
-      test

       commons-io
       commons-io

+      org.apache.derby
+      derby
--- End diff --

Yes, the diff is just messy. I made commons-io available outside of the test scope since I use FileUtils.delete, and moved it up a couple lines. derby didn't change.
[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/120#discussion_r10566868

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -133,11 +148,11 @@ class ClientArguments(val args: Array[String], val sparkConf: SparkConf) {
   "  --class CLASS_NAME         Name of your application's main class (required)\n" +
   "  --args ARGS                Arguments to be passed to your application's main class.\n" +
   "                             Mutliple invocations are possible, each will be passed in order.\n" +
-  "  --num-workers NUM          Number of workers to start (Default: 2)\n" +
-  "  --worker-cores NUM         Number of cores for the workers (Default: 1).\n" +
-  "  --master-class CLASS_NAME  Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)\n" +
-  "  --master-memory MEM        Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)\n" +
-  "  --worker-memory MEM        Memory per Worker (e.g. 1000M, 2G) (Default: 1G)\n" +
+  "  --num-executors NUM        Number of executors to start (Default: 2)\n" +
+  "  --executor-cores NUM       Number of cores for the executors (Default: 1).\n" +
+  "  --am-class CLASS_NAME      Class Name for application master (Default: spark.deploy.yarn.ApplicationMaster)\n" +
--- End diff --

How about we just leave --am-class out of the usage statement, since it's not meant to be used by the end user?
[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/120#issuecomment-37543156

Looks good to me. I made the small comment about perhaps leaving the --am-class out of the usage, but I'm ok either way.
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/33

Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets

Resubmitted pull request; was https://github.com/apache/incubator-spark/pull/332.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark security-branch-0.9-with-client-rebase

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/33.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #33

commit f35176352b77d8a295601479879a4666501eabe8
Author: Thomas Graves
Date: 2014-01-19T19:35:32Z

    Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets

commit 5721c5ac83b62afb8e8201730e4fc6bc76556e5b
Author: Thomas Graves
Date: 2014-01-20T15:17:56Z

    update AkkaUtilsSuite test for the actorSelection changes, fix typos based on comments, and remove extra lines I missed in rebase from AkkaUtils

commit 2d9e23e1739f52a858c34b7e85c7625b43ad3833
Author: Thomas Graves
Date: 2014-02-18T16:38:54Z

    Merge remote-tracking branch 'upstream/master' into security-branch-0.9-with-client-rebase_rework

    Conflicts:
        core/src/main/scala/org/apache/spark/SparkEnv.scala
        core/src/main/scala/org/apache/spark/network/ConnectionManager.scala
        core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
        repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala

commit 6f7ddf38d3b3f3c367df4d0b9a6be3a0bc644e1d
Author: Thomas Graves
Date: 2014-02-22T16:37:03Z

    Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to spark.ui.acls.enable, and fix up various other things from review comments

commit ed3d1c16cf9a0af6530d2c37e62fb9cdc92ddfcb
Author: Thomas Graves
Date: 2014-02-22T16:50:30Z

    Add security.md

commit b514becd7a0173ebeb209c0436e3c2c9f2f40a64
Author: Thomas Graves
Date: 2014-02-22T16:52:56Z

    Fix reference to config

commit ecbfb65860e4fea722537802cf036f0b505d7da9
Author: Thomas Graves
Date: 2014-02-22T17:37:41Z

    Fix spacing and formatting

commit 50dd9f2438356117e749d4cbd8d0ea8c25746166
Author: Thomas Graves
Date: 2014-02-22T18:35:34Z

    fix header in SecurityManager
[GitHub] spark pull request: SPARK-1051. On YARN, executors don't doAs subm...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/29#issuecomment-36253998

That's really unfortunate that Hadoop 1.x doesn't support it, as I would prefer to use addCredentials since it also handles secrets. Our only option right now would be to add a separate function in SparkHadoopUtil and YarnSparkHadoopUtil. This would allow Spark on YARN to use addCredentials at least. Maybe it's not critical right now. Thoughts? Otherwise the changes look good.
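A rough sketch of the separate function being proposed here: a no-op default in SparkHadoopUtil with a YARN-specific override that merges the current user's credentials. The exact shape is an assumption based on this discussion, not the merged change:

    import org.apache.hadoop.mapred.JobConf
    import org.apache.hadoop.security.UserGroupInformation

    class SparkHadoopUtil {
      // No-op by default so the Hadoop 1.x build doesn't depend on newer APIs.
      def addCredentials(conf: JobConf): Unit = {}
    }

    class YarnSparkHadoopUtil extends SparkHadoopUtil {
      // On YARN (Hadoop 2.x), merge the submitter's tokens and secret keys
      // into the job's credentials.
      override def addCredentials(conf: JobConf): Unit = {
        conf.getCredentials().mergeAll(UserGroupInformation.getCurrentUser().getCredentials())
      }
    }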
[GitHub] spark pull request: SPARK-1032. If Yarn app fails before registeri...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/28#issuecomment-36255394

I personally like the way 4 spaces looks too. The style guide isn't clear on what it's supposed to be. I'll assume it falls under the 4-space rule similar to functions: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Indentation

@pwendell - could you comment on what the indentation is supposed to be? If it's 4, perhaps we should update the guide. Otherwise looks good to me. Thanks Sandy!
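For concreteness, an invented example of the 4-space continuation indentation under discussion, next to the usual 2-space body indentation:

    // Invented example: parameters wrapped onto their own lines get
    // 4 spaces of indentation, while the method body gets 2.
    def launchApplicationMaster(
        appName: String,
        amMemory: Int,
        amCores: Int): Unit = {
      println(s"Launching $appName with $amMemory MB and $amCores cores")
    }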
[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/33#issuecomment-36275335

It's ready for review. I believe I've addressed all the comments from the previous PR.
[GitHub] spark pull request: SPARK-1051. On YARN, executors don't doAs subm...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/29#issuecomment-36286358

I believe secrets are mostly for users adding secrets for other services; secrets are also used by the MR2 framework. Secret keys are supported in MR1 via Credentials; I believe they just added the convenience methods to UserGroupInformation in MR2 for dealing with them. It looks like MR1 handled credentials through the JobConf. So I guess, if you don't mind, overriding it for YARN at least to pass them along might be best.
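To make the token/secret distinction concrete, a small sketch of stashing a secret key in Hadoop's Credentials and merging it into the current user via the MR2-era UserGroupInformation convenience method mentioned above. The alias and value are made up:

    import org.apache.hadoop.io.Text
    import org.apache.hadoop.security.{Credentials, UserGroupInformation}

    // Made-up alias and secret, purely illustrative.
    val alias = new Text("my.service.secret")
    val creds = new Credentials()
    creds.addSecretKey(alias, "not-a-real-secret".getBytes("UTF-8"))

    // Hadoop 2.x convenience method discussed in this thread; not in 1.x.
    UserGroupInformation.getCurrentUser().addCredentials(creds)
    val secret = UserGroupInformation.getCurrentUser().getCredentials().getSecretKey(alias)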
[GitHub] spark pull request: SPARK-1032. If Yarn app fails before registeri...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/28#issuecomment-36362791

I committed this. Thanks Sandy!
[GitHub] spark pull request: Update dev merge script to use spark.git inste...
GitHub user tgravescs opened a pull request: https://github.com/apache/spark/pull/47

Update dev merge script to use spark.git instead of incubator-spark

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark fix_merge_script

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/47.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #47

commit 8209ab152dc14f534585d1235b32f6d22d8952a5
Author: Thomas Graves
Date: 2014-02-28T15:42:54Z

    Update dev merge script to use spark.git instead of incubator-spark
[GitHub] spark pull request: SPARK-1051. On YARN, executors don't doAs subm...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/29#issuecomment-36367636

+1. Looks Good. Thanks Sandy!
[GitHub] spark pull request: SPARK-1051. On YARN, executors don't doAs subm...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/29#issuecomment-36368483

@sryza it looks like this is no longer mergeable due to other check-ins. Can you please update it to the latest?
[GitHub] spark pull request: SPARK-1121 Only add avro if the build is for H...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/6#issuecomment-36370069

I disagree with this change going in and breaking the YARN Hadoop 0.23 build. If we are going to support Maven and Hadoop 0.23, you should be able to build it without manually looking through JIRA/pull requests to figure out what to add back in to get it to build. I unfortunately didn't see it before it went in. Note I think we should revert this unless someone is going to fix it soon. If we aren't going to support Maven, it should be done globally; as far as I've seen, that discussion hasn't finished.