[GitHub] spark pull request: SPARK-1121: Use 'avro' profile in Maven.

2014-02-28 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/49#issuecomment-36398886
  
I'll give it a try. Any reason we don't just tie this to the yarn-alpha 
profile?  Or does it not apply to the hadoop 2.0.2 type builds?




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10215150
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.net.{Authenticator, PasswordAuthentication}
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.Credentials
+import org.apache.hadoop.security.UserGroupInformation
+import org.apache.spark.deploy.SparkHadoopUtil
+
+import scala.collection.mutable.ArrayBuffer
+
+/**
+ * Spark class responsible for security.
+ *
+ * In general this class should be instantiated by the SparkEnv and most components
+ * should access it from that. There are some cases where the SparkEnv hasn't been
+ * initialized yet and this class must be instantiated directly.
+ *
+ * Spark currently supports authentication via a shared secret.
+ * Authentication can be configured to be on via the 'spark.authenticate' configuration
+ * parameter. This parameter controls whether the Spark communication protocols do
+ * authentication using the shared secret. This authentication is a basic handshake to
+ * make sure both sides have the same shared secret and are allowed to communicate.
+ * If the shared secret is not identical they will not be allowed to communicate.
+ *
+ * The Spark UI can also be secured by using javax servlet filters. A user may want to
+ * secure the UI if it has data that other users should not be allowed to see. The javax
+ * servlet filter specified by the user can authenticate the user and then once the user
+ * is logged in, Spark can compare that user versus the view acls to make sure they are
+ * authorized to view the UI. The configs 'spark.ui.acls.enable' and 'spark.ui.view.acls'
+ * control the behavior of the acls. Note that the person who started the application
+ * always has view access to the UI.
+ *
+ * Spark does not currently support encryption after authentication.
+ *
+ * At this point spark has multiple communication protocols that need to be secured and
+ * different underlying mechanisms are used depending on the protocol:
+ *
+ *  - Akka -> The only option here is to use the Akka Remote secure-cookie functionality.
+ *            Akka remoting allows you to specify a secure cookie that will be exchanged
+ *            and ensured to be identical in the connection handshake between the client
+ *            and the server. If they are not identical then the client will be refused
+ *            to connect to the server. There is no control of the underlying
+ *            authentication mechanism so it's not clear if the password is passed in
+ *            plaintext or uses DIGEST-MD5 or some other mechanism.
+ *            Akka also has an option to turn on SSL; this option is not currently supported
+ *            but we could add a configuration option in the future.
+ *
+ *  - HTTP for broadcast and file server (via HttpServer) -> Spark currently uses Jetty
+ *            for the HttpServer. Jetty supports multiple authentication mechanisms -
+ *            Basic, Digest, Form, Spnego, etc. It also supports multiple different login
+ *            services - Hash, JAAS, Spnego, JDBC, etc. Spark currently uses the HashLoginService
+ *            to authenticate using DIGEST-MD5 via a single user and the shared secret.
+ *            Since we are using DIGEST-MD5, the shared secret is not passed on the wire
+ *            in plaintext.
+ *            We currently do not support SSL (https), but Jetty can be configured to use it
+ *            so we could add a configuration option for this in the future.
+ *
+ *            The Spark HttpServer installs the HashLoginServer and configures it t
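
The handshake described in the comment above only verifies that both ends hold the same secret. A minimal sketch of that idea (a hypothetical helper for illustration, not the code under review; the actual exchange uses the mechanisms listed above, e.g. DIGEST-MD5 for the HTTP path):

    import java.security.MessageDigest

    // Compare digests of the shared secret so the secret itself is never sent
    // in plaintext (illustrative only).
    def digest(secret: String): Array[Byte] =
      MessageDigest.getInstance("MD5").digest(secret.getBytes("UTF-8"))

    def handshakeOk(mySecret: String, peerDigest: Array[Byte]): Boolean =
      MessageDigest.isEqual(digest(mySecret), peerDigest)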

[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10215322
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---

[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10216010
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -135,6 +135,8 @@ class SparkContext(
 
   val isLocal = (master == "local" || master.startsWith("local["))
 
+  if (master == "yarn-client") System.setProperty("SPARK_YARN_MODE", "true")
--- End diff --

Not sure offhand. I did want to change the confs to use SparkConf. I was going to wait until the SparkConf stuff was working again (reading files, etc.), and my other thought was to possibly do it after this goes in, just so this doesn't drag on. But I can investigate now if you prefer?




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10216556
  
--- Diff: core/src/main/scala/org/apache/spark/network/Connection.scala ---
@@ -18,25 +18,27 @@
 package org.apache.spark.network
 
 import org.apache.spark._
+import org.apache.spark.SparkSaslServer
 
 import scala.collection.mutable.{HashMap, Queue, ArrayBuffer}
 
-import java.io._
 import java.nio._
 import java.nio.channels._
-import java.nio.channels.spi._
 import java.net._
 
 
 private[spark]
abstract class Connection(val channel: SocketChannel, val selector: Selector,
-val socketRemoteConnectionManagerId: ConnectionManagerId)
+val socketRemoteConnectionManagerId: ConnectionManagerId, val connectionId: ConnectionId)
   extends Logging {
 
-  def this(channel_ : SocketChannel, selector_ : Selector) = {
+  var sparkSaslServer : SparkSaslServer = null
--- End diff --

I'll fix.




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10216724
  
--- Diff: core/src/main/scala/org/apache/spark/network/ConnectionManager.scala ---
@@ -557,7 +754,54 @@ private[spark] class ConnectionManager(port: Int, conf: SparkConf) extends Logging
     // useful in our test-env ... If we do re-add it, we should consistently use it everywhere I
     // guess ?
     val connection = connectionsById.getOrElseUpdate(connectionManagerId, startNewConnection())
+    if (authEnabled) {
+      checkSendAuthFirst(connectionManagerId, connection)
+    }
     message.senderAddress = id.toSocketAddress()
+    logDebug("Before Sending [" + message + "] to [" + connectionManagerId + "]" + " " +
--- End diff --

It's O(messages sent); I changed it to trace.
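
For context on why trace is cheap here: the log helpers in org.apache.spark.Logging take the message by name, so the string concatenation on this O(messages-sent) path is only evaluated when the level is enabled. A minimal sketch of that mechanism (println as a stand-in for the real logger):

    // The by-name parameter (msg: => String) defers building the string until
    // the level check passes.
    var traceEnabled = false
    def logTrace(msg: => String): Unit = if (traceEnabled) println(msg)

    logTrace("Before Sending [" + "..." + "]")  // concatenation skipped when disabled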




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10220939
  
--- Diff: core/src/main/scala/org/apache/spark/network/SecurityMessage.scala ---
@@ -0,0 +1,110 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network
+
+import java.nio.ByteBuffer
+
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable.StringBuilder
+
+import org.apache.spark._
+import org.apache.spark.network._
+
+private[spark] class SecurityMessage() extends Logging {
--- End diff --

Added a bunch of documentation. Let me know if it's not enough, or if perhaps some of it should be moved.




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10221207
  
--- Diff: core/src/main/scala/org/apache/spark/network/Connection.scala ---
@@ -18,25 +18,27 @@
 package org.apache.spark.network
 
 import org.apache.spark._
+import org.apache.spark.SparkSaslServer
 
 import scala.collection.mutable.{HashMap, Queue, ArrayBuffer}
 
-import java.io._
 import java.nio._
 import java.nio.channels._
-import java.nio.channels.spi._
 import java.net._
 
 
 private[spark]
abstract class Connection(val channel: SocketChannel, val selector: Selector,
-val socketRemoteConnectionManagerId: ConnectionManagerId)
+val socketRemoteConnectionManagerId: ConnectionManagerId, val connectionId: ConnectionId)
   extends Logging {
 
-  def this(channel_ : SocketChannel, selector_ : Selector) = {
+  var sparkSaslServer : SparkSaslServer = null
--- End diff --

Note the compiler complains if you remove the space from the 'this' declaration. Is there some other syntax to fix this?

[ERROR] core/src/main/scala/org/apache/spark/network/Connection.scala:38: ':' expected but identifier found.
[ERROR]   def this(channel_: SocketChannel, selector_ : Selector, id_ : ConnectionId) = {
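
For reference, this is Scala's lexing rather than a compiler bug: an identifier is allowed to end in an underscore, so channel_: is consumed as a single identifier and the space before the colon is required. A minimal sketch:

    class Conn(val channel: String, val id: Int) {
      // def this(channel_: String) = ...   // fails: "':' expected but identifier found"
      def this(channel_ : String) = this(channel_, 0)  // space before ':' is required
    }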


  




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10221836
  
--- Diff: docs/configuration.md ---
@@ -477,6 +505,21 @@ Apart from these, the following properties are also available, and may be useful
   
 Whether to overwrite files added through SparkContext.addFile() when the target file exists and its contents do not match those of the source.
   
+  
+  spark.authenticate
+  false
+  
+    Whether spark authenticates its internal connections. See SPARK_SECRET if not
+    running on Yarn.
+  
+
+  
+  spark.core.connection.auth.wait.timeout
--- End diff --

I will change it to seconds.
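
For context, a minimal sketch of turning these two properties on from code (names as quoted in the diff above; the timeout value is in seconds, per the change noted here):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.authenticate", "true")
      .set("spark.core.connection.auth.wait.timeout", "30")  // seconds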




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/33#issuecomment-36542188
  
Thanks for the detailed review Patrick.  I've updated based on the comments 
except for renaming the Handlers to Servlet and changing to use SparkConf.   I 
will make those changes shortly.




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r1063
  
--- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala ---

[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-03-03 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r1084
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -41,56 +46,103 @@ private[spark] object JettyUtils extends Logging {
   type Responder[T] = HttpServletRequest => T
 
   // Conversions from various types of Responder's to jetty Handlers
-  implicit def jsonResponderToHandler(responder: Responder[JValue]): Handler =
+  implicit def jsonResponderToHandler(responder: Responder[JValue]): HttpServlet =
--- End diff --

Yep, renamed.




[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/33#issuecomment-36764964
  
Yes, I haven't made that change yet. I should have a patch up in a few hours with that changed; I'm just doing some final testing on it. I'm changing it to pass the securityManager around (like in the servlets) and I'm changing it to read from SparkConf. I am only changing configs added by this PR.

Note that some configs can't be set in SparkConf as they aren't really shared with the SparkContext. An example of this is the Yarn ApplicationMaster: it can only get the SparkContext after it's been instantiated, which is too late for many things.

I think for things like SPARK_YARN_MODE we should file a separate jira to handle, since this really should have been done when SparkConf was added originally - as long as you are ok with that?





[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/33#issuecomment-36781865
  
filed https://spark-project.atlassian.net/browse/SPARK-1191




[GitHub] spark pull request: SPARK-1126. spark-app preliminary

2014-03-05 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/86#discussion_r10331680
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkApp.scala ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+import java.io.BufferedReader
+import java.io.InputStream
+import java.io.InputStreamReader
+import java.io.PrintStream
+
+import scala.collection.mutable.HashMap
+import scala.collection.mutable.Map
+import scala.collection.mutable.ArrayBuffer
+import scala.collection.JavaConverters._
+
+object SparkApp {
+  val CLIENT = 1
+  val CLUSTER = 2
+  val YARN = 1
+  val STANDALONE = 2
+  val MESOS = 4
+  val LOCAL = 8
+  val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
+
+  var clusterManager: Int = LOCAL
+
+  def main(args: Array[String]) {
+println("args: " + args.toList)
+val appArgs = new SparkAppArguments(args)
+
+if (appArgs.master != null) {
+  if (appArgs.master.startsWith("yarn")) {
--- End diff --

I agree with you that the yarn-standalone name is a bit confusing. It's not used in too many places, so if we can keep it for backwards compatibility but move to the new one, that would be preferable.
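
As an aside on the constants quoted above: YARN, STANDALONE, MESOS and LOCAL are bit flags, so a set of allowed cluster managers can be expressed as a mask. A minimal sketch of how such a mask is checked (my reading of the quoted code, not part of the PR):

    val YARN = 1; val STANDALONE = 2; val MESOS = 4; val LOCAL = 8
    val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL

    def allowed(mask: Int, mgr: Int): Boolean = (mask & mgr) != 0
    // allowed(ALL_CLUSTER_MGRS, MESOS) == true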




[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/33#issuecomment-36825493
  
Passed securityManager and SparkConf around where we can. Note the changes to JettyUtils and calling functions. Switched to using SparkConf for getting and setting configs where I could. Also added ConnectionManagerSuite unit tests.

I also upmerged to the latest master branch.

Unfortunately the upmerge to master has broken passing credentials on yarn. It looks like the changes made in runAsUser aren't transferring the credentials properly, so I need to look at that further.




[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-05 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/33#issuecomment-36826678
  
@pwendell   Ok this should now be ready for review.




[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/94

SPARK-1195: set map_input_file environment variable in PipedRDD

Hadoop uses the config mapreduce.map.input.file to indicate the input filename to the map when the input split is of type FileSplit. Some of the Hadoop input and output formats set or use this config, and it can also be used by user code. PipedRDD runs an external process, and the configs aren't available to that process. Hadoop Streaming does something very similar, and the way it makes configs available is by exporting them into the environment, replacing '.' with '_'. Spark should also export this variable when launching the pipe command so the user code has access to that config.

Note that the config mapreduce.map.input.file is the new one; the old one, which is deprecated but not yet removed, is map.input.file. So we should handle both.

Perhaps it would be better to abstract this out somehow so it goes into the HadoopPartition code?
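
A minimal sketch of the Hadoop Streaming-style translation described above (the helper name is mine):

    // Configuration keys become environment variable names with '.' replaced by '_'.
    def envName(confKey: String): String = confKey.replaceAll("\\.", "_")

    // envName("mapreduce.map.input.file") == "mapreduce_map_input_file"
    // envName("map.input.file") == "map_input_file"   (deprecated key)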

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark map_input_file

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/94.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #94


commit 2ba805ee7af681dce75d203816222779ccff45f2
Author: Thomas Graves 
Date:   2014-03-06T20:12:26Z

set map_input_file environment variable in PipedRDD






[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...

2014-03-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/95#discussion_r10361302
  
--- Diff: docs/running-on-yarn.md ---
@@ -82,35 +84,30 @@ For example:
 ./bin/spark-class org.apache.spark.deploy.yarn.Client \
   --jar examples/target/scala-{{site.SCALA_BINARY_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
   --class org.apache.spark.examples.SparkPi \
-  --args yarn-standalone \
+  --args yarn-cluster \
   --num-workers 3 \
   --master-memory 4g \
   --worker-memory 2g \
   --worker-cores 1
 
-# Examine the output (replace $YARN_APP_ID in the following with the "application identifier" output by the previous command)
-# (Note: YARN_APP_LOGS_DIR is usually /tmp/logs or $HADOOP_HOME/logs/userlogs depending on the Hadoop version.)
-$ cat $YARN_APP_LOGS_DIR/$YARN_APP_ID/container*_01/stdout
-Pi is roughly 3.13794
-
-The above starts a YARN Client programs which start the default Application Master. Then SparkPi will be run as a child thread of Application Master, YARN Client will periodically polls the Application Master for status updates and displays them in the console. The client will exit once your application has finished running.
+The above starts a YARN client program which starts the default Application Master. Then SparkPi will be run as a child thread of Application Master. The client will periodically poll the Application Master for status updates and display them in the console. The client will exit once your application has finished running.
--- End diff --

It might be nice to state that the output goes to the log files; then they can see the section you added below for more details.




[GitHub] spark pull request: SPARK-1197. Change yarn-standalone to yarn-clu...

2014-03-06 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/95#issuecomment-36938497
  
Looks good to me (with the doc fixes commented on).




[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/33#discussion_r10361820
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -135,6 +135,8 @@ class SparkContext(
 
   val isLocal = (master == "local" || master.startsWith("local["))
 
+  if (master == "yarn-client") System.setProperty("SPARK_YARN_MODE", 
"true")
--- End diff --

Normally they shouldn't have to. If the user does certain things in their code (accesses SparkHadoopUtil before creating the SparkContext) and is using yarn-client mode, they might have to. I pointed that out in the jira I filed. I think we should see if we can fix that.
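
A sketch of the ordering pitfall described here, under the assumption that SparkHadoopUtil selects its YARN implementation based on the SPARK_YARN_MODE property (hypothetical user code, not from the PR):

    // Too early in yarn-client mode: SPARK_YARN_MODE is not set yet, so a
    // non-YARN SparkHadoopUtil could be selected here (assumption about the
    // selection mechanism, per the comment above).
    val util = org.apache.spark.deploy.SparkHadoopUtil.get
    // Constructing the SparkContext is what sets SPARK_YARN_MODE for yarn-client.
    val sc = new org.apache.spark.SparkContext("yarn-client", "my-app")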




[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/94#discussion_r10365444
  
--- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala ---
@@ -89,4 +97,37 @@ class PipedRDDSuite extends FunSuite with SharedSparkContext {
 }
   }
 
+  test("test pipe exports map_input_file") {
+testExportInputFile("map_input_file")
+  }
+
+  test("test pipe exports mapreduce_map_input_file") {
+testExportInputFile("mapreduce_map_input_file")
+  }
+
+  def testExportInputFile(varName:String) {
+    val nums = new HadoopRDD(sc, new JobConf(), classOf[TextInputFormat], classOf[LongWritable],
--- End diff --

No reason; I was just trying to avoid a temporary file.




[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/94#issuecomment-36948463
  
Adding a routine to HadoopPartition sounds good. 




[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/94#discussion_r10368475
  
--- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala ---
@@ -89,4 +97,37 @@ class PipedRDDSuite extends FunSuite with SharedSparkContext {
 }
   }
 
+  test("test pipe exports map_input_file") {
+testExportInputFile("map_input_file")
+  }
+
+  test("test pipe exports mapreduce_map_input_file") {
+testExportInputFile("mapreduce_map_input_file")
+  }
+
+  def testExportInputFile(varName:String) {
+    val nums = new HadoopRDD(sc, new JobConf(), classOf[TextInputFormat], classOf[LongWritable],
+        classOf[Text], 2) {
+      override def getPartitions: Array[Partition] = Array(generateFakeHadoopPartition())
+      override val getDependencies = List[Dependency[_]]()
+      override def compute(theSplit: Partition, context: TaskContext) = {
+        new InterruptibleIterator[(LongWritable, Text)](context, Iterator((new LongWritable(1),
+          new Text("b"))))
+      }
+    }
+    val hadoopPart1 = generateFakeHadoopPartition()
+    val pipedRdd = new PipedRDD(nums, "printenv " + varName)
--- End diff --

Is there a convention, or perhaps utilities for doing this already? I didn't see one on a quick look, but I might have missed it.

Note I copied the printenv command from a test above, so I'll change both.




[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-06 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/33#issuecomment-36955086
  
Nope, nothing else to address in this. I'll merge it shortly.




[GitHub] spark pull request: SPARK-1189: Add Security to Spark - Akka, Http...

2014-03-06 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/33#issuecomment-36955323
  
I committed this, thanks for all the reviews Patrick!




[GitHub] spark pull request: SPARK-1195: set map_input_file environment var...

2014-03-07 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/94#discussion_r10384191
  
--- Diff: core/src/test/scala/org/apache/spark/PipedRDDSuite.scala ---
--- End diff --

I'm definitely fine with fixing it; I was just wondering if we had some generic utilities that perhaps handled it for various platforms, or perhaps a class that handled calling the correct function depending on the OS.




[GitHub] spark pull request: Upgrade Jetty to 9.1.3.v20140225.

2014-03-10 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/113#issuecomment-37183820
  
Is there a particular reason for the upgrade (bug, feature, etc.)? One reason I ask is that changing the version can affect users. Can you create a jira for this also?




[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.

2014-03-10 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/113#issuecomment-37230471
  
I generally use mvn to build. Right now I seem to be having issues building at all, due to what appears to be the Cloudera repository being down:

Error resolving project artifact: Could not transfer artifact org.eclipse.paho:mqtt-client:pom:0.4.0 from/to cloudera-repo (https://repository.cloudera.com/artifactory/cloudera-repos): Connection to https://repository.cloudera.com refused for project org.eclipse.paho:mqtt-client:jar:0.4.0: Connection timed out




[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.

2014-03-11 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/113#issuecomment-37332343
  
What exactly do you want tried? I built it for hadoop2 with sbt and ran a couple of the examples on yarn, SparkPi and SparkHdfsLR. Those worked fine and the UI still worked.




[GitHub] spark pull request: WIP - Upgrade Jetty to 9.1.3.v20140225.

2014-03-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/113#issuecomment-37405632
  
Yes, the build failed for me with the error I put above.

Note that this PR would need to have the Maven build updated too.




[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...

2014-03-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/120#discussion_r10516035
  
--- Diff: docs/running-on-yarn.md ---
@@ -60,11 +60,11 @@ The command to launch the Spark application on the cluster is as follows:
   --jar  \
   --class  \
   --args  \
-  --num-workers  \
-  --master-class 
-  --master-memory  \
-  --worker-memory  \
-  --worker-cores  \
+  --num-executors  \
+  --am-class 
--- End diff --

The master-class/am-class is used by yarn-client mode; it launches a different AM than yarn-cluster mode does. We should just remove the option from the documentation, since end users shouldn't be using it.




[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...

2014-03-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/120#issuecomment-37411628
  
 I'm getting a compile error building this against hadoop 0.23:

[ERROR] yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala:231: value getnumExecutorsRunning is not a member of org.apache.spark.deploy.yarn.YarnAllocationHandler
[ERROR]   val missingExecutorCount = args.numExecutors - yarnAllocator.getnumExecutorsRunning





[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...

2014-03-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/120#issuecomment-37415976
  
I fixed the above compile error and tried to run, but the executors return the following error:

Unknown/unsupported param List(--num-executor, 2)
Usage: org.apache.spark.deploy.yarn.ApplicationMaster [options]
Options:
  --jar JAR_PATH          Path to your application's JAR file (required)
  --class CLASS_NAME      Name of your application's main class (required)
  --args ARGS             Arguments to be passed to your application's main class.
                          Mutliple invocations are possible, each will be passed in order.
  --num-executors NUM     Number of executors to start (Default: 2)
  --executor-cores NUM    Number of cores for the executors (Default: 1)
  --executor-memory MEM   Memory per executor (e.g. 1000M, 2G) (Default: 1G)





[GitHub] spark pull request: SPARK-1193. Fix indentation in pom.xmls

2014-03-12 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/91#discussion_r10521879
  
--- Diff: core/pom.xml ---
@@ -17,274 +17,260 @@
   -->
 
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
-    <modelVersion>4.0.0</modelVersion>
-    <parent>
-        <groupId>org.apache.spark</groupId>
-        <artifactId>spark-parent</artifactId>
-        <version>1.0.0-SNAPSHOT</version>
-        <relativePath>../pom.xml</relativePath>
-    </parent>
-
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
     <groupId>org.apache.spark</groupId>
-    <artifactId>spark-core_2.10</artifactId>
-    <packaging>jar</packaging>
-    <name>Spark Project Core</name>
-    <url>http://spark.apache.org/</url>
+    <artifactId>spark-parent</artifactId>
+    <version>1.0.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>
 
-    <profiles>
-        <profile>
-            <id>yarn-alpha</id>
-            <dependencies>
-                <dependency>
-                    <groupId>org.apache.avro</groupId>
-                    <artifactId>avro</artifactId>
-                </dependency>
-            </dependencies>
-        </profile>
-    </profiles>
--- End diff --

@sryza was this intentionally removed? This breaks the hadoop 0.23 build so 
we need to add it back in.




[GitHub] spark pull request: [SPARK-1232] Fix the hadoop 0.23 yarn build

2014-03-12 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/127

[SPARK-1232] Fix the hadoop 0.23 yarn build



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark SPARK-1232

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/127.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #127


commit c05cfd4ad4d802378ab6d4b1dbaca494fe4ebd0a
Author: Thomas Graves 
Date:   2014-03-12T15:41:59Z

Fix the hadoop 0.23 yarn build






[GitHub] spark pull request: [SPARK-1198] Allow pipes tasks to run in diffe...

2014-03-12 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/128

[SPARK-1198] Allow pipes tasks to run in different sub-directories

This works as-is on Linux/Mac/etc. but doesn't cover working on Windows. In here I use ln -sf for symlinks; putting this up for comments on that. Do we perhaps want to create some classes for doing shell commands - Linux vs. Windows? Is there some other way we want to do this? I assume we are still supporting jdk1.6?

Also should I update the Java API for pipes to allow this parameter?
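
For discussion, a minimal sketch of the POSIX-only approach described above (ln -sf into a per-task directory; the helper and names are mine, not the patch itself):

    import java.io.File

    // Create a per-task working directory and symlink the job's files into it.
    def setupTaskDir(taskDir: File, files: Seq[File]): Unit = {
      taskDir.mkdirs()
      for (f <- files) {
        val link = new File(taskDir, f.getName).getAbsolutePath
        // POSIX only; Windows would need a different mechanism (or NIO in jdk1.7).
        new ProcessBuilder("ln", "-sf", f.getAbsolutePath, link).start().waitFor()
      }
    }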

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark SPARK1198

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/128.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #128


commit 1ab49ca90b7cae82efa26e018d9d285c948bf25c
Author: Thomas Graves 
Date:   2014-03-12T14:11:46Z

Add support for running pipe tasks is separate directories

commit 6b783bdb5e09b7c96cbb76111876fbb6c9ca9a6f
Author: Thomas Graves 
Date:   2014-03-12T14:47:13Z

style fixes






[GitHub] spark pull request: [SPARK-1233] Fix running hadoop 0.23 due to ja...

2014-03-12 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/129

[SPARK-1233] Fix running hadoop 0.23 due to java.lang.NoSuchFieldException: DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark SPARK-1233

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #129


commit 85ff5a67aa1bccf0b9915b43960091a20d3aa021
Author: Thomas Graves 
Date:   2014-03-12T16:35:51Z

Fix running hadoop 0.23 due to java.lang.NoSuchFieldException: DEFAULT_MAPREDUCE_APPLICATION_CLASSPATH






[GitHub] spark pull request: [SPARK-1198] Allow pipes tasks to run in diffe...

2014-03-12 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/128#issuecomment-37459180
  
The Jenkins failure seems unrelated to this change. Can someone kick it again, perhaps?




[GitHub] spark pull request: [SPARK-1233] Fix running hadoop 0.23 due to ja...

2014-03-12 Thread tgravescs
Github user tgravescs closed the pull request at:

https://github.com/apache/spark/pull/129




[GitHub] spark pull request: [SPARK-1198] Allow pipes tasks to run in diffe...

2014-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/128#discussion_r10563706
  
--- Diff: core/pom.xml ---
@@ -184,13 +184,12 @@
       <artifactId>metrics-graphite</artifactId>
     </dependency>
     <dependency>
-      <groupId>org.apache.derby</groupId>
-      <artifactId>derby</artifactId>
-      <scope>test</scope>
-    </dependency>
-    <dependency>
       <groupId>commons-io</groupId>
       <artifactId>commons-io</artifactId>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.derby</groupId>
+      <artifactId>derby</artifactId>
--- End diff --

Yes, the diff is just messy. I made commons-io available outside of the test scope since I use FileUtils.delete, and moved it up a couple of lines. The derby dependency didn't change.
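
For reference, a sketch of the kind of call that needs commons-io on the compile (not just test) classpath; I'm assuming FileUtils.delete above is shorthand for one of commons-io's delete helpers such as deleteQuietly or forceDelete:

    import java.io.File
    import org.apache.commons.io.FileUtils

    FileUtils.deleteQuietly(new File("/tmp/spark-task-dir"))  // no exception on failure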




[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...

2014-03-13 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/120#discussion_r10566868
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -133,11 +148,11 @@ class ClientArguments(val args: Array[String], val sparkConf: SparkConf) {
   "  --class CLASS_NAME Name of your application's main class (required)\n" +
   "  --args ARGS        Arguments to be passed to your application's main class.\n" +
   "                     Mutliple invocations are possible, each will be passed in order.\n" +
-  "  --num-workers NUM  Number of workers to start (Default: 2)\n" +
-  "  --worker-cores NUM Number of cores for the workers (Default: 1).\n" +
-  "  --master-class CLASS_NAME  Class Name for Master (Default: spark.deploy.yarn.ApplicationMaster)\n" +
-  "  --master-memory MEM    Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)\n" +
-  "  --worker-memory MEM    Memory per Worker (e.g. 1000M, 2G) (Default: 1G)\n" +
+  "  --num-executors NUM    Number of executors to start (Default: 2)\n" +
+  "  --executor-cores NUM   Number of cores for the executors (Default: 1).\n" +
+  "  --am-class CLASS_NAME  Class Name for application master (Default: spark.deploy.yarn.ApplicationMaster)\n" +
--- End diff --

How about we just leave --am-class out of the usage statement, since it's not meant to be used by the end user?
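
As a point of reference, a small sketch of exercising the renamed flags through the constructor shown in the hunk above; the argument values are invented for illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.deploy.yarn.ClientArguments

    // Invented example values; the constructor signature comes from the
    // diff hunk above. --num-executors/--executor-cores replace the old
    // --num-workers/--worker-cores spellings.
    val sparkConf = new SparkConf()
    val clientArgs = new ClientArguments(
      Array(
        "--class", "com.example.MyApp",
        "--num-executors", "4",
        "--executor-cores", "2"),
      sparkConf)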




[GitHub] spark pull request: SPARK-1183. Don't use "worker" to mean executo...

2014-03-13 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/120#issuecomment-37543156
  
Looks good to me. I made a small comment about perhaps leaving --am-class out of the usage, but I'm OK either way.




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-02-27 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/33

Add Security to Spark - Akka, Http, ConnectionManager, UI use servlets

Resubmitted pull request; was https://github.com/apache/incubator-spark/pull/332.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark 
security-branch-0.9-with-client-rebase

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/33.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #33


commit f35176352b77d8a295601479879a4666501eabe8
Author: Thomas Graves 
Date:   2014-01-19T19:35:32Z

Add Security to Spark - Akka, Http, ConnectionManager, UI to use servlets

commit 5721c5ac83b62afb8e8201730e4fc6bc76556e5b
Author: Thomas Graves 
Date:   2014-01-20T15:17:56Z

update AkkaUtilsSuite test for the actorSelection changes, fix typos based 
on comments, and remove extra lines I missed in rebase from AkkaUtils

commit 2d9e23e1739f52a858c34b7e85c7625b43ad3833
Author: Thomas Graves 
Date:   2014-02-18T16:38:54Z

Merge remote-tracking branch 'upstream/master' into 
security-branch-0.9-with-client-rebase_rework

Conflicts:
core/src/main/scala/org/apache/spark/SparkEnv.scala
core/src/main/scala/org/apache/spark/network/ConnectionManager.scala
core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala

commit 6f7ddf38d3b3f3c367df4d0b9a6be3a0bc644e1d
Author: Thomas Graves 
Date:   2014-02-22T16:37:03Z

Convert SaslClient and SaslServer to scala, change spark.authenticate.ui to 
spark.ui.acls.enable, and fix up various
other things from review comments

commit ed3d1c16cf9a0af6530d2c37e62fb9cdc92ddfcb
Author: Thomas Graves 
Date:   2014-02-22T16:50:30Z

Add security.md

commit b514becd7a0173ebeb209c0436e3c2c9f2f40a64
Author: Thomas Graves 
Date:   2014-02-22T16:52:56Z

Fix reference to config

commit ecbfb65860e4fea722537802cf036f0b505d7da9
Author: Thomas Graves 
Date:   2014-02-22T17:37:41Z

Fix spacing and formatting

commit 50dd9f2438356117e749d4cbd8d0ea8c25746166
Author: Thomas Graves 
Date:   2014-02-22T18:35:34Z

fix header in SecurityManager
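
Putting the configuration renames from the commits above together (spark.authenticate.ui became spark.ui.acls.enable), a minimal sketch of turning the new security features on; the user names are placeholders, and spark.ui.view.acls is the companion view-ACL key:

    import org.apache.spark.SparkConf

    // Placeholder users; the keys follow the renames in the commits above.
    val conf = new SparkConf()
      .set("spark.authenticate", "true")      // shared-secret handshake
      .set("spark.ui.acls.enable", "true")    // gate the web UI behind ACLs
      .set("spark.ui.view.acls", "alice,bob") // users allowed to view the UI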






[GitHub] spark pull request: SPARK-1051. On YARN, executors don't doAs subm...

2014-02-27 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/29#issuecomment-36253998
  
That's really unfortunate that Hadoop 1.x doesn't support it, as I would prefer to use addCredentials, since it also handles secrets.

Our only option right now would be to add a separate function in SparkHadoopUtils and YarnSparkHadoopUtils. That would at least allow Spark on YARN to use addCredentials. Maybe it's not critical right now. Thoughts?

Otherwise changes look good.




[GitHub] spark pull request: SPARK-1032. If Yarn app fails before registeri...

2014-02-27 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/28#issuecomment-36255394
  
I personally like the way 4 spaces looks too. The style guide isn't clear on what it's supposed to be; I'll assume it falls under the 4-space rule, similar to functions: 
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Indentation

@pwendell - can you comment on what the indentation is supposed to be? If it's 4, perhaps we should update the guide.

Otherwise looks good to me. Thanks Sandy! 




[GitHub] spark pull request: Add Security to Spark - Akka, Http, Connection...

2014-02-27 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/33#issuecomment-36275335
  
It's ready for review. I believe I've addressed all the comments from the previous PR.




[GitHub] spark pull request: SPARK-1051. On YARN, executors don't doAs subm...

2014-02-27 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/29#issuecomment-36286358
  
I believe secrets are mostly for users adding secrets for other services. Secrets are also used by the MR2 framework. Secret keys are supported in MR1 via Credentials; I believe they just added the convenience methods to UserGroupInformation in MR2 for dealing with them. It looks like MR1 handled credentials through the JobConf.

So I guess, if you don't mind, overriding it for YARN at least to pass them along might be best.
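
To make the override concrete, a hypothetical sketch along the lines discussed here; the class shapes loosely follow the names in this thread, and none of it is the actual patch:

    import org.apache.hadoop.security.{Credentials, UserGroupInformation}
    import scala.collection.JavaConverters._

    // Base implementation: Hadoop 1.x can only carry delegation tokens.
    class SparkHadoopUtil {
      def addCredentials(creds: Credentials): Unit = {
        val ugi = UserGroupInformation.getCurrentUser
        creds.getAllTokens.asScala.foreach(ugi.addToken(_))
      }
    }

    // YARN/Hadoop 2.x override: UGI.addCredentials also propagates
    // secret keys, not just tokens.
    class YarnSparkHadoopUtil extends SparkHadoopUtil {
      override def addCredentials(creds: Credentials): Unit = {
        UserGroupInformation.getCurrentUser.addCredentials(creds)
      }
    }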




[GitHub] spark pull request: SPARK-1032. If Yarn app fails before registeri...

2014-02-28 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/28#issuecomment-36362791
  
I committed this. Thanks Sandy!




[GitHub] spark pull request: Update dev merge script to use spark.git inste...

2014-02-28 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/47

Update dev merge script to use spark.git instead of incubator-spark



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark fix_merge_script

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/47.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #47


commit 8209ab152dc14f534585d1235b32f6d22d8952a5
Author: Thomas Graves 
Date:   2014-02-28T15:42:54Z

Update dev merge script to use spark.git instead of incubator-spark






[GitHub] spark pull request: SPARK-1051. On YARN, executors don't doAs subm...

2014-02-28 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/29#issuecomment-36367636
  
+1. Looks Good. Thanks Sandy!




[GitHub] spark pull request: SPARK-1051. On YARN, executors don't doAs subm...

2014-02-28 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/29#issuecomment-36368483
  
@sryza it looks like this is no longer mergeable due to other check-ins. Can you please update it to the latest?




[GitHub] spark pull request: SPARK-1121 Only add avro if the build is for H...

2014-02-28 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/6#issuecomment-36370069
  
I disagree with this change going in and breaking the YARN Hadoop 0.23 build. If we are going to support Maven and Hadoop 0.23, you should be able to build it without manually looking through JIRA/pull requests to figure out what to add back in to get it to build. I unfortunately didn't see it before it went in.

Note that I think we should revert this unless someone is going to fix it soon. If we aren't going to support Maven, that decision should be made globally, and as far as I've seen that discussion hasn't finished.


