Re: Making BatchPythonEvaluation actually Batch

2016-03-31 Thread Davies Liu
@Justin, it's fixed by https://github.com/apache/spark/pull/12057 On Thu, Feb 11, 2016 at 11:26 AM, Davies Liu wrote: > Had a quick look at your commit, I think that makes sense. Could you > send a PR for that, then we can review it. > > In order to support 2), we need to change the serialized Pyt

What influences the space complexity of Spark operations?

2016-03-31 Thread Steve Johnston
*What we’ve observed* Increasing the number of partitions (and thus decreasing the partition size) seems to reliably help avoid OOM errors. To demonstrate this we used a single executor and loaded a small table into a DataFrame, persisted it with MEMORY_AND_DISK, repartitioned it and joined it to i

Re: Spark SQL UDF Returning Rows

2016-03-31 Thread Hamel Kothari
Hi Michael, Thanks for the response. I am just extracting part of the nested structure and returning only a piece of that same structure. I haven't looked at Encoders or Datasets since we're bound to 1.6 for now but I'll look at encoders to see if that covers it. Datasets seem like they would solve t

Jenkins PR failing, Mima unhappy: bad constant pool tag 50 at byte 12

2016-03-31 Thread Steve Loughran
A WiP PR of mine is failing in mima: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54525/consoleFull [info] spark-examples: previous-artifact not set, not analyzing binary compatibility java.lang.RuntimeException: bad constant pool tag 50 at byte 12 at com.typesafe.tools.m

Re: Question Create External table location S3

2016-03-31 Thread Raymond Honderdors
Thanks for the insights, I'll try to add it. On Thu, Mar 31, 2016 at 4:39 AM -0700, "Steve Loughran" <ste...@hortonworks.com> wrote: On 31 Mar 2016, at 10:00, Raymond Honderdors <raymond.honderd...@sizmek.com> wrote: Hi, I pulled t

Re: Question Create External table location S3

2016-03-31 Thread Steve Loughran
On 31 Mar 2016, at 10:00, Raymond Honderdors <raymond.honderd...@sizmek.com> wrote: Hi, I pulled the latest version git pull git://github.com/apache/spark.git Compiled: mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package now I am getti

Re: Any documentation on Spark's security model beyond YARN?

2016-03-31 Thread Steve Loughran
> On 30 Mar 2016, at 21:02, Sean Busbey wrote: > > On Wed, Mar 30, 2016 at 4:33 AM, Steve Loughran > wrote: >> >>> On 29 Mar 2016, at 22:19, Michael Segel wrote: >>> >>> Hi, >>> >>> So yeah, I know that Spark jobs running on a Hadoop cluster will inherit >>> its security from the underlyi

Question Create External table location S3

2016-03-31 Thread Raymond Honderdors
Hi, I pulled the latest version git pull git://github.com/apache/spark.git Compiled: mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package now I am getting the following error: Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED: