TakeOrderedAndProject operator may cause an OOM

2016-02-03 Thread 汪洋
Hi, currently the TakeOrderedAndProject operator in Spark SQL uses RDD's takeOrdered method. When we pass a large limit to the operator, however, it can return up to partitionNum*limit records to the driver, which may cause an OOM. Are there any plans in the community to deal with this problem?
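
A rough illustration of the concern above, written in plain Python rather than against Spark itself. It assumes the general shape of a per-partition top-k followed by a merge on the driver; the function name `take_ordered_simulated` and the toy partitions are hypothetical and are not Spark's implementation. It only shows why the driver may receive up to partitionNum*limit records before the final result is trimmed to limit.

```python
import heapq

def take_ordered_simulated(partitions, limit):
    """Hypothetical stand-in for a per-partition top-k plus driver-side merge."""
    # Executor side: each partition keeps only its `limit` smallest records.
    per_partition = [heapq.nsmallest(limit, part) for part in partitions]
    # Every partial result is shipped to the driver before the merge.
    shipped = sum(len(p) for p in per_partition)
    # Driver side: merge the sorted partial results and keep `limit` records.
    result = heapq.nsmallest(limit, heapq.merge(*per_partition))
    return result, shipped

# Toy data: 4 partitions of 250 records each.
partitions = [list(range(i, 1000, 4)) for i in range(4)]
result, shipped = take_ordered_simulated(partitions, limit=100)
print(len(result), shipped)
```

In this toy run the driver handles 4 * 100 partial records to produce 100 results, so the shipped volume grows with both the limit and the number of partitions.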

Spark 1.6: Why is hive-jdbc included in the assembly when -Phive-provided is set?

2016-02-03 Thread Andrew Lee
Hi All, I have a question regarding the hive-jdbc library that is being included in the assembly JAR. Build command: mvn -U -X -Phadoop-2.6 -Phadoop-provided -Phive-provided -Pyarn -Phive-thriftserver -Psparkr -DskipTests install In the pom.xml file, the scope for the Hive JARs is set to 'com

Path to resource added with SQL: ADD FILE

2016-02-03 Thread Antonio Piccolboni
Sorry if this is more appropriate for the user list; I asked there on 12/17 and got the silent treatment. I am writing a UDF that needs some additional info to perform its task. This information is in a file that I reference in a SQL ADD FILE statement. I expect that file to be accessible in the worki

SparkOscope: Enabling Spark Optimization through Cross-stack Monitoring and Visualization

2016-02-03 Thread Yiannis Gkoufas
Hi all, I just wanted to introduce some of my recent work at IBM Research around Spark, especially its Metric System and Web UI. As a quick overview of our contributions: We have created a new type of Sink for the metrics (HDFSSink) which captures the metrics into HDFS. We have extended the

Re: Spark 1.6.1

2016-02-03 Thread Daniel Darabos
On Tue, Feb 2, 2016 at 7:10 PM, Michael Armbrust wrote: > What about the memory leak bug? >> https://issues.apache.org/jira/browse/SPARK-11293 >> Even after the memory rewrite in 1.6.0, it still happens in some cases. >> Will it be fixed for 1.6.1? >> > > I think we have enough issues queued up t

RE: spark hivethriftserver problem on 1.5.0 -> 1.6.0 upgrade

2016-02-03 Thread james.gre...@baesystems.com
I have a workaround for this issue which is to go back to single session mode for the thrift server: conf.set("spark.sql.hive.thriftServer.singleSession", "true") This seems to mean that temp tables can be registered in 1.6.0 with a remote metastore. Cheers James From: Yin Huai [mailto:yh.

Re: Spark 1.6.1

2016-02-03 Thread Steve Loughran
On 2 Feb 2016, at 18:48, Michael Armbrust <mich...@databricks.com> wrote: I'm waiting for a few last fixes to be merged. Hoping to cut an RC in the next few days. I've just added https://issues.apache.org/jira/browse/SPARK-12807 to the list; there's a PR urgently in need of review.