Re: [SQL] parse_url does not work for Internationalized domain names?

2018-01-11 Thread StanZhai
This problem was introduced by a change designed to improve the performance of PARSE_URL(). The same issue exists in the following SQL:

```SQL
SELECT PARSE_URL('http://stanzhai.site?p=["abc"]', 'QUERY', 'p')
// return null in Spark 2.1+
// return ["abc"]
```
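For readers landing here from the archive, a minimal reproduction sketch, assuming a spark-shell session where `spark` is in scope (attributing the strictness to java.net.URI is our reading of the performance change, not confirmed in this thread):

```scala
// A minimal sketch, assuming a local SparkSession named `spark`.
// The native PARSE_URL in Spark 2.x parses with java.net.URI, which is
// stricter than the old Hive UDF about characters such as '[' and '"',
// so IDN hosts and unescaped query values can come back as null.
spark.sql("""SELECT PARSE_URL('http://stanzhai.site?p=["abc"]', 'QUERY', 'p')""").show()
```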

Re: [SQL] Syntax "case when" isn't supported in JOIN

2017-07-13 Thread StanZhai
A workaround is difficult. You should consider merging this PR into your Spark. "wangshuang [via Apache Spark Developers List]" wrote at 2017-07-13 18:43: I'm trying to execute Hive SQL on Spark SQL (also on the Spark Thrift Server). For optimiz…
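For context, a hedged sketch of the kind of statement the thread is about; the table and column names here are hypothetical, and the exact failing query is in the quoted mail:

```scala
// Hypothetical shape of the problem: a CASE WHEN expression inside a
// JOIN ... ON condition, which the thread reports as unsupported.
spark.sql("""
  SELECT t1.id
  FROM t1 JOIN t2
    ON t1.id = (CASE WHEN t2.flag = 1 THEN t2.id ELSE t2.alt_id END)
""").show()
```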

Re: Re: [SQL] Analysis failed when combining Window function and GROUP BY in Spark 2.x

2017-03-08 Thread StanZhai
…getting this error because aggregates are planned before windows, and the aggregate cannot find b in its grouping expressions. On Wed, Mar 8, 2017 at 5:21 AM, StanZhai <[hidden email]> wrote: We can reproduce this using the following code: val spark = SparkSession.builder().appName("…

[SQL] Analysis failed when combining Window function and GROUP BY in Spark 2.x

2017-03-07 Thread StanZhai
We can reproduce this using the following code:

val spark = SparkSession.builder().appName("test").master("local").getOrCreate()
val sql1 =
  """
    |create temporary view tb as select * from values
    |(1, 0),
    |(1, 0),
    |(2, 0)
    |as grouping(a, b)
  """.stripMargin
val sql =
  """…
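The query itself is cut off in the archive. As a hedged illustration only (the exact original SQL is unknown), a window function over a column missing from the GROUP BY, as the reply above describes, fails analysis like this:

```scala
// Hypothetical completion: aggregates are planned before windows, so a
// window spec referring to the non-grouped column b fails to resolve.
spark.sql(sql1)
val sql =
  """
    |select a, max(b) over (partition by b) as mb
    |from tb
    |group by a
  """.stripMargin
spark.sql(sql).show()  // throws AnalysisException in Spark 2.x
```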

Re: The driver hangs at DataFrame.rdd in Spark 2.1.0

2017-02-23 Thread StanZhai
…it's already fixed in 2.1.0. One way to debug is to turn on trace logging and check how the analyzer/optimizer behaves. On 2/22/17 11:11 PM, StanZhai wrote: Could this be related to https://issues.apache.org/jira/browse/SPARK-17733 ?
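A minimal sketch of the suggested debugging step; the global log-level switch is the blunt variant, and scoping TRACE to org.apache.spark.sql.catalyst in log4j.properties is quieter:

```scala
// Raise the log level so analyzer/optimizer rule applications are printed,
// then trigger the hanging call and watch where the trace output stalls.
spark.sparkContext.setLogLevel("TRACE")
val df = spark.sql("select ...")  // the complex query from the report (elided)
df.rdd                            // the call that hangs
```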

Re: The driver hangs at DataFrame.rdd in Spark 2.1.0

2017-02-22 Thread StanZhai
Could this be related to https://issues.apache.org/jira/browse/SPARK-17733 ? -- Original -- From: "Cheng Lian-3 [via Apache Spark Developers List]" Send time: Thursday, Feb 23, 2017 9:43 AM To: "Stan Zhai" Subject: Re: The driver hangs at DataFrame.rdd in…

The driver hangs at DataFrame.rdd in Spark 2.1.0

2017-02-22 Thread StanZhai
Hi all, the driver hangs at DataFrame.rdd in Spark 2.1.0 when the DataFrame (SQL) is complex. The following is a thread dump of my driver: org.apache.spark.sql.catalyst.expressions.AttributeReference.equals(namedExpressions.scala:230) org.apache.spark.sql.catalyst.expressions.IsNotNull.equals(nullExpr…

Re: compile about the code

2017-02-20 Thread StanZhai
Your antlr4-maven-plugin looks incomplete. You can try deleting ~/.m2 in your home directory, then re-compile Spark. -- Original -- From: "萝卜丝炒饭 [via Apache Spark Developers List]" Date: Feb 20, 2017 To: "Stan Zhai" Subject: compile about the co…

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-02-13 Thread StanZhai
…We've been trying to upgrade our Spark since the release of Spark 2.1.0. This version is unstable and unusable for us because of the memory problems; we should pay attention to this. StanZhai wrote: > From the thread dump page of the Executor in the WebUI, I found that there are about > 1300 threads na…

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-02-07 Thread StanZhai
From the thread dump page of the Executor in the WebUI, I found that there are about 1300 threads named "DataStreamer for file /test/data/test_temp/_temporary/0/_temporary/attempt_20170207172435_80750_m_69_1/part-00069-690407af-0900-46b1-9590-a6d6c696fe68.snappy.parquet" in the TIMED_WAITING state, like this:

[SQL] SQLParser fails to resolve nested CASE WHEN statement with parentheses in Spark 2.x

2017-02-06 Thread StanZhai
Hi all, SQLParser fails to resolve a nested CASE WHEN statement like this: select case when (1) + case when 1>0 then 1 else 0 end = 2 then 1 else 0 end from tb. The exception: Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: m…
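A minimal sketch for reproducing the parse failure in the spark-shell; the query is copied from the report, and tb is assumed to exist:

```scala
// Repro sketch: the parenthesized operand before the nested CASE WHEN is
// what the parser reportedly chokes on in Spark 2.x.
spark.sql(
  """select case when (1) + case when 1>0 then 1 else 0 end = 2
    |then 1 else 0 end
    |from tb""".stripMargin).show()
```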

Re: [SQL] A confusing NullPointerException when creating table using Spark 2.1.0

2017-02-06 Thread StanZhai
This issue has been fixed by https://github.com/apache/spark/pull/16820.

[SQL] A confusing NullPointerException when creating table using Spark 2.1.0

2017-02-03 Thread StanZhai
Hi all, after upgrading our Spark from 1.6.2 to 2.1.0, I encountered a confusing NullPointerException when creating a table under Spark 2.1.0; the problem does not exist in Spark 1.6.1. Environment: Hive 1.2.1, Hadoop 2.6.4. Code: // spark is an instance…
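The code is truncated in the archive; a hedged guess at the setup, with a hypothetical table DDL (the fix referenced in the reply above is https://github.com/apache/spark/pull/16820):

```scala
// Hypothetical sketch: spark is an instance of SparkSession with Hive
// support; the NPE was reported on a plain CREATE TABLE like this.
val spark = org.apache.spark.sql.SparkSession.builder()
  .appName("npe-repro").master("local[*]").enableHiveSupport().getOrCreate()
spark.sql("create table test_table(a int, b string)")
```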

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-02-02 Thread StanZhai
…Jan 22, 2017, at 11:36 PM, StanZhai <[hidden email]> wrote: >> I'm using Parallel GC. >> rxin wrote: >>> Are you using G1 GC? G1 sometimes uses a lot more memory than the size >>> allocated. >>> On Sun, Jan 22, 201…

Re: Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-01-22 Thread StanZhai
I'm using Parallel GC. rxin wrote: > Are you using G1 GC? G1 sometimes uses a lot more memory than the size > allocated. > On Sun, Jan 22, 2017 at 12:58 AM StanZhai <[hidden email]> wrote: >> Hi all, >> We j…

Executors exceed maximum memory defined with `--executor-memory` in Spark 2.1.0

2017-01-22 Thread StanZhai
Hi all, we just upgraded our Spark from 1.6.2 to 2.1.0. Our Spark application is started by spark-submit with `--executor-memory 35G` in standalone mode, but the actual memory use goes up to 65G after a full GC (jmap -histo:live $pid), as follows: test@c6 ~ $ ps aux | grep CoarseGrainedExe…
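A note for readers of this thread, with a small sketch: `--executor-memory` bounds only the JVM heap (-Xmx), while thread stacks, direct buffers, and native allocations sit outside it, so process RSS can legitimately exceed the flag; the ~1300 DataStreamer threads reported later in the thread are one such off-heap contributor.

```scala
// Compare the heap ceiling the executor JVM sees against the RSS that `ps`
// reports; the difference is off-heap (stacks, direct buffers, native).
val rt = Runtime.getRuntime
println(s"max heap  = ${rt.maxMemory / 1024 / 1024} MB")  // ~ --executor-memory
println(s"used heap = ${(rt.totalMemory - rt.freeMemory) / 1024 / 1024} MB")
```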

[SparkSQL] How does Spark handle a parquet file in parallel?

2015-09-19 Thread StanZhai
Hi all, I'm using Spark (1.4.1) + Hive (0.13.1). I found that a large amount of network IO appears when querying a parquet table *with only one part file* using SparkSQL. The SQL is: SELECT concat(year(fkbb5855f0), "-", month(fkbb5855f0), "-", day(fkbb5855f0), " 00:00:00"), COUNT(fk919b1d80) FROM tes…
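Background for the question, as a sketch with a hypothetical table name: Spark splits even a single part file into input splits by HDFS block, so one file can yield many partitions, and any split whose block is not node-local is read over the network.

```scala
// Inspect how many partitions Spark created for a single-file table; each
// partition maps to an input split, and non-local splits cause network IO.
val df = sqlContext.sql("SELECT * FROM test_parquet_table")  // hypothetical name
println(s"partitions = ${df.rdd.partitions.length}")
```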

Re: [SparkSQL] Could not alter table in Spark 1.5 using HiveContext

2015-09-10 Thread StanZhai
…set it to maven, we > will download needed jars directly (it is an easy way to do testing work). > On Thu, Sep 10, 2015 at 7:45 PM, StanZhai <[hidden email]> wrote: >> Thank you for the swift reply! >> The version of my hive metastore server is 0.13.…

Re: [SparkSQL] Could not alter table in Spark 1.5 using HiveContext

2015-09-10 Thread StanZhai
Thank you for the swift reply! The version of my hive metastore server is 0.13.1. I've built Spark using sbt like this: build/sbt -Pyarn -Phadoop-2.4 -Phive -Phive-thriftserver assembly. Does Spark 1.5 bind to the Hive client version 1.2 by default?
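For reference, a sketch of the resolution hinted at upthread: Spark 1.5 does default its built-in Hive client to 1.2.1, and the metastore version and jar source are configurable; setting the jars property to maven downloads the matching client, as the quoted reply says.

```scala
// Configure the Hive metastore client version so a Spark 1.5 HiveContext
// can talk to a 0.13.1 metastore. These settings are read when the client
// is created, so set them before constructing the contexts.
val conf = new org.apache.spark.SparkConf()
  .set("spark.sql.hive.metastore.version", "0.13.1")
  .set("spark.sql.hive.metastore.jars", "maven")  // download needed jars directly
val sc = new org.apache.spark.SparkContext(conf)
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
```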

[SparkSQL] Could not alter table in Spark 1.5 using HiveContext

2015-09-09 Thread StanZhai
After upgrading Spark from 1.4.1 to 1.5.0, I encountered the following exception when using an ALTER TABLE statement in HiveContext. The SQL is: ALTER TABLE a RENAME TO b. The exception is: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. Invalid…

Re: Parquet SaveMode.Append Trouble.

2015-07-30 Thread StanZhai
You should import org.apache.spark.sql.SaveMode.
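For completeness, a minimal sketch of the usage that import enables; the DataFrame df and the output path here are hypothetical:

```scala
import org.apache.spark.sql.SaveMode

// Append to an existing parquet dataset instead of failing or overwriting.
df.write.mode(SaveMode.Append).parquet("/tmp/output_parquet")
```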

[Spark SQL]Could not read parquet table after recreating it with the same table name

2015-07-28 Thread StanZhai
Hi all, I'm using SparkSQL in Spark 1.4.1. I encounter an error when using a parquet table after recreating it; we can reproduce the error as follows:

```scala
// hc is an instance of HiveContext
hc.sql("select * from b").show() // this is ok and b is a parquet table
val df = hc.sql("sel…
```
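The snippet is cut off in the archive. A hedged guess at the rest of the reproduction, based only on the subject line; the drop/recreate steps and the refreshTable workaround are assumptions:

```scala
// Hypothetical completion: recreate b under the same name, then read it.
val df = hc.sql("select * from b")
hc.sql("drop table b")
hc.sql("create table b as select 1 as a")  // hypothetical new definition
hc.sql("select * from b").show()           // reported to fail after recreation
hc.refreshTable("b")                       // possible workaround: drop cached metadata
```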

[SparkSQL 1.4.0] The result of SUM(xxx) in SparkSQL is 0.0 rather than null when the column xxx is all null

2015-07-02 Thread StanZhai
Hi all, I have a table named test like this:

| a | b    |
| 1 | null |
| 2 | null |

After upgrading the cluster from Spark 1.3.1 to 1.4.0, I found that the SUM function behaves differently in Spark 1.4 and 1.3. The SQL is: select sum(b) from test. In Spark 1.4.0 the result is 0.0; in Spark 1.3.1 the…
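A self-contained sketch for checking the difference; the setup code is ours, and per the subject line the all-null column summed to null on 1.3.1 but 0.0 on 1.4.0:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Build the two-row table from the report, with b entirely null.
val schema = StructType(Seq(StructField("a", IntegerType), StructField("b", IntegerType)))
val rows = sc.parallelize(Seq(Row(1, null), Row(2, null)))
sqlContext.createDataFrame(rows, schema).registerTempTable("test")
sqlContext.sql("select sum(b) from test").show()  // 0.0 on 1.4.0, null on 1.3.1
```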

Re: [SparkSQL 1.4] Could not use concat with UDF in where clause

2015-06-24 Thread StanZhai
Hi Michael Armbrust, I have filed an issue on JIRA for this: https://issues.apache.org/jira/browse/SPARK-8588

[SparkSQL 1.4] Could not use concat with UDF in where clause

2015-06-23 Thread StanZhai
Hi all, after upgrading the cluster from Spark 1.3.1 to 1.4.0 (rc4), I encountered the following exception when using concat with a UDF in the where clause: === Exception === org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved ob…
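A hedged sketch of the query shape; the UDF and table here are made up, and SPARK-8588, filed in the follow-up above, has the exact reproduction:

```scala
// Hypothetical shape: a registered UDF nested inside concat in a WHERE
// clause, the combination the report says fails to resolve on 1.4.0.
sqlContext.udf.register("myUdf", (s: String) => s.trim)
sqlContext.sql(
  "select * from test where concat(myUdf(a), '-x') = '1-x'").show()
```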

Re: A confusing ClassNotFoundException error

2015-06-13 Thread StanZhai
I have encountered a similar error at Spark 1.4.0. The same code runs on Spark 1.3.1. My code is (it can be run in the spark-shell): === // hc is an instance of HiveContext val df = hc.sql("select * from test limit 10") val sb = new mutable.StringBuilder…
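The snippet is truncated here; a hedged guess at how it continues, where everything after the StringBuilder line is our invention, kept to the simplest shape that matches the names:

```scala
import scala.collection.mutable

// Hypothetical continuation: collect the ten rows and build a string on
// the driver. Names mirror the report; the logic is guessed.
val df = hc.sql("select * from test limit 10")
val sb = new mutable.StringBuilder()
df.collect().foreach(row => sb.append(row.mkString(",")).append("\n"))
println(sb.toString)
```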