Yeah, there will likely be a community preview build soon for the 1.1 release. Benchmarking that will both give you better performance and help QA the release.
Bonus points if you turn on codegen for Spark SQL (experimental feature) when benchmarking and report bugs: "SET spark.sql.codegen=true"

On Mon, Aug 4, 2014 at 5:37 PM, Cheng, Hao <[email protected]> wrote:

> From the log, I noticed the "substr" was added on July 15th, 1.0.1 release
> should be earlier than that. Community is now working on releasing the
> 1.1.0, and also some of the performance improvements were added. Probably
> you can try that for your benchmark.
>
> Cheng Hao
>
> -----Original Message-----
> From: Tom [mailto:[email protected]]
> Sent: Tuesday, August 05, 2014 5:53 AM
> To: [email protected]
> Subject: Substring in Spark SQL
>
> Hi,
>
> I am trying to run the Big Data Benchmark
> <https://amplab.cs.berkeley.edu/benchmark/>, and I am stuck at Query 2
> for Spark SQL using Spark 1.0.1:
>
> SELECT SUBSTR(sourceIP, 1, X), SUM(adRevenue) FROM uservisits GROUP BY
> SUBSTR(sourceIP, 1, X)
>
> When I look into the sourcecode, it seems that
> "substr" is supported by HiveQL, but not by Spark SQL, correct?
>
> Thanks!
>
> Tom
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Substring-in-Spark-SQL-tp11373.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
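
A side note on Query 2 from the original question: when checking benchmark output, it can help to have a tiny reference for what the query is expected to compute. The following is only a plain-Python sketch of the aggregation, not Spark code. The function name `query2` and the sample rows are made up for illustration, and it assumes SUBSTR(s, 1, X) means "the first X characters" (1-based start position).

```python
from collections import defaultdict


def query2(rows, x):
    """Rough equivalent of:
    SELECT SUBSTR(sourceIP, 1, X), SUM(adRevenue)
    FROM uservisits GROUP BY SUBSTR(sourceIP, 1, X)
    """
    totals = defaultdict(float)
    for source_ip, ad_revenue in rows:
        # SUBSTR(s, 1, X) is 1-based, so it maps to the slice s[:x] in Python.
        totals[source_ip[:x]] += ad_revenue
    return dict(totals)


# Hypothetical (sourceIP, adRevenue) rows; not taken from the benchmark data.
sample = [("10.0.0.1", 1.5), ("10.0.0.2", 2.5), ("192.168.1.1", 4.0)]
result = query2(sample, 4)
```

With X = 4 the two "10.0" addresses fall into one group and "192." forms another, which is the grouping the SQL query should reproduce on the real table.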
