Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
Ok let us try this

val d = HiveContext.table("test.dummy")
d.registerTempTable("tmp")

// Obtain boundary values
var minValue : Int = HiveContext.sql("SELECT minRow.id AS minValue FROM (SELECT min(struct(id)) as minRow FROM tmp) AS a").collect.apply(0).getInt(0)
var maxValue : Int = HiveContext.sql("SELECT maxRow.id AS maxValue FROM (SELECT max(struct(id)) as maxRow FROM tmp) AS a").collect.apply(0).getInt(0)
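Once the two boundary values are on the driver they can be applied back to the DataFrame. A minimal sketch of that final step, assuming the same session and that col comes from org.apache.spark.sql.functions:

import org.apache.spark.sql.functions.col

// fetch only the two boundary rows using the collected min/max ids
d.filter(col("id") === minValue || col("id") === maxValue).show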

Re: Spark SQL support for sub-queries

2016-02-26 Thread Yin Yang
I tried the following:

scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a", "b").registerTempTable("test")
scala> val df = sql("SELECT maxRow.* FROM (SELECT max(struct(id, b, a)) as maxRow FROM test) a")
df: org.apache.spark.sql.DataFrame = [id: int, b: string ... 1 more field]
scala> d…
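Worth noting: max compares structs field by field, left to right, so the field order inside struct(...) decides which column drives the argmax; the remaining fields only break ties. A hypothetical variation (not from the thread) against the same temp table:

scala> sql("SELECT maxRow.* FROM (SELECT max(struct(b, id, a)) as maxRow FROM test) a")
// with b first, the row holding the lexicographically largest b wins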

Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
I am using Hive 2. Sounds like Hive 2 still does not support more than one level of sub-query!

hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
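For context, the shape that fails is a subquery nested inside another subquery, along these lines (an illustrative sketch, not the exact statement from the thread):

hive> SELECT * FROM dummy
      WHERE id IN (SELECT id FROM (SELECT max(id) AS id FROM dummy) t);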

Re: Spark SQL support for sub-queries

2016-02-26 Thread Michael Armbrust
There will probably be some subquery support in 2.0. That particular query would be more efficient to express as an argmax, however. Here is an example in Spark 1.6:
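The example itself is cut off in the archive; a minimal sketch of the argmax idiom, assuming the test.dummy table from earlier in the thread and a spark-shell where sqlContext is a HiveContext:

import org.apache.spark.sql.functions.{max, struct}
import sqlContext.implicits._

val d = sqlContext.table("test.dummy")
// argmax: one pass returning the whole row that carries the maximum id,
// because max over a struct orders by its first field (id) before the rest
val top = d.select(max(struct($"id", $"small_vc")).as("maxRow"))
           .select($"maxRow.id", $"maxRow.small_vc")
top.show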

Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
Good stuff. I decided to do some boundary value analysis by getting records where the ID (a unique value) is IN (min(), max()). Unfortunately Hive SQL does not yet support more than one level of sub-query. For example this operation is perfectly valid in Oracle:

select * from dummy where id IN (selec…
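One single-statement alternative that avoids nested subqueries altogether is to join against the min/max aggregate. A sketch for Spark's HiveContext (an assumption to validate: the OR condition makes this a non-equi join, which Spark may plan as a Cartesian product and plain Hive may reject):

HiveContext.sql("""SELECT d.*
  FROM test.dummy d
  JOIN (SELECT min(id) AS mn, max(id) AS mx FROM test.dummy) b
    ON (d.id = b.mn OR d.id = b.mx)""").show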

Re: Spark SQL support for sub-queries

2016-02-26 Thread Yin Yang
Since collect is involved, the approach would be slower compared to the SQL Mich gave in his first email.

On Fri, Feb 26, 2016 at 1:42 AM, Michał Zieliński <zielinski.mich...@gmail.com> wrote:
> You need to collect the value.
>
> val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
> d.filter(col("id") === lit(m))
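If the driver round-trip is the concern, the lookup can also stay fully distributed by joining the DataFrame with its own aggregate. A sketch, assuming the d DataFrame from the earlier messages:

import org.apache.spark.sql.functions.max
import sqlContext.implicits._

val maxDf = d.agg(max($"id").as("maxId"))
// equi-join keeps exactly the row(s) whose id equals the maximum; nothing is collected
d.join(maxDf, d("id") === maxDf("maxId")).show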

Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
thanks, much appreciated

On 26 February 2016 at 09:54, Michał Zieliński wrote:
> Spark has great documentation and guides:
> lit and col are here…

Re: Spark SQL support for sub-queries

2016-02-26 Thread Michał Zieliński
Spark has great documentation and guides: lit and col are here, getInt is here.
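In short: col names an existing DataFrame column, while lit wraps a Scala constant as a Column so the two can be compared. A one-liner, assuming the d DataFrame and the hypothetical constant 5:

import org.apache.spark.sql.functions.{col, lit}

// col("id") refers to the column; lit(5) turns the constant 5 into a Column
d.filter(col("id") === lit(5)).show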

Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
Thanks Michael. Great:

d.filter(col("id") === lit(m)).show

BTW, where are all these methods like lit etc. documented? Also, I guess any call like apply(0) or getInt(0) refers to the "current" parameter?

Regards

On 26 February 2016 at 09:42, Michał Zieliński wrote:
> You need to collect the value.
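On the apply(0)/getInt(0) question: collect returns an Array[Row], so apply(0) indexes the first Row of that array and getInt(0) reads that Row's first column as an Int; neither refers to a "current" parameter. A small sketch, assuming the d DataFrame from earlier:

import org.apache.spark.sql.functions.max
import sqlContext.implicits._

val rows = d.agg(max($"id")).collect   // Array[Row] holding one single-column Row
val first = rows.apply(0)              // same as rows(0): the first Row
val m = first.getInt(0)                // column 0 of that Row, read as an Int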

Re: Spark SQL support for sub-queries

2016-02-26 Thread Michał Zieliński
You need to collect the value.

val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
d.filter(col("id") === lit(m))

On 26 February 2016 at 09:41, Mich Talebzadeh wrote:
> Can this be done using DFs?
>
> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> scala> val d = HiveContext.table("test.dummy")

Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
Can this be done using DFs?

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val d = HiveContext.table("test.dummy")
d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered: int, randomised: int, random_string: string, small_vc: string, padding: string…

Re: Spark SQL support for sub-queries

2016-02-25 Thread Mohammad Tariq
AFAIK, this isn't supported yet. A ticket is in progress though.

Tariq, Mohammad
about.me/mti

On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <mich.talebza...@cloudtechnologypartners.c…> wrote:

Spark SQL support for sub-queries

2016-02-25 Thread Mich Talebzadeh
Hi,

I guess the following confirms that Spark does not support sub-queries:

val d = HiveContext.table("test.dummy")
d.registerTempTable("tmp")
HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")

It crashes. The SQL works OK in Hive itself on the underlying table!