Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
Ok let us try this

val d = HiveContext.table("test.dummy")
d.registerTempTable("tmp")

// Obtain boundary values
var minValue : Int = HiveContext.sql("SELECT minRow.id AS minValue FROM (SELECT min(struct(id)) as minRow FROM tmp) AS a").collect.apply(0).getInt(0)
var maxValue : Int = HiveContext.sql("SELECT maxRow.id AS maxValue FROM (SELECT max(struct(id)) as maxRow FROM tmp) AS a").collect.apply(0).getInt(0)
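Once the two boundary values are on the driver they can be applied back to the DataFrame. A minimal sketch of that final step, assuming the same session and that col comes from org.apache.spark.sql.functions:

import org.apache.spark.sql.functions.col

// fetch only the two boundary rows using the collected min/max ids
d.filter(col("id") === minValue || col("id") === maxValue).show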

Re: Spark SQL support for sub-queries

2016-02-26 Thread Yin Yang
I tried the following:

scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a", "b").registerTempTable("test")
scala> val df = sql("SELECT maxRow.* FROM (SELECT max(struct(id, b, a)) as maxRow FROM test) a")
df: org.apache.spark.sql.DataFrame = [id: int, b: string ... 1 more field]
scala> d…
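Worth noting: max compares structs field by field, left to right, so the field order inside struct(...) decides which column drives the argmax; the remaining fields only break ties. A hypothetical variation (not from the thread) against the same temp table:

scala> sql("SELECT maxRow.* FROM (SELECT max(struct(b, id, a)) as maxRow FROM test) a")
// with b first, the row holding the lexicographically largest b wins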

Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
I am using Hive 2. Sounds like Hive 2 still does not support more than one level of sub-query!

hive> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
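For context, the shape that fails is a subquery nested inside another subquery, along these lines (an illustrative sketch, not the exact statement from the thread):

hive> SELECT * FROM dummy
      WHERE id IN (SELECT id FROM (SELECT max(id) AS id FROM dummy) t);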

Re: Spark SQL support for sub-queries

2016-02-26 Thread Michael Armbrust
There will probably be some subquery support in 2.0. That particular query would be more efficient to express as an argmax, however. Here is an example in Spark 1.6:
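The example itself is cut off in the archive; a minimal sketch of the argmax idiom, assuming the test.dummy table from earlier in the thread and a spark-shell where sqlContext is a HiveContext:

import org.apache.spark.sql.functions.{max, struct}
import sqlContext.implicits._

val d = sqlContext.table("test.dummy")
// argmax: one pass returning the whole row that carries the maximum id,
// because max over a struct orders by its first field (id) before the rest
val top = d.select(max(struct($"id", $"small_vc")).as("maxRow"))
           .select($"maxRow.id", $"maxRow.small_vc")
top.show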

Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
Good stuff. I decided to do some boundary value analysis by getting records where the ID (a unique value) is IN (min(), max()). Unfortunately Hive SQL does not yet support more than one level of sub-query. For example this operation is perfectly valid in Oracle:

select * from dummy where id IN (selec…
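One single-statement alternative that avoids nested subqueries altogether is to join against the min/max aggregate. A sketch for Spark's HiveContext (an assumption to validate: the OR condition makes this a non-equi join, which Spark may plan as a Cartesian product and plain Hive may reject):

HiveContext.sql("""SELECT d.*
  FROM test.dummy d
  JOIN (SELECT min(id) AS mn, max(id) AS mx FROM test.dummy) b
    ON (d.id = b.mn OR d.id = b.mx)""").show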

Re: Spark SQL support for sub-queries

2016-02-26 Thread Yin Yang
Since collect is involved, the approach would be slower compared to the SQL Mich gave in his first email.

On Fri, Feb 26, 2016 at 1:42 AM, Michał Zieliński <zielinski.mich...@gmail.com> wrote:
> You need to collect the value.
>
> val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
> d.filter(col("id") === lit(m))
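If the driver round-trip is the concern, the lookup can also stay fully distributed by joining the DataFrame with its own aggregate. A sketch, assuming the d DataFrame from the earlier messages:

import org.apache.spark.sql.functions.max
import sqlContext.implicits._

val maxDf = d.agg(max($"id").as("maxId"))
// equi-join keeps exactly the row(s) whose id equals the maximum; nothing is collected
d.join(maxDf, d("id") === maxDf("maxId")).show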

Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
thanks, much appreciated

On 26 February 2016 at 09:54, Michał Zieliński wrote:
> Spark has great documentation and guides:
> lit and col are here…

Re: Spark SQL support for sub-queries

2016-02-26 Thread Michał Zieliński
Spark has great documentation and guides: lit and col are here, getInt is here.
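In short: col names an existing DataFrame column, while lit wraps a Scala constant as a Column so the two can be compared. A one-liner, assuming the d DataFrame and the hypothetical constant 5:

import org.apache.spark.sql.functions.{col, lit}

// col("id") refers to the column; lit(5) turns the constant 5 into a Column
d.filter(col("id") === lit(5)).show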

Re: Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
Thanks Michael. Great:

d.filter(col("id") === lit(m)).show

BTW, where are all these methods like lit etc. documented? Also, I guess any call like apply(0) or getInt(0) refers to the "current" parameter?

Regards

On 26 February 2016 at 09:42, Michał Zieliński wrote:
> You need to collect the value.
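On the apply(0)/getInt(0) question: collect returns an Array[Row], so apply(0) indexes the first Row of that array and getInt(0) reads that Row's first column as an Int; neither refers to a "current" parameter. A small sketch, assuming the d DataFrame from earlier:

import org.apache.spark.sql.functions.max
import sqlContext.implicits._

val rows = d.agg(max($"id")).collect   // Array[Row] holding one single-column Row
val first = rows.apply(0)              // same as rows(0): the first Row
val m = first.getInt(0)                // column 0 of that Row, read as an Int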

Re: Spark SQL support for sub-queries

2016-02-26 Thread Michał Zieliński
You need to collect the value.

val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
d.filter(col("id") === lit(m))

On 26 February 2016 at 09:41, Mich Talebzadeh wrote:
> Can this be done using DFs?
>
> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> scala> val d = HiveContext.table("test.dummy")

Spark SQL support for sub-queries

2016-02-26 Thread Mich Talebzadeh
Can this be done using DFs?

scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val d = HiveContext.table("test.dummy")
d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered: int, randomised: int, random_string: string, small_vc: string, padding: string…

Re: Spark SQL support for sub-queries

2016-02-25 Thread Mohammad Tariq
AFAIK, this isn't supported yet. A ticket is in progress though.

Tariq, Mohammad
about.me/mti

On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <mich.talebza...@cloudtechnologypartners.c…> wrote:

Spark SQL support for sub-queries

2016-02-25 Thread Mich Talebzadeh
Hi,

I guess the following confirms that Spark does not support sub-queries:

val d = HiveContext.table("test.dummy")
d.registerTempTable("tmp")
HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")

It crashes. The SQL works OK in Hive itself on the underlying table!