Re: Spark SQL support for sub-queries

Michał Zieliński Fri, 26 Feb 2016 01:42:19 -0800

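You need to collect the value to the driver first, then filter on it:

import org.apache.spark.sql.functions.{col, lit, max}

val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)  // max id as a plain Int
d.filter(col("id") === lit(m))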
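
Since the question also asked how to join the two DataFrames, here is a rough sketch of that route, which keeps everything in Spark instead of collecting to the driver. The helper name rowsWithMaxId and the column alias max_id are made up for illustration, and it assumes the DataFrame has an integer column called id as in test.dummy. I have not run this against that table.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, max}

// Hypothetical helper: returns every row of df whose "id" equals the maximum id.
def rowsWithMaxId(df: DataFrame): DataFrame = {
  // One-row DataFrame holding the maximum, aliased so it does not clash with "id".
  val maxId = df.agg(max(col("id")).as("max_id"))
  // The inner join keeps only rows whose id matches the maximum;
  // the helper column is dropped afterwards.
  df.join(maxId, col("id") === col("max_id")).drop("max_id")
}

// Usage in the shell session from this thread: rowsWithMaxId(d).show()
```

Collecting is simpler when you only need one scalar; the join may be preferable if you want to stay lazy or compute the maximum per group later on.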


On 26 February 2016 at 09:41, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Can this be done using DFs?
>
>
>
> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>
> scala> val d = HiveContext.table("test.dummy")
> d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered:
> int, randomised: int, random_string: string, small_vc: string, padding:
> string]
>
> scala>  var m = d.agg(max($"id"))
> m: org.apache.spark.sql.DataFrame = [max(id): int]
>
> How can I join these two? In other words I want to get all rows with id =
> m here?
>
> d.filter($"id" = m)  ?
>
> Thanks
>
> On 25/02/2016 22:58, Mohammad Tariq wrote:
>
> AFAIK, this isn't supported yet. A ticket
> <https://issues.apache.org/jira/browse/SPARK-4226> is in progress though.
>
>
>
> Tariq, Mohammad
> about.me/mti
>
>
> On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <
> mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
>>
>>
>> Hi,
>>
>>
>>
>> I guess the following confirms that Spark does not support sub-queries
>>
>>
>>
>> val d = HiveContext.table("test.dummy")
>>
>> d.registerTempTable("tmp")
>>
>> HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")
>>
>> It crashes
>>
>> The SQL works OK in Hive itself on the underlying table!
>>
>> select * from dummy where id IN (select max(id) from dummy);
>>
>>
>>
>> Thanks
>>
>
