OK, let us try this:

val d = HiveContext.table("test.dummy")
d.registerTempTable("tmp")

// Obtain boundary values
var minValue: Int = HiveContext.sql("SELECT minRow.id AS minValue FROM (SELECT min(struct(id)) AS minRow FROM tmp) AS a").collect.apply(0).getInt(0)
var maxValue: Int = HiveContext.sql("SELECT maxRow.id AS maxValue FROM (SELECT max(struct(id)) AS maxRow FROM tmp) AS b").collect.apply(0).getInt(0)

d.filter(col("id") === lit(minValue) || col("id") === lit(maxValue)).orderBy(col("id")).show
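For readers without a cluster to hand, the logic of the snippet above — fetch the two boundary values first, then filter with them — can be sketched on a plain Scala collection. The `Row` case class and its data are made-up stand-ins for the `test.dummy` table:

```scala
// Hypothetical stand-in for table test.dummy: rows keyed by id
case class Row(id: Int, payload: String)

val rows = Seq(Row(3, "c"), Row(1, "a"), Row(7, "g"), Row(5, "e"))

// Boundary values, analogous to the two HiveContext.sql() calls above
val minValue = rows.map(_.id).min
val maxValue = rows.map(_.id).max

// Keep only the boundary rows, ordered by id, like the filter/orderBy above
val boundary = rows.filter(r => r.id == minValue || r.id == maxValue).sortBy(_.id)
println(boundary.map(_.id).mkString(","))  // 1,7
```

The point is that the "subquery" is materialised into a scalar up front, so the final filter needs no subquery support at all.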
This works OK as well.

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 26 February 2016 at 23:21, Yin Yang <yy201...@gmail.com> wrote:
> I tried the following:
>
> scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a", "b").registerTempTable("test")
>
> scala> val df = sql("SELECT maxRow.* FROM (SELECT max(struct(id, b, a)) as maxRow FROM test) a")
> df: org.apache.spark.sql.DataFrame = [id: int, b: string ... 1 more field]
>
> scala> df.show
> +---+----+---+
> | id|   b|  a|
> +---+----+---+
> |  2|test|  a|
> +---+----+---+
>
> Looks like the sort order is governed by the order given in struct().
>
> Nice feature.
>
> On Fri, Feb 26, 2016 at 12:30 PM, Michael Armbrust <mich...@databricks.com> wrote:
>> There will probably be some subquery support in 2.0. That particular query would be more efficient to express as an argmax, however. Here is an example in Spark 1.6
>> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/3170497669323442/2840265927289860/c072bba513.html>.
>>
>> On Thu, Feb 25, 2016 at 2:58 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>> AFAIK, this isn't supported yet. A ticket
>>> <https://issues.apache.org/jira/browse/SPARK-4226> is in progress though.
>>>
>>> Tariq, Mohammad
>>> about.me/mti
>>>
>>> On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I guess the following confirms that Spark does not support sub-queries:
>>>>
>>>> val d = HiveContext.table("test.dummy")
>>>> d.registerTempTable("tmp")
>>>> HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")
>>>>
>>>> It crashes.
>>>>
>>>> The SQL works OK in Hive itself on the underlying table:
>>>>
>>>> select * from dummy where id IN (select max(id) from dummy);
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> http://talebzadehmich.wordpress.com
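The sort order Yin Yang observed matches ordinary lexicographic (field-by-field, left-to-right) comparison, which is the same ordering plain Scala tuples get from the standard library. A small illustration, no Spark needed — the tuples below are the `struct(id, b, a)` values of the two `test` rows from Yin's example:

```scala
// struct(id, b, a) values for the two rows Seq((2, "a", "test"), (2, "b", "foo")):
// row (id=2, a="a", b="test") -> (2, "test", "a")
// row (id=2, a="b", b="foo")  -> (2, "foo",  "b")
val structs = Seq((2, "test", "a"), (2, "foo", "b"))

// The implicit Tuple3 Ordering compares field by field, left to right:
// the ids tie at 2, then "test" > "foo" decides the winner
println(structs.max)  // (2,test,a)
```

This is why putting `id` first in `struct(id, b, a)` makes `max(...)` behave as an argmax over `id`, with `b` and `a` carried along and used only to break ties.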