OK, let us try this:

val d = HiveContext.table("test.dummy")
d.registerTempTable("tmp")

// Obtain boundary values
var minValue: Int = HiveContext.sql("SELECT minRow.id AS minValue FROM (SELECT min(struct(id)) AS minRow FROM tmp) AS a").collect.apply(0).getInt(0)
var maxValue: Int = HiveContext.sql("SELECT maxRow.id AS maxValue FROM (SELECT max(struct(id)) AS maxRow FROM tmp) AS b").collect.apply(0).getInt(0)

d.filter(col("id") === lit(minValue) || col("id") === lit(maxValue)).orderBy(col("id")).show
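For readers without a cluster to hand, the logic of the snippet above — fetch the two boundary values first, then filter with them — can be sketched on a plain Scala collection. The `Row` case class and its data are made-up stand-ins for the `test.dummy` table:

```scala
// Hypothetical stand-in for table test.dummy: rows keyed by id
case class Row(id: Int, payload: String)

val rows = Seq(Row(3, "c"), Row(1, "a"), Row(7, "g"), Row(5, "e"))

// Boundary values, analogous to the two HiveContext.sql() calls above
val minValue = rows.map(_.id).min
val maxValue = rows.map(_.id).max

// Keep only the boundary rows, ordered by id, like the filter/orderBy above
val boundary = rows.filter(r => r.id == minValue || r.id == maxValue).sortBy(_.id)
println(boundary.map(_.id).mkString(","))  // 1,7
```

The point is that the "subquery" is materialised into a scalar up front, so the final filter needs no subquery support at all.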
This works OK as well.

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 26 February 2016 at 23:21, Yin Yang <yy201...@gmail.com> wrote:
> I tried the following:
>
> scala> Seq((2, "a", "test"), (2, "b", "foo")).toDF("id", "a", "b").registerTempTable("test")
>
> scala> val df = sql("SELECT maxRow.* FROM (SELECT max(struct(id, b, a)) as maxRow FROM test) a")
> df: org.apache.spark.sql.DataFrame = [id: int, b: string ... 1 more field]
>
> scala> df.show
> +---+----+---+
> | id|   b|  a|
> +---+----+---+
> |  2|test|  a|
> +---+----+---+
>
> Looks like the sort order is governed by the order given in struct().
>
> Nice feature.
>
> On Fri, Feb 26, 2016 at 12:30 PM, Michael Armbrust <mich...@databricks.com> wrote:
>> There will probably be some subquery support in 2.0. That particular query would be more efficient to express as an argmax, however. Here is an example in Spark 1.6
>> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/3170497669323442/2840265927289860/c072bba513.html>.
>>
>> On Thu, Feb 25, 2016 at 2:58 PM, Mohammad Tariq <donta...@gmail.com> wrote:
>>> AFAIK, this isn't supported yet. A ticket
>>> <https://issues.apache.org/jira/browse/SPARK-4226> is in progress though.
>>>
>>> Tariq, Mohammad
>>> about.me/mti
>>>
>>> On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I guess the following confirms that Spark does not support sub-queries:
>>>>
>>>> val d = HiveContext.table("test.dummy")
>>>> d.registerTempTable("tmp")
>>>> HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")
>>>>
>>>> It crashes.
>>>>
>>>> The SQL works OK in Hive itself on the underlying table:
>>>>
>>>> select * from dummy where id IN (select max(id) from dummy);
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> Dr Mich Talebzadeh
>>>>
>>>> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> http://talebzadehmich.wordpress.com
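The sort order Yin Yang observed matches ordinary lexicographic (field-by-field, left-to-right) comparison, which is the same ordering plain Scala tuples get from the standard library. A small illustration, no Spark needed — the tuples below are the `struct(id, b, a)` values of the two `test` rows from Yin's example:

```scala
// struct(id, b, a) values for the two rows Seq((2, "a", "test"), (2, "b", "foo")):
// row (id=2, a="a", b="test") -> (2, "test", "a")
// row (id=2, a="b", b="foo")  -> (2, "foo",  "b")
val structs = Seq((2, "test", "a"), (2, "foo", "b"))

// The implicit Tuple3 Ordering compares field by field, left to right:
// the ids tie at 2, then "test" > "foo" decides the winner
println(structs.max)  // (2,test,a)
```

This is why putting `id` first in `struct(id, b, a)` makes `max(...)` behave as an argmax over `id`, with `b` and `a` carried along and used only to break ties.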