You need to collect the value first: agg() returns a one-row DataFrame, not a scalar, so pull the max out onto the driver and then filter on it.

val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
d.filter(col("id") === lit(m))
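For completeness, a minimal end-to-end sketch of the same thing against your test.dummy table (assuming the HiveContext setup from your session; rowsWithMaxId is just an illustrative name):

import org.apache.spark.sql.functions.{col, lit, max}

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import hiveContext.implicits._

val d = hiveContext.table("test.dummy")

// agg yields a one-row DataFrame; collect it and read the Int out of the Row
val m: Int = d.agg(max($"id")).collect()(0).getInt(0)

// filter the original DataFrame against the collected scalar
val rowsWithMaxId = d.filter(col("id") === lit(m))
rowsWithMaxId.show()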
On 26 February 2016 at 09:41, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Can this be done using DFs?
>
> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>
> scala> val d = HiveContext.table("test.dummy")
> d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered:
> int, randomised: int, random_string: string, small_vc: string, padding:
> string]
>
> scala> var m = d.agg(max($"id"))
> m: org.apache.spark.sql.DataFrame = [max(id): int]
>
> How can I join these two? In other words, I want to get all rows with
> id = m here?
>
> d.filter($"id" = m) ?
>
> Thanks
>
> On 25/02/2016 22:58, Mohammad Tariq wrote:
>
> AFAIK, this isn't supported yet. A ticket
> <https://issues.apache.org/jira/browse/SPARK-4226> is in progress though.
>
> Tariq, Mohammad
> about.me/mti
>
> On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <
> mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
>> Hi,
>>
>> I guess the following confirms that Spark does not support sub-queries:
>>
>> val d = HiveContext.table("test.dummy")
>>
>> d.registerTempTable("tmp")
>>
>> HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")
>>
>> It crashes.
>>
>> The SQL works OK in Hive itself on the underlying table:
>>
>> select * from dummy where id IN (select max(id) from dummy);
>>
>> Thanks
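PS: if you would rather stay entirely inside the DataFrame API and avoid collecting to the driver, an inner join against the aggregated DataFrame gives the same rows. A sketch, where max_id is just an alias I introduce for the join condition, and maxDf/result are illustrative names:

import org.apache.spark.sql.functions.max

// one-row DataFrame holding the max, aliased so we can join on it
val maxDf = d.agg(max($"id").as("max_id"))

// inner join keeps only the rows whose id equals the max, then drop the helper column
val result = d.join(maxDf, d("id") === maxDf("max_id")).drop("max_id")
result.show()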