Spark has great documentation <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.package> and guides <https://spark.apache.org/docs/latest/programming-guide.html>:

lit and col are here <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.package>
getInt is here <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.Row>
apply(0) is just a method on Array, which is returned by collect (here <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.DataFrame>)
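Putting those pieces together, a minimal self-contained sketch of the pattern (assuming a Spark 1.x DataFrame d with an integer id column, as in the thread below):

import org.apache.spark.sql.functions.{col, lit, max}

// agg returns a one-row DataFrame; collect brings it to the driver as an
// Array[Row], apply(0) takes the first Row, and getInt(0) reads column 0
// of that Row as an Int.
val m: Int = d.agg(max(col("id"))).collect.apply(0).getInt(0)

// lit wraps the Scala Int in a Column so it can be compared with ===.
d.filter(col("id") === lit(m)).show()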
On 26 February 2016 at 10:47, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks Michał. Great
>
> d.filter(col("id") === lit(m)).show
>
> BTW, where are all these methods like lit etc. documented? Also, I guess
> any action call like apply(0) or getInt(0) refers to the "current"
> parameter?
>
> Regards
>
> On 26 February 2016 at 09:42, Michał Zieliński <
> zielinski.mich...@gmail.com> wrote:
>
>> You need to collect the value.
>>
>> val m: Int = d.agg(max($"id")).collect.apply(0).getInt(0)
>> d.filter(col("id") === lit(m))
>>
>> On 26 February 2016 at 09:41, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Can this be done using DFs?
>>>
>>> scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>
>>> scala> val d = HiveContext.table("test.dummy")
>>> d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered:
>>> int, randomised: int, random_string: string, small_vc: string, padding:
>>> string]
>>>
>>> scala> var m = d.agg(max($"id"))
>>> m: org.apache.spark.sql.DataFrame = [max(id): int]
>>>
>>> How can I join these two? In other words, I want to get all rows with
>>> id = m here.
>>>
>>> d.filter($"id" = m) ?
>>>
>>> Thanks
>>>
>>> On 25/02/2016 22:58, Mohammad Tariq wrote:
>>>
>>> AFAIK, this isn't supported yet. A ticket
>>> <https://issues.apache.org/jira/browse/SPARK-4226> is in progress
>>> though.
>>>
>>> Tariq, Mohammad
>>> about.me/mti
>>>
>>> On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <
>>> mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>>>
>>>> Hi,
>>>>
>>>> I guess the following confirms that Spark does not support sub-queries:
>>>>
>>>> val d = HiveContext.table("test.dummy")
>>>> d.registerTempTable("tmp")
>>>> HiveContext.sql("select * from tmp where id IN (select max(id) from
>>>> tmp)")
>>>>
>>>> It crashes.
>>>>
>>>> The SQL works OK in Hive itself on the underlying table:
>>>>
>>>> select * from dummy where id IN (select max(id) from dummy);
>>>>
>>>> Thanks
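As an aside, Mich's "How can I join these two?" question above can also be answered without collecting the aggregate to the driver. A minimal sketch of a join-based alternative, assuming the same Spark 1.x DataFrame d (maxDf, max_id, and result are illustrative names, not from the thread):

import org.apache.spark.sql.functions.max

// Compute the maximum id as a one-row DataFrame and alias the column so
// it can be referenced in the join condition.
val maxDf = d.agg(max(d("id")).as("max_id"))

// Join the original DataFrame against it; only rows whose id equals the
// maximum survive. This keeps the work distributed rather than pulling
// the aggregate down to the driver first.
val result = d.join(maxDf, d("id") === maxDf("max_id"))
result.show()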