import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sparkConf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("Dataframe Test")

val sc = new SparkContext(sparkConf)
val sql = new SQLContext(sc)

// Each JSON record is expected to carry an array column "items"
// (e.g. {"items": [1, 2, 3]}).
val dataframe = sql.read.json("orders.json")

// Explode the "items" array twice to get one row per (item1, item2)
// pair within each order.
val expanded = dataframe
  .explode[::[Long], Long]("items", "item1")(row => row)
  .explode[::[Long], Long]("items", "item2")(row => row)

// Drop self-pairs and count how often each distinct pair co-occurs.
val grouped = expanded
  .where(expanded("item1") !== expanded("item2"))
  .groupBy("item1", "item2")
  .count()

// Incomplete as written: groupBy alone returns a GroupedData,
// not a DataFrame (see the note below).
val recs = grouped
  .groupBy("item1")

I found the example above, but I can't seem to figure out what these two lines do:

val expanded = dataframe
  .explode[::[Long], Long]("items", "item1")(row => row)
  .explode[::[Long], Long]("items", "item2")(row => row)
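
My best reading so far: each explode call turns the "items" array into one
output row per element (the type parameters say the input column is read as a
list of Longs and each element becomes a single Long output column, and
row => row is the identity function because the list already holds the output
elements). Exploding the same array twice therefore gives the cross product of
item pairs within each order. A small sketch of the effect, with input values
assumed purely for illustration:

// one input row:        items = [1, 2]
// after first explode:  (items=[1,2], item1=1), (items=[1,2], item1=2)
// after second explode: one row per (item1, item2) pair:
//                       (1,1), (1,2), (2,1), (2,2)

Is that right?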



On 5 January 2016 at 20:00, Deenar Toraskar <deenar.toras...@gmail.com>
wrote:

> Hi All
>
> I have the following Spark SQL query and would like to convert it to
> use the DataFrames API (Spark 1.6). The eee, eep and pfep columns are
> all maps of (int -> float):
>
>
> s"""select e.counterparty, epe, mpfe, eepe, noOfMonthseep,
>    |       teee as effectiveExpectedExposure, teep as expectedExposure,
>    |       tpfep as pfe
>    |from exposureMeasuresCpty e
>    |  lateral view explode(eee) dummy1 as noOfMonthseee, teee
>    |  lateral view explode(eep) dummy2 as noOfMonthseep, teep
>    |  lateral view explode(pfep) dummy3 as noOfMonthspfep, tpfep
>    |where e.counterparty = '$cpty'
>    |  and noOfMonthseee = noOfMonthseep
>    |  and noOfMonthseee = noOfMonthspfep
>    |order by noOfMonthseep""".stripMargin
>
> Any guidance or samples would be appreciated. I have seen code snippets
> that handle arrays, but haven't come across how to handle maps:
>
> import org.apache.spark.sql.{DataFrame, Row}
> import org.apache.spark.sql.functions.countDistinct
>
> case class Book(title: String, words: String)
> // explode is a DataFrame method, so df must be a DataFrame, not an RDD
> val df: DataFrame = ???
>
> case class Word(word: String)
> val allWords = df.explode(df("words")) {
>   case Row(words: String) => words.split(" ").map(Word(_))
> }
>
> val bookCountPerWord = allWords.groupBy("word").agg(countDistinct("title"))
>
>
> Regards
> Deenar
>
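
PS On the maps question above: I believe the explode function in
org.apache.spark.sql.functions also accepts map columns in 1.6, producing one
row per map entry with "key" and "value" columns that can be multi-aliased.
A minimal sketch, with exposures and cpty as assumed names taken from the
quoted query:

import org.apache.spark.sql.functions.explode

// Explode one of the map columns, aliasing key/value to match the
// lateral view output names.
val eeeExploded = exposures
  .where(exposures("counterparty") === cpty)
  .select(
    exposures("counterparty"),
    explode(exposures("eee")).as(Seq("noOfMonthseee", "teee")))

// eep and pfep can be exploded the same way and the three results
// joined on the month key to mirror the lateral view query.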
