val sparkConf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("Dataframe Test")

val sc = new SparkContext(sparkConf)
val sql = new SQLContext(sc)

val dataframe = sql.read.json("orders.json")

val expanded = dataframe
  .explode[::[Long], Long]("items", "item1")(row => row)
  .explode[::[Long], Long]("items", "item2")(row => row)

val grouped = expanded
  .where(expanded("item1") !== expanded("item2"))
  .groupBy("item1", "item2")
  .count()

val recs = grouped
  .groupBy("item1")

I found another example above, but I can't seem to figure out what this part does:

val expanded = dataframe
  .explode[::[Long], Long]("items", "item1")(row => row)
  .explode[::[Long], Long]("items", "item2")(row => row)
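My best guess is that the two explodes build, for every order, the cross
product of its items with itself: the first explode adds an item1 column
with one row per element of items, and the second explode (still over the
original items column) multiplies the rows again into an item2 column. An
order with items = [1, 2, 3] therefore expands into all nine
(item1, item2) pairs; the where() drops the diagonal pairs where item1
equals item2, and the groupBy/count tallies how often each remaining pair
co-occurs across orders, i.e. an item co-occurrence matrix for
recommendations. A minimal, untested sketch against the Spark 1.6 API that
makes the expansion visible on a single hand-built row, reusing sc and sql
from above (the sample JSON and names are mine; I used Seq[Long] instead
of ::[Long], which is just Scala's non-empty List type -- either works
here, since the lambda never relies on the concrete type):

val demo = sql.read.json(sc.parallelize(Seq("""{"items": [1, 2, 3]}""")))
val demoExpanded = demo
  .explode[Seq[Long], Long]("items", "item1")(identity)  // 1 row -> 3 rows
  .explode[Seq[Long], Long]("items", "item2")(identity)  // 3 rows -> 9 rows
val demoPairs = demoExpanded
  .where(demoExpanded("item1") !== demoExpanded("item2"))  // drop (1,1), (2,2), (3,3)
demoPairs.show()  // 6 rows left: (1,2), (1,3), (2,1), (2,3), (3,1), (3,2)

Is that a fair reading?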
On 5 January 2016 at 20:00, Deenar Toraskar <deenar.toras...@gmail.com> wrote:

> Hi All
>
> I have the following Spark SQL query and would like to convert it to use
> the DataFrames API (Spark 1.6). The eee, eep and pfep are all maps of
> (int -> float):
>
> s"""select e.counterparty, epe, mpfe, eepe, noOfMonthseep,
>    |  teee as effectiveExpectedExposure, teep as expectedExposure,
>    |  tpfep as pfe
>    |from exposureMeasuresCpty e
>    |  lateral view explode(eee) dummy1 as noOfMonthseee, teee
>    |  lateral view explode(eep) dummy2 as noOfMonthseep, teep
>    |  lateral view explode(pfep) dummy3 as noOfMonthspfep, tpfep
>    |where e.counterparty = '$cpty' and noOfMonthseee = noOfMonthseep
>    |  and noOfMonthseee = noOfMonthspfep
>    |order by noOfMonthseep""".stripMargin
>
> Any guidance or samples would be appreciated. I have seen code snippets
> that handle arrays, but haven't come across how to handle maps.
>
> case class Book(title: String, words: String)
> val df: RDD[Book]
>
> case class Word(word: String)
> val allWords = df.explode('words) {
>   case Row(words: String) => words.split(" ").map(Word(_))
> }
>
> val bookCountPerWord = allWords.groupBy("word").agg(countDistinct("title"))
>
> Regards
> Deenar
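For the map columns, the same Row-based explode as in your Book/Word
snippet might work, since Spark hands a map<int,float> column to the
lambda as a Scala Map. An untested sketch against the Spark 1.6 API (the
Measure case class is mine, eee stands in for any of the three maps, and
I'm assuming exposureMeasuresCpty is registered as a table):

case class Measure(noOfMonths: Int, value: Float)

val df = sql.table("exposureMeasuresCpty")

// One output row per map entry, like lateral view explode(eee) in SQL;
// the case class fields become the two new columns.
val exploded = df.explode(df("eee")) {
  case Row(eee: collection.Map[Int, Float] @unchecked) =>
    eee.map { case (noOfMonths, value) => Measure(noOfMonths, value) }
}

Repeating that for eep and pfep and then joining (or filtering) on
noOfMonths should reproduce the way the three lateral views line up on the
same key in the SQL version.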