If you know the map keys ahead of time, you can extract them directly. Here
are a few examples:
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/1826874816150656/2840265927289860/latest.html>
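A minimal sketch of the direct approach, assuming the events DataFrame and
the three keys from the quoted message below (swap in your own key list):

import org.apache.spark.sql.functions.col

// Known map keys; in practice this list could come from config or metadata.
val keys = Seq("a", "b", "c")

// Index into the map column directly, one projected column per key.
// Keys missing from a row's map come back as null, matching the pivoted output.
val extracted = events.select(
  col("id") +: keys.map(k => col("map").getItem(k).as(k)): _*
)

Since each lookup is a plain per-row projection, this avoids the shuffle that
the explode/groupBy/pivot combination would trigger.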
On Thu, Mar 24, 2016 at 12:01 PM, Michał Zieliński <
zielinski.mich...@gmail.com> wrote:

> Hi,
>
> Imagine you have a structure like this:
>
> val events = sqlContext.createDataFrame(
>   Seq(
>     ("a", Map("a"->1,"b"->1)),
>     ("b", Map("b"->1,"c"->1)),
>     ("c", Map("a"->1,"c"->1))
>   )
> ).toDF("id","map")
>
> What I want to achieve is to have the map values as separate columns.
> Basically I want to achieve this:
>
> +---+----+----+----+
> | id|   a|   b|   c|
> +---+----+----+----+
> |  a|   1|   1|null|
> |  b|null|   1|   1|
> |  c|   1|null|   1|
> +---+----+----+----+
>
> I managed to create it with an explode-pivot combo, but for a large
> dataset and a list of around 1000 map keys I imagine this will be
> prohibitively expensive. I reckon there must be a much easier way to
> achieve that than:
>
> val exploded = events.select(col("id"), explode(col("map")))
>   .groupBy("id").pivot("key").sum("value")
>
> Any help would be appreciated. :)