Re: Create all the combinations of a groupBy

2019-01-23 Thread Pierremalliard
looks like i found the solution in case anyone ever encounters a similar challenge...

df = spark.createDataFrame(
    [("a", 1, 0), ("a", 2, 42), ("a", 3, 10),
     ("b", 4, -1), ("b", 5, -2), ("b", 6, 12)],
    ("key", "consumerID", "feature")
)
df.show()

schema = StructType([
    StructField("ID_1", Double
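The message is cut off before the full solution, so the approach above can't be reproduced exactly. As one hedged alternative sketch (column names taken from the `createDataFrame` call above), the same within-key pairing can be expressed as a self-join on `key` with an inequality filter on `consumerID`, which keeps each unordered pair exactly once. The plain-Python version below simulates that join so the logic is easy to check:

```python
def self_join_pairs(rows):
    """Simulate a DataFrame self-join on 'key' filtered to
    left.consumerID < right.consumerID (roughly what
    df.alias('l').join(df.alias('r'), 'key')
      .filter('l.consumerID < r.consumerID') would produce):
    each unordered pair of rows sharing a key appears once."""
    out = []
    for k1, id1, f1 in rows:
        for k2, id2, f2 in rows:
            if k1 == k2 and id1 < id2:
                out.append((k1, id1, f1, id2, f2))
    return out

rows = [("a", 1, 0), ("a", 2, 42), ("a", 3, 10),
        ("b", 4, -1), ("b", 5, -2), ("b", 6, 12)]
print(self_join_pairs(rows))
# [('a', 1, 0, 2, 42), ('a', 1, 0, 3, 10), ('a', 2, 42, 3, 10),
#  ('b', 4, -1, 5, -2), ('b', 4, -1, 6, 12), ('b', 5, -2, 6, 12)]
```

The `id1 < id2` filter is what removes both self-pairs and mirrored duplicates; without it a self-join returns every ordered pair.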

Re: Create all the combinations of a groupBy

2019-01-23 Thread hemant singh
Check roll up and cube functions in spark sql.

On Wed, 23 Jan 2019 at 10:47 PM, Pierremalliard <
pierre.de-malli...@capgemini.com> wrote:

> Hi,
>
> I am trying to generate a dataframe of all combinations that have a same
> key using Pyspark.
>
> example:
>
> (a,1)
> (a,2)
> (a,3)
> (b,1)
> (b,2
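For context on this suggestion: `rollup` and `cube` in Spark SQL compute aggregates over *grouping sets* (subsets of the grouping columns), rather than pairs of rows within a key, so they may not directly answer the original question. A minimal plain-Python sketch of which grouping sets each one produces:

```python
from itertools import combinations

def rollup_sets(cols):
    """Grouping sets produced by ROLLUP(c1, c2, ...): each prefix
    of the column list, down to the grand total ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """Grouping sets produced by CUBE(c1, c2, ...): every subset
    of the column list, largest first."""
    sets = []
    for r in range(len(cols), -1, -1):
        sets.extend(combinations(cols, r))
    return sets

print(rollup_sets(["key", "consumerID"]))
# [('key', 'consumerID'), ('key',), ()]
print(cube_sets(["key", "consumerID"]))
# [('key', 'consumerID'), ('key',), ('consumerID',), ()]
```

So `df.cube("key", "consumerID").agg(...)` aggregates once per subset of the grouping columns; it does not enumerate combinations of *values* within a group.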

Create all the combinations of a groupBy

2019-01-23 Thread Pierremalliard
Hi,

I am trying to generate a dataframe of all combinations that have the same key using Pyspark.

example:

(a,1)
(a,2)
(a,3)
(b,1)
(b,2)

should return:

(a, 1, 2)
(a, 1, 3)
(a, 2, 3)
(b, 1, 2)

i want to do something like df.groupBy('key').combinations().apply(...)

any suggestions are welcome
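There is no built-in `combinations()` on a grouped DataFrame, but the core per-group logic is small. A minimal plain-Python sketch (assuming unordered pairs, as in the example above); in PySpark the same function body could run inside a grouped-map UDF, or the result could be built with a self-join:

```python
from itertools import combinations
from collections import defaultdict

def pair_combinations(rows):
    """Group (key, value) rows by key, then emit every unordered
    pair of values within each key as (key, v1, v2)."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    out = []
    for key, values in groups.items():
        for a, b in combinations(values, 2):
            out.append((key, a, b))
    return out

rows = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2)]
print(pair_combinations(rows))
# [('a', 1, 2), ('a', 1, 3), ('a', 2, 3), ('b', 1, 2)]
```

This matches the expected output in the question; for combinations of size k rather than pairs, the second argument to `combinations` would change accordingly.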