Looks like I found the solution, in case anyone ever encounters a similar
challenge:
from pyspark.sql.types import StructType, StructField, DoubleType

df = spark.createDataFrame(
    [("a", 1, 0), ("a", 2, 42), ("a", 3, 10),
     ("b", 4, -1), ("b", 5, -2), ("b", 6, 12)],
    ("key", "consumerID", "feature")
)
df.show()
schema = StructType([
    StructField("ID_1", DoubleType(), True),
    # ... (the rest of the schema was cut off in the original message)
])
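The snippet above was cut off after the first schema field, so here is a minimal sketch of where the grouped-map pattern appears to be heading: a function that emits every unordered pair of consumerIDs within one key group, which can then be handed to groupBy(...).applyInPandas. Only the column names ID_1/ID_2 come from the truncated code; the function name pair_up and the DDL schema string are assumptions for illustration.

```python
from itertools import combinations

import pandas as pd


def pair_up(pdf: pd.DataFrame) -> pd.DataFrame:
    # Emit every unordered pair of consumerIDs within one key group.
    key = pdf["key"].iloc[0]
    pairs = combinations(sorted(pdf["consumerID"]), 2)
    return pd.DataFrame([(key, a, b) for a, b in pairs],
                        columns=["key", "ID_1", "ID_2"])


# With Spark, the function plugs in like this (names assumed):
# result = df.groupBy("key").applyInPandas(pair_up, schema="key string, ID_1 long, ID_2 long")
# result.show()
```

The grouping logic itself is plain pandas, so it can be tested without a Spark session before wiring it into applyInPandas.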
Hi,
I am trying to generate a DataFrame of all pairwise combinations of values
that share the same key, using PySpark.
For example:
(a,1)
(a,2)
(a,3)
(b,1)
(b,2)
should return:
(a, 1, 2)
(a, 1, 3)
(a, 2, 3)
(b, 1, 2)
I want to do something like df.groupBy('key').combinations().apply(...)
Any suggestions are welcome!
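One common way to get per-key pairwise combinations is a self-join on the key with an inequality filter, so each unordered pair appears exactly once. The sketch below shows the idea in pandas (same join semantics, easy to run anywhere); the PySpark equivalent is noted in the comment, with column names assumed for illustration.

```python
import pandas as pd

rows = pd.DataFrame({"key": ["a", "a", "a", "b", "b"],
                     "id": [1, 2, 3, 1, 2]})

# Self-join on key, then keep only ordered pairs (id_1 < id_2) so each
# combination appears once. The PySpark equivalent (names assumed) is:
#   df.alias("l").join(df.alias("r"), "key").filter(col("l.id") < col("r.id"))
pairs = rows.merge(rows, on="key", suffixes=("_1", "_2"))
pairs = pairs[pairs["id_1"] < pairs["id_2"]].reset_index(drop=True)
```

On the example input this yields (a, 1, 2), (a, 1, 3), (a, 2, 3), (b, 1, 2), matching the expected output above.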