Looks like I found the solution, in case anyone ever encounters a similar
challenge:
from pyspark.sql.types import StructType, StructField, DoubleType

df = spark.createDataFrame(
    [("a", 1, 0), ("a", 2, 42), ("a", 3, 10),
     ("b", 4, -1), ("b", 5, -2), ("b", 6, 12)],
    ("key", "consumerID", "feature")
)
df.show()
schema = StructType([
    StructField("ID_1", DoubleType(), True),
    # ... (the rest of the schema was cut off in the original message)
])
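The snippet above was cut off after the first schema field, so here is a minimal sketch of where the grouped-map pattern appears to be heading: a function that emits every unordered pair of consumerIDs within one key group, which can then be handed to groupBy(...).applyInPandas. Only the column names ID_1/ID_2 come from the truncated code; the function name pair_up and the DDL schema string are assumptions for illustration.

```python
from itertools import combinations

import pandas as pd


def pair_up(pdf: pd.DataFrame) -> pd.DataFrame:
    # Emit every unordered pair of consumerIDs within one key group.
    key = pdf["key"].iloc[0]
    pairs = combinations(sorted(pdf["consumerID"]), 2)
    return pd.DataFrame([(key, a, b) for a, b in pairs],
                        columns=["key", "ID_1", "ID_2"])


# With Spark, the function plugs in like this (names assumed):
# result = df.groupBy("key").applyInPandas(pair_up, schema="key string, ID_1 long, ID_2 long")
# result.show()
```

The grouping logic itself is plain pandas, so it can be tested without a Spark session before wiring it into applyInPandas.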
Hi,
I am trying to generate a DataFrame of all pairwise combinations of values
that share the same key, using PySpark.
For example:
(a,1)
(a,2)
(a,3)
(b,1)
(b,2)
should return:
(a, 1, 2)
(a, 1, 3)
(a, 2, 3)
(b, 1, 2)
I want to do something like df.groupBy('key').combinations().apply(...)
Any suggestions are welcome!
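One common way to get per-key pairwise combinations is a self-join on the key with an inequality filter, so each unordered pair appears exactly once. The sketch below shows the idea in pandas (same join semantics, easy to run anywhere); the PySpark equivalent is noted in the comment, with column names assumed for illustration.

```python
import pandas as pd

rows = pd.DataFrame({"key": ["a", "a", "a", "b", "b"],
                     "id": [1, 2, 3, 1, 2]})

# Self-join on key, then keep only ordered pairs (id_1 < id_2) so each
# combination appears once. The PySpark equivalent (names assumed) is:
#   df.alias("l").join(df.alias("r"), "key").filter(col("l.id") < col("r.id"))
pairs = rows.merge(rows, on="key", suffixes=("_1", "_2"))
pairs = pairs[pairs["id_1"] < pairs["id_2"]].reset_index(drop=True)
```

On the example input this yields (a, 1, 2), (a, 1, 3), (a, 2, 3), (b, 1, 2), matching the expected output above.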