Looks like I found the solution, in case anyone ever encounters a similar challenge...
from pyspark.sql.types import StructType, StructField, DoubleType

df = spark.createDataFrame(
    [("a", 1, 0), ("a", 2, 42), ("a", 3, 10), ("b", 4, -1), ("b", 5, -2), ("b", 6, 12)],
    ("key", "consumerID", "feature")
)
df.show()

schema = StructType([
    StructField("ID_1", DoubleType()),
    StructField("ID_2", DoubleType()),  # the message was cut off after "Double"; ID_2 is an assumed second field
])
Check the rollup and cube functions in Spark SQL.
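For reference, rollup and cube compute subtotal groupings over the grouping columns rather than row combinations; a quick sketch of what they do (the count() aggregation is an arbitrary choice for illustration):

df.rollup("key", "consumerID").count().show()  # groups by (key, consumerID), (key), and ()
df.cube("key", "consumerID").count().show()    # groups by every subset of the two columns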
On Wed, 23 Jan 2019 at 10:47 PM, Pierremalliard <pierre.de-malli...@capgemini.com> wrote:
Hi,
I am trying to generate a dataframe of all combinations that have the same key
using PySpark.
example:
(a,1)
(a,2)
(a,3)
(b,1)
(b,2)
should return:
(a, 1, 2)
(a, 1, 3)
(a, 2, 3)
(b, 1, 2)
I want to do something like df.groupBy('key').combinations().apply(...)
Any suggestions are welcome.
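For what it's worth, a plain self-join can produce the same pairs without a UDF; a sketch using the columns from the snippet above (the ID_1/ID_2 output names are just for illustration):

from pyspark.sql.functions import col

pairs = (df.alias("l")
           .join(df.alias("r"), "key")
           .where(col("l.consumerID") < col("r.consumerID"))  # keep each unordered pair once
           .select("key",
                   col("l.consumerID").alias("ID_1"),
                   col("r.consumerID").alias("ID_2")))
pairs.show()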