You can use a combination of explode and distinct before joining. For example:
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

# Create a SparkSession
spark = SparkSession.builder \
    .appName("JoinExample") \
    .getOrCreate()

sc = spark.sparkContext

# Set the log level to ERROR to cut down on console noise
sc.setLogLevel("ERROR")
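From there, a minimal sketch of the explode-then-distinct approach could look like the following. The DataFrame names, the columns (id, tags, tag, label), and the sample rows are assumptions for illustration only, since the original schema isn't shown in this thread:

# Hypothetical input data: one DataFrame with an array column,
# plus a lookup table to join against.
left = spark.createDataFrame(
    [(1, ["a", "b"]), (2, ["b"])],
    ["id", "tags"],
)
right = spark.createDataFrame(
    [("a", "apple"), ("b", "banana")],
    ["tag", "label"],
)

# Explode the array so each element becomes its own row, then
# deduplicate the (id, tag) pairs so the join doesn't fan out.
exploded = left.select("id", explode("tags").alias("tag")).distinct()

joined = exploded.join(right, on="tag", how="inner")
joined.show()

Whether distinct is enough depends on where the duplicates come from; as Damien notes in the quoted reply below, if the rows differ in other columns they aren't true duplicates of each other.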
Hi All,

Does anyone have an idea or a suggestion for an alternate way to achieve this scenario?

Thanks.
On Sat, May 11, 2024 at 6:55 AM Damien Hawes wrote:
> Right now, with the structure of your data, it isn't possible.
>
> The rows aren't duplicates of each other. "a" and "b" both exist in t