Here is my code:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    cfg = SparkConf().setAppName('MyApp')
    spark = SparkSession.builder.config(conf=cfg).getOrCreate()

    rdd1 = spark.createDataFrame([(1, 'a'), (2, 'b'), (4, 'c')], ['idx', 'val'])
    rdd1.registerTempTable('rdd1')

    rdd2 = spark.createDataFrame([(1, 2, 100), (1, 3, 200), (2, 3, 300)], ['key1', 'key2', 'val'])
    rdd2.registerTempTable('rdd2')

    # The result I want: rdd1 joined to rdd2 twice, once per key column.
    what_i_want = spark.sql("""
        select
            *
        from
            rdd2 a
            left outer join rdd1 b on a.key1 = b.idx
            left outer join rdd1 c on a.key2 = c.idx
    """)
    what_i_want.show()

    # My attempt at the same query with the DataFrame API.
    try_to_use_API = rdd2.join(rdd1, on=[rdd2['key1'] == rdd1['idx']], how='left_outer') \
        .join(rdd1, on=[rdd2['key2'] == rdd1['idx']], how='left_outer')
    try_to_use_API.show()

But try_to_use_API does not work and raises an error:

    pyspark.sql.utils.AnalysisException: u'Both sides of this join are outside the broadcasting threshold and computing it could be prohibitively expensive. To explicitly enable it, please set spark.sql.crossJoin.enabled = true;'

How can I fix this? Thanks.