Hi guys, I have a 2 input data streams that I want to join using Dataframes and unfortunately I get the message produced by https://issues.apache.org/jira/browse/SPARK-5063 as I can't reference rdd1 in (2) :
(1) val rdd1 = sc.esRDD(es_resource.toLowerCase, query) .map(r => (r._1, r._2)) (2) mgs.map(x => x._1) .foreachRDD { rdd => val sqlContext = SQLContext.getOrCreate(rdd.sparkContext) import sqlContext.implicits._ val df_aids = rdd.toDF("id") val df = rdd1.toDF("id", "aid") df.select(explode(df("aid")).as("aid"), df("id")) .join(df_aids, $"aid" === df_aids("id")) .select(df("id"), df_aids("id")) ..... } Is there a way to still use Dataframes to do it or I need to do everything using RDDs join only ? And If I need to use only RDDs join, how to do it ? as I have a RDD (rdd1) and a DStream (mgs) ? Thanks -- Cyril SCETBON --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org