You probably don't want null safe equals (<=>) with a left join. On Mon, Apr 3, 2017 at 5:46 PM gjohnson35 <gjohn...@artemishealth.com> wrote:
> The join condition with && is throwing an exception: > > val df = baseDF.join(mccDF, mccDF("medical_claim_id") <=> > baseDF("medical_claim_id") > && mccDF("medical_claim_detail_id") <=> > baseDF("medical_claim_detail_id"), "left") > .join(revCdDF, revCdDF("revenue_code_padded_str") <=> > mccDF("mcc_code"), "left") > .select(baseDF("medical_claim_id"), > baseDF("medical_claim_detail_id"), > baseDF("revenue_code"), baseDF("rev_code_distinct_count"), > baseDF("rtos_1_1_count"), baseDF("rtos_1_0_count"), > baseDF("er_visit_flag"), baseDF("observation_stay_flag"), > revCdDF("rtos_2_code"), revCdDF("rtos_2_hierarchy")) > .where(revCdDF("rtos_2_code").between(8, 27).isNotNull) > .groupBy( > baseDF("medical_claim_id"), > baseDF("medical_claim_detail_id") > ) > .agg(min(revCdDF("rtos_2_code").alias("min_rtos_2_8_thru_27")), > min(revCdDF("rtos_2_hierarchy").alias("min_rtos_2_8_thru_27_hier"))) > > > This query runs fine: > > val df = baseDF.join(mccDF, mccDF("medical_claim_id") <=> > baseDF("medical_claim_id"), "left") > .join(mccDF, mccDF("medical_claim_detail_id") <=> > baseDF("medical_claim_detail_id"), "left") > .join(revCdDF, revCdDF("revenue_code_padded_str") <=> > mccDF("mcc_code"), "left") > .select(baseDF("medical_claim_id"), > baseDF("medical_claim_detail_id"), > baseDF("revenue_code"), baseDF("rev_code_distinct_count"), > baseDF("rtos_1_1_count"), baseDF("rtos_1_0_count"), > baseDF("er_visit_flag"), baseDF("observation_stay_flag"), > revCdDF("rtos_2_code"), revCdDF("rtos_2_hierarchy")) > .where(revCdDF("rtos_2_code").between(8, 27).isNotNull) > .groupBy( > baseDF("medical_claim_id"), > baseDF("medical_claim_detail_id") > ) > .agg(min(revCdDF("rtos_2_code").alias("min_rtos_2_8_thru_27")), > min(revCdDF("rtos_2_hierarchy").alias("min_rtos_2_8_thru_27_hier"))) > > If I remove the multiple Columns in the join and create a join statement > for > each one then the exception goes away. Is there a better way to join > multiple columns? > > > > > > -- > View this message in context: > http://apache-spark-developers-list.1001551.n3.nabble.com/Scala-left-join-with-multiple-columns-Join-condition-is-missing-or-trivial-Use-the-CROSS-JOIN-syntax-tp21297.html > Sent from the Apache Spark Developers List mailing list archive at > Nabble.com. > > --------------------------------------------------------------------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > >