Currently it seems DataFrame doesn't enforce the uniqueness of field names, so it is possible to have duplicate field names in a DataFrame. This usually happens after a join, especially a self-join. Users can rename the columns before the join, or rename them after the join (although DataFrame#withColumnRenamed is not sufficient for now). In Hive, an ambiguous name can be resolved by using the table name as a prefix, but it seems DataFrame doesn't support this (I mean the DataFrame API rather than Spark SQL).

I think we have 2 options here:

1. Enforce the uniqueness of field names in DataFrame, so that subsequent operations would not cause an ambiguous column reference.
2. Provide DataFrame#withColumnsRenamed(oldColumns: Seq[String], newColumns: Seq[String]) to allow changing the schema names.
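As a rough illustration of the kind of renaming option 2 would enable, here is a minimal sketch in plain Scala that computes unique column names from a list containing duplicates. The object ColumnNames and the function dedupe are hypothetical names made up for this example; they are not part of Spark, and the suffix scheme (_1, _2, ...) is just one possible convention.

```scala
object ColumnNames {
  // Hypothetical helper, not part of the DataFrame API.
  // Keeps the first occurrence of each name and appends _1, _2, ...
  // to later occurrences until the resulting name is unused.
  def dedupe(names: Seq[String]): Seq[String] = {
    val used = scala.collection.mutable.Set.empty[String]
    names.map { name =>
      var candidate = name
      var i = 1
      while (used.contains(candidate)) {
        candidate = s"${name}_$i"
        i += 1
      }
      used += candidate
      candidate
    }
  }
}
```

With Spark on the classpath, the result could presumably be applied positionally through the existing DataFrame#toDF(colNames: String*), e.g. df2.toDF(ColumnNames.dedupe(df2.columns): _*). I have not verified how that behaves on every Spark version when the input already contains duplicate names, so treat it as an assumption.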

For now, I would prefer option 2, which is easier to implement and keeps compatibility.

    val df = ...                     // schema (name, age)
    val df2 = df.join(df, "name")    // schema (name, age, age)
    df2.select("age")                // ambiguous column reference

--
Best Regards

Jeff Zhang