FYI, this does not ultimately work, as the * still includes column_id and you
cannot have two columns of that name in the joined DataFrame. So I ended up
aliasing both sides of the join.

On Sun, Mar 22, 2015 at 1:25 PM, Michael Armbrust <mich...@databricks.com> wrote:

> You can include * and a column alias in the same select clause:
>
> var df1 = sqlContext.sql("select *, column_id AS table1_id from table1")
>
> I'm also hoping to resolve SPARK-6376
> <https://issues.apache.org/jira/browse/SPARK-6376> before Spark 1.3.1,
> which will let you do something like:
>
> var df1 = sqlContext.sql("select * from table1").as("t1")
> var df2 = sqlContext.sql("select * from table2").as("t2")
> df1.join(df2, df1("column_id") === df2("column_id")).select("t1.column_id")
>
> Finally, there is SPARK-6380
> <https://issues.apache.org/jira/browse/SPARK-6380>, which hopes to simplify
> this particular case.
>
> Michael
>
> On Sat, Mar 21, 2015 at 3:02 PM, Eric Friedman <eric.d.fried...@gmail.com> wrote:
>
>> I have a couple of data frames that I pulled from SparkSQL, and the
>> primary key of one is a foreign key of the same name in the other. I'd
>> rather not have to specify each column in the SELECT statement just so
>> that I can rename this single column.
>>
>> When I try to join the data frames, I get an exception because it finds
>> the two columns of the same name to be ambiguous. Is there a way to
>> specify which side of the join comes from data frame A and which comes
>> from B?
>>
>> var df1 = sqlContext.sql("select * from table1")
>> var df2 = sqlContext.sql("select * from table2")
>>
>> df1.join(df2, df1("column_id") === df2("column_id"))
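
[Editor's note: for anyone landing on this thread later, here is a minimal sketch of
what "aliasing both sides of the join" can look like, assuming the Spark 1.3-era
SQLContext/DataFrame API. The table and column names (table1, table2, column_id)
come from the thread; the use of withColumnRenamed and the new names table1_id /
table2_id are illustrative choices, not something the original poster confirmed.]

    // Rename the shared key differently in each DataFrame before joining,
    // so the joined result has no two columns with the same name.
    // table1 / table2 / column_id are from the thread; table1_id / table2_id
    // are hypothetical names chosen for this sketch.
    var df1 = sqlContext.sql("select * from table1").withColumnRenamed("column_id", "table1_id")
    var df2 = sqlContext.sql("select * from table2").withColumnRenamed("column_id", "table2_id")

    // The join condition now refers to unambiguous column names,
    // and later selects on the result are unambiguous as well.
    var joined = df1.join(df2, df1("table1_id") === df2("table2_id"))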