Re: Bug? Can't reference to the column by name after join two DataFrame on a same name key

Yin Huai Thu, 23 Apr 2015 11:35:44 -0700

Hi Shuai,

You can use "as" to create a table alias, for example df1.as("df1"). Then
you can use $"df1.col" to refer to the column.
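
A minimal sketch of the alias approach, assuming the Spark 1.3.x Scala API
and a local SQLContext. The sample data, the left_value/right_value column
names, and the object name are made up for illustration and are not from
your code:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical object name, used only for this sketch.
object AliasJoinSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("alias-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // enables toDF(...) and the $"..." column syntax

    // Two made-up DataFrames that share the key column name "id".
    val df1 = sc.parallelize(Seq((1L, "a"), (2L, "b"))).toDF("id", "left_value")
    val df2 = sc.parallelize(Seq((1L, "x"), (3L, "y"))).toDF("id", "right_value")

    // Give each side an alias so its columns can be qualified after the join.
    val left = df1.as("df1")
    val right = df2.as("df2")

    // Join on the shared key, then pick the key from one side by its alias.
    val joined = left.join(right, $"df1.id" === $"df2.id")
    val result = joined.select($"df1.id", $"left_value", $"right_value")

    result.show()
    sc.stop()
  }
}
```

Selecting plain "id" from the joined result is what triggers the ambiguity
error, since both sides carry a column with that name; qualifying it as
$"df1.id" (or $"df2.id") tells the analyzer which one you mean. From Java,
col("df1.id") from org.apache.spark.sql.functions should play the same role
as $"df1.id".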


Thanks,

Yin

On Thu, Apr 23, 2015 at 11:14 AM, Shuai Zheng <szheng.c...@gmail.com> wrote:

> Hi All,
>
>
>
> I use 1.3.1
>
>
>
> When I have two DataFrames and join them on a key with the same name, I can’t
> get the common key by name afterwards.
>
>
>
> Basically:
>
> select * from t1 inner join t2 on t1.col1 = t2.col1
>
>
>
> I am using only the DataFrame API, with Spark's SQLContext, not HiveContext.
>
>
>
> DataFrame df3 = df1.join(df2, df1.col(col).equalTo(df2.col(col))).select(col);
>
>
>
> because df1 and df2 join on the same key col,
>
>
>
> Then I can't reference the key col. I understand I should use a fully
> qualified name for that column (like in SQL, use t1.col), but I don’t know
> how I should do this in Spark SQL.
>
>
>
> Exception in thread "main" org.apache.spark.sql.AnalysisException:
> Reference 'id' is ambiguous, could be: id#8L, id#0L.;
>
>
>
> It looks like the joined key can't be referenced by name or by the
> df1.col name pattern.
>
> https://issues.apache.org/jira/browse/SPARK-5278 refers to a Hive
> case, so I am not sure whether it is the same issue, but I still have the
> issue in the latest code.
>
>
>
> It looks like the result after join won't keep the parent DF information
> anywhere?
>
>
>
> I checked the ticket: https://issues.apache.org/jira/browse/SPARK-6273
>
>
>
> But I am not sure whether it is the same issue. Should I open a new ticket
> for this?
>
>
>
> Regards,
>
>
>
> Shuai
>
>
>
