I wonder if Spark can provide better support for this case.

The following schema is not user-friendly (shown previously):

StructField(b,IntegerType,false), StructField(b,IntegerType,false)

Except for 'select *', there is no way for the user to query either of the
two fields.
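One workaround, building on the renaming idea below, is to rename the
conflicting column on one side before the join so the resulting schema has
unique field names. A minimal sketch (assumes a SparkSession with
`spark.implicits._` in scope; the data is the toy example from this thread):

```scala
val df1 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")
val df2 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")

// Rename df2's "b" up front so the joined schema is
// StructField(a,...), StructField(b,...), StructField(b2,...)
val joined = df1.join(df2.withColumnRenamed("b", "b2"), "a")

// Both columns are now individually addressable, no ambiguity:
joined.select("b", "b2")
```

This only sidesteps the problem; whether Spark itself could disambiguate
automatically is the open question here.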

On Tue, Apr 26, 2016 at 10:17 PM, Takeshi Yamamuro <linguin....@gmail.com>
wrote:

> Based on my example, how about renaming columns?
>
> val df1 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")
> val df2 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")
> val df3 = df1.join(df2, "a").select($"a", df1("b").as("1-b"), df2("b").as("2-b"))
> val df4 = df3.join(df2, df3("2-b") === df2("b"))
>
> // maropu
>
> On Wed, Apr 27, 2016 at 1:58 PM, Divya Gehlot <divya.htco...@gmail.com>
> wrote:
>
>> Correct, Takeshi.
>> I am facing the same issue.
>>
>> How can the ambiguity be avoided?
>>
>>
>> On 27 April 2016 at 11:54, Takeshi Yamamuro <linguin....@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I tried;
>>> val df1 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")
>>> val df2 = Seq((1, 1), (2, 2), (3, 3)).toDF("a", "b")
>>> val df3 = df1.join(df2, "a")
>>> val df4 = df3.join(df2, "b")
>>>
>>> And I got; org.apache.spark.sql.AnalysisException: Reference 'b' is
>>> ambiguous, could be: b#6, b#14.;
>>> In this case, the message makes sense and the error is clear.
>>>
>>> Thoughts?
>>>
>>> // maropu
>>>
>>> On Wed, Apr 27, 2016 at 6:09 AM, Prasad Ravilla <pras...@slalom.com>
>>> wrote:
>>>
>>>> Also, check the column names of df1 (after joining df2 and df3).
>>>>
>>>> Prasad.
>>>>
>>>> From: Ted Yu
>>>> Date: Monday, April 25, 2016 at 8:35 PM
>>>> To: Divya Gehlot
>>>> Cc: "user @spark"
>>>> Subject: Re: Cant join same dataframe twice ?
>>>>
>>>> Can you show us the structure of df2 and df3 ?
>>>>
>>>> Thanks
>>>>
>>>> On Mon, Apr 25, 2016 at 8:23 PM, Divya Gehlot <divya.htco...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>> I am using Spark 1.5.2 .
>>>>> I have a use case where I need to join the same dataframe twice on two
>>>>> different columns.
>>>>> I am getting a "missing columns" error.
>>>>>
>>>>> For instance,
>>>>> val df1 = df2.join(df3, "Column1")
>>>>> The following line throws the missing-columns error:
>>>>> val df4 = df1.join(df3, "Column2")
>>>>>
>>>>> Is this a bug or a valid scenario?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Divya
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
