Hi,

Thanks for your response. I modified my code as per your suggestion, but
now I am getting a runtime error. Here's my code:

val df_1 = df.filter( df("event") === 0)
                  . select("country", "cnt")

val df_2 = df.filter( df("event") === 3)
                  . select("country", "cnt")

df_1.show()
// produces the following output:
// country    cnt
//      tw   3000
//      uk   2000
//      us   1000

df_2.show()
// produces the following output:
// country    cnt
//      tw     25
//      uk    200
//      us     95

val both = df_2.join(df_1, df_2("country") === df_1("country"), "left_outer")

I am getting the following error when executing the join statement:

java.util.NoSuchElementException: next on empty iterator.

The error seems to originate in DataFrame.join (line 133 in
DataFrame.scala).

The show() output above confirms that both DataFrames have a column named
"country" and that they are non-empty. I also tried the simpler,
condition-free join (i.e. df_2.join(df_1), see the sketch below) and got
the same error.
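
For reference, here is a sketch of those checks (printSchema is just
another way to confirm the column names; the condition-free join is the
variant mentioned above):

// Both DataFrames should expose a "country" column.
df_1.printSchema()
df_2.printSchema()

// Join without a condition; this also fails with NoSuchElementException.
val bothSimple = df_2.join(df_1)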

I would like to know what is wrong with the join statement above.
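
In case it helps with reproducing this, below is a self-contained version
of what I am running in the spark-shell. The sample data is hypothetical,
reconstructed from the show() output above, so only the shape of the code
matters, not the exact values:

// Spark 1.3.0 spark-shell; sc and sqlContext are provided by the shell.
import sqlContext.implicits._

// Rows reconstructed from the show() output above.
val df = sc.parallelize(Seq(
  ("tw", 0, 3000), ("uk", 0, 2000), ("us", 0, 1000),
  ("tw", 3, 25),   ("uk", 3, 200),  ("us", 3, 95)
)).toDF("country", "event", "cnt")

val df_1 = df.filter(df("event") === 0).select("country", "cnt")
val df_2 = df.filter(df("event") === 3).select("country", "cnt")

// The left outer join that throws java.util.NoSuchElementException here.
val both = df_2.join(df_1, df_2("country") === df_1("country"), "left_outer")
both.show()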

thanks

On Tue, Mar 24, 2015 at 6:08 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> You need to use `===` so that you are constructing a column expression
> instead of evaluating the standard Scala equality method.  Calling methods
> to access columns (i.e. df.country) is only supported in Python.
>
> val join_df = df1.join(df2, df1("country") === df2("country"), "left_outer")
>
> On Tue, Mar 24, 2015 at 5:50 PM, SK <skrishna...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to port some code that was working in Spark 1.2.0 to the
>> latest version, Spark 1.3.0. This code involves a left outer join between
>> two SchemaRDDs, which I am now trying to change to a left outer join
>> between two DataFrames. I followed the example of a DataFrame left outer
>> join at
>>
>> https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html
>>
>> Here's my code, where df1 and df2 are the two DataFrames I am joining on
>> the "country" field:
>>
>> val join_df = df1.join(df2, df1.country == df2.country, "left_outer")
>>
>> But I got a compilation error that value country is not a member of
>> sql.DataFrame.
>>
>> I also tried the following:
>> val join_df = df1.join(df2, df1("country") == df2("country"), "left_outer")
>>
>> I got a compilation error that it is a Boolean whereas a Column is
>> required.
>>
>> So what is the correct Column expression I need to provide for joining
>> the two DataFrames on a specific field?
>>
>> thanks
>>
>
