It seems that it doesn't happen in the Scala API. It's not exactly the same as in Python, but pretty close.
https://gist.github.com/elnopintan/675968d2e4be68958df8

2015-06-23 23:11 GMT+02:00 Davies Liu <dav...@databricks.com>:
> I think it also happens in the DataFrames API of all languages.
>
> On Tue, Jun 23, 2015 at 9:16 AM, Ignacio Blasco <elnopin...@gmail.com> wrote:
> > Does that issue happen only in the Python DSL?
> >
> > On 23/6/2015 5:05 p.m., "Bob Corsaro" <rcors...@gmail.com> wrote:
> >>
> >> Thanks! The solution:
> >>
> >> https://gist.github.com/dokipen/018a1deeab668efdf455
> >>
> >> On Mon, Jun 22, 2015 at 4:33 PM Davies Liu <dav...@databricks.com> wrote:
> >>>
> >>> Right now, we cannot figure out which column you referenced in
> >>> `select` if there are multiple columns with the same name in the
> >>> joined DataFrame (for example, two `value` columns).
> >>>
> >>> A workaround could be:
> >>>
> >>> numbers2 = numbers.select(numbers.name, numbers.value.alias('other'))
> >>> rows = numbers.join(numbers2,
> >>>                     (numbers.name == numbers2.name) &
> >>>                     (numbers.value != numbers2.other),
> >>>                     how="inner") \
> >>>               .select(numbers.name, numbers.value, numbers2.other) \
> >>>               .collect()
> >>>
> >>> On Mon, Jun 22, 2015 at 12:53 PM, Ignacio Blasco <elnopin...@gmail.com> wrote:
> >>> > Sorry, I thought it was Scala/Spark.
> >>> >
> >>> > On 22/6/2015 9:49 p.m., "Bob Corsaro" <rcors...@gmail.com> wrote:
> >>> >>
> >>> >> That's invalid syntax. I'm pretty sure pyspark is using a DSL to
> >>> >> create a query here, not actually doing an equality operation.
> >>> >>
> >>> >> On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco <elnopin...@gmail.com> wrote:
> >>> >>>
> >>> >>> You should probably use === instead of == and !== instead of !=.
> >>> >>>
> >>> >>> Can anyone explain why the DataFrame API doesn't work as I expect
> >>> >>> it to here? It seems like the column identifiers are getting
> >>> >>> confused.
> >>> >>>
> >>> >>> https://gist.github.com/dokipen/4b324a7365ae87b7b0e5