It seems that it doesn't happen in the Scala API. It's not exactly the same as in Python, but pretty close.
https://gist.github.com/elnopintan/675968d2e4be68958df8

2015-06-23 23:11 GMT+02:00 Davies Liu <dav...@databricks.com>:
> I think it also happens in the DataFrames API of all languages.
>
> On Tue, Jun 23, 2015 at 9:16 AM, Ignacio Blasco <elnopin...@gmail.com> wrote:
> > Does that issue happen only in the Python DSL?
> >
> > On 23/6/2015 5:05 p.m., "Bob Corsaro" <rcors...@gmail.com> wrote:
> >>
> >> Thanks! The solution:
> >>
> >> https://gist.github.com/dokipen/018a1deeab668efdf455
> >>
> >> On Mon, Jun 22, 2015 at 4:33 PM Davies Liu <dav...@databricks.com> wrote:
> >>>
> >>> Right now, we cannot figure out which column you referenced in
> >>> `select` if there are multiple columns with the same name in the
> >>> joined DataFrame (for example, two `value` columns).
> >>>
> >>> A workaround could be:
> >>>
> >>> numbers2 = numbers.select(numbers.name, numbers.value.alias('other'))
> >>> rows = numbers.join(numbers2,
> >>>                     (numbers.name == numbers2.name) &
> >>>                     (numbers.value != numbers2.other),
> >>>                     how="inner") \
> >>>               .select(numbers.name, numbers.value, numbers2.other) \
> >>>               .collect()
> >>>
> >>> On Mon, Jun 22, 2015 at 12:53 PM, Ignacio Blasco <elnopin...@gmail.com> wrote:
> >>> > Sorry, I thought it was Scala/Spark.
> >>> >
> >>> > On 22/6/2015 9:49 p.m., "Bob Corsaro" <rcors...@gmail.com> wrote:
> >>> >>
> >>> >> That's invalid syntax. I'm pretty sure pyspark is using a DSL to
> >>> >> create a query here, not actually doing an equality operation.
> >>> >>
> >>> >> On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco <elnopin...@gmail.com> wrote:
> >>> >>>
> >>> >>> You should probably use === instead of == and !== instead of !=.
> >>> >>>
> >>> >>> Can anyone explain why the DataFrame API doesn't work as I expect
> >>> >>> it to here? It seems like the column identifiers are getting
> >>> >>> confused.
> >>> >>>
> >>> >>> https://gist.github.com/dokipen/4b324a7365ae87b7b0e5