I understand the rationale, but when you need to reference a column whose name
is not unique, for example when using a join, it can be confusing in terms of
API.
However, I figured out that you can use a "qualified" name for the column
using the *other-dataframe.column_name* syntax, maybe we just
Name resolution is not as easy as it seems, I think. Wenchen can maybe give
you some advice on resolution for this one.
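For illustration, here is a minimal sketch of that *other-dataframe.column_name*
qualified reference; the dataframes, column names and data below are invented,
not taken from the thread:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext()
    sqlContext = SQLContext(sc)

    # Both frames deliberately share the column name "pol_no".
    df = sqlContext.createDataFrame([(1, "a")], ["pol_no", "label"])
    other = sqlContext.createDataFrame([(1,)], ["pol_no"])
    joined = df.join(other, other.pol_no == df.pol_no, "inner")

    # "pol_no" alone is ambiguous on `joined`; qualifying the reference
    # through the dataframe that owns it disambiguates:
    joined.select(df.pol_no, df.label).show()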
On Sat, May 30, 2015 at 9:37 AM, Yijie Shen
wrote:
I think it’s enough to just match the Column’s expr as an UnresolvedAttribute
and use the UnresolvedAttribute’s name to match the schema’s field name.
There seems to be no need to treat expr as a more general expression. :)
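To make that concrete, here is a standalone sketch of the matching rule
(plain Python, not the actual Spark source; all names below are made up):

    class UnresolvedAttribute(object):
        """Stand-in for Catalyst's unresolved column reference."""
        def __init__(self, name):
            self.name = name

    def drop_field(field_names, expr):
        # Compare only the attribute's name against the schema's field
        # names; anything more general is rejected, per the suggestion.
        if not isinstance(expr, UnresolvedAttribute):
            raise TypeError("only a plain column reference is supported")
        return [f for f in field_names if f != expr.name]

    print(drop_field(["pol_no", "premium"], UnresolvedAttribute("pol_no")))
    # -> ['premium']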
On May 30, 2015 at 11:14:05 PM, Girardot Olivier
(o.girar...@lateral-thoughts.com) wrote:
Jira done: https://issues.apache.org/jira/browse/SPARK-7969
I've already started working on it, but it's less trivial than it seems
because I don't exactly know the inner workings of the catalog,
or how to get the qualified name of a column to match it against the
schema/catalog.
Regards,
Olivier
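To illustrate the qualifier problem Olivier raises, a hypothetical sketch
(not the actual catalog code; field layout and names are invented):

    def matches(field, name, qualifier=None):
        # A bare name ("pol_no") matches on the name alone, but a qualified
        # reference ("only_the_best.pol_no") must also agree on the
        # qualifier -- the part that requires knowing the catalog.
        if qualifier is None:
            return field["name"] == name
        return field["name"] == name and qualifier in field["qualifiers"]

    field = {"name": "pol_no", "qualifiers": ["only_the_best"]}
    print(matches(field, "pol_no"))                   # True
    print(matches(field, "pol_no", "only_the_best"))  # True
    print(matches(field, "pol_no", "df"))             # False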
Yea would be great to support a Column. Can you create a JIRA, and possibly
a pull request?
On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
Actually, the Scala API too is only based on column name
On Fri, May 29, 2015 at 11:23, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:
Hi,
Testing 1.4 a bit more, it seems that the .drop() method in PySpark doesn't
accept a Column as its input datatype:
.join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)
  File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", li