Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-31 Thread Olivier Girardot
I understand the rationale, but when you need to reference a column whose name is not unique, for example when using a join, it can be confusing in terms of API. However, I figured out that you can use a "qualified" name for the column using the *other-dataframe.column_name* syntax, maybe we just

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-30 Thread Reynold Xin
Name resolution is not that easy, I think. Wenchen can maybe give you some advice on resolution for this one. On Sat, May 30, 2015 at 9:37 AM, Yijie Shen wrote: > I think just match the Column’s expr as UnresolvedAttribute and use > UnresolvedAttribute’s name to match schema’s field name is eno

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-30 Thread Yijie Shen
I think just matching the Column’s expr as UnresolvedAttribute and using UnresolvedAttribute’s name to match the schema’s field name is enough. There seems to be no need to treat expr as a more general one. :) On May 30, 2015 at 11:14:05 PM, Girardot Olivier (o.girar...@lateral-thoughts.com) wrote: Jira done: ht
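
The suggestion above can be illustrated with a small pure-Python toy model. This is only a sketch of the idea, not Spark's implementation: the `Column` and `UnresolvedAttribute` classes here are hypothetical stand-ins defined in the snippet itself, the schema is modelled as a plain list of field names, and the column names other than `pol_no` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class UnresolvedAttribute:
    """Toy stand-in for an unresolved column reference (hypothetical, not Spark's class)."""
    name: str


@dataclass(frozen=True)
class Column:
    """Toy stand-in for a Column wrapping an expression (hypothetical, not Spark's class)."""
    expr: object


def drop(schema: List[str], col: Union[str, Column]) -> List[str]:
    """Return a copy of `schema` without the field(s) named by `col`.

    A string is used as the field name directly. A Column is only handled
    when its expression is a plain attribute reference; any more general
    expression is left alone and the schema is returned unchanged.
    """
    if isinstance(col, str):
        name = col
    elif isinstance(col.expr, UnresolvedAttribute):
        name = col.expr.name
    else:
        return list(schema)
    return [field for field in schema if field != name]


# Assumed example schema; only `pol_no` comes from the thread.
schema = ["pol_no", "premium", "region"]

print(drop(schema, "premium"))
print(drop(schema, Column(UnresolvedAttribute("pol_no"))))

# After a join, both sides may contribute a `pol_no` field. Matching by
# bare name cannot tell them apart, so both copies are removed here.
joined_schema = ["pol_no", "premium", "pol_no"]
print(drop(joined_schema, Column(UnresolvedAttribute("pol_no"))))
```

The last call hints at why name resolution is the hard part: with a bare name there is no way to say which side of the join the column came from, which is the ambiguity a qualified name would have to resolve.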

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-30 Thread Olivier Girardot
Jira done: https://issues.apache.org/jira/browse/SPARK-7969 I've already started working on it but it's less trivial than it seems because I don't exactly know the inner workings of the catalog, or how to get the qualified name of a column to match it against the schema/catalog. Regards, Olivier

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-30 Thread Reynold Xin
Yea would be great to support a Column. Can you create a JIRA, and possibly a pull request? On Fri, May 29, 2015 at 2:45 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Actually, the Scala API too is only based on column name > > On Fri, May 29, 2015 at 11:23, Olivier Girardot < >

Re: Dataframe's .drop in PySpark doesn't accept Column

2015-05-29 Thread Olivier Girardot
Actually, the Scala API is also based only on column names. On Fri, May 29, 2015 at 11:23, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi, > Testing a bit more 1.4, it seems that the .drop() method in PySpark > doesn't seem to accept a Column as input datatype : > > > *.join(on

Dataframe's .drop in PySpark doesn't accept Column

2015-05-29 Thread Olivier Girardot
Hi, Testing 1.4 a bit more, it seems that the .drop() method in PySpark doesn't accept a Column as input: *.join(only_the_best, only_the_best.pol_no == df.pol_no, "inner").drop(only_the_best.pol_no)\* File "/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.py", li