You can do it similarly to the way countDistinct is done, can't you? https://github.com/apache/spark/blob/master/python/pyspark/sql/functions.py#L78
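[For reference, a minimal sketch of that pattern applied to coalesce, not the committed implementation. It assumes the private helpers _to_java_column and _to_seq, whose names and locations vary across Spark versions:]

from pyspark import SparkContext
from pyspark.sql import Column
from pyspark.sql.column import _to_java_column, _to_seq  # assumed helper locations


def coalesce(*cols):
    """Returns the first non-null value among the given columns (sketch)."""
    sc = SparkContext._active_spark_context
    # py4j cannot pass Python varargs straight to a Scala varargs method,
    # so the columns are first wrapped into a Scala Seq[Column], mirroring
    # what countDistinct does before calling the JVM-side function.
    jc = sc._jvm.functions.coalesce(_to_seq(sc, cols, _to_java_column))
    return Column(jc)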
On Thu, Apr 23, 2015 at 1:59 PM, Olivier Girardot <
o.girar...@lateral-thoughts.com> wrote:

> I found another way: setting SPARK_HOME on a released version and
> launching an ipython shell to load the contexts.
> I may need your insight, however. I found out why it wasn't done at the
> same time: this method (like some others) uses varargs in Scala, and for
> now the way functions are called supports only one parameter.
>
> So at first I tried to just generalise the helper function "_" in the
> functions.py file to multiple arguments, but py4j's handling of varargs
> forces me to create an Array[Column] if the target method expects
> varargs.
>
> But from Python's perspective, we have no idea whether the target method
> expects varargs or just multiple arguments (to un-tuple). I can create a
> special case for "coalesce", or for methods that take a list of columns
> as arguments, assuming they will be varargs-based (and therefore need an
> Array[Column] instead of just a list of arguments).
>
> But this seems very specific and very prone to future mistakes.
> Is there any way in Py4j to know the signature of a method before
> calling it?
>
>
> On Thu, Apr 23, 2015 at 22:17, Olivier Girardot <
> o.girar...@lateral-thoughts.com> wrote:
>
>> What is the way of testing/building the pyspark part of Spark?
>>
>> On Thu, Apr 23, 2015 at 22:06, Olivier Girardot <
>> o.girar...@lateral-thoughts.com> wrote:
>>
>>> Yep :) I'll open the JIRA when I've got the time.
>>> Thanks
>>>
>>> On Thu, Apr 23, 2015 at 19:31, Reynold Xin <r...@databricks.com>
>>> wrote:
>>>
>>>> Ah damn. We need to add it to the Python list. Would you like to
>>>> give it a shot?
>>>>
>>>>
>>>> On Thu, Apr 23, 2015 at 4:31 AM, Olivier Girardot <
>>>> o.girar...@lateral-thoughts.com> wrote:
>>>>
>>>>> Yep, no problem, but I can't seem to find the coalesce function in
>>>>> pyspark.sql.{*, functions, types, or whatever :) }
>>>>>
>>>>> Olivier.
>>>>>
>>>>> On Mon, Apr 20, 2015 at 11:48, Olivier Girardot <
>>>>> o.girar...@lateral-thoughts.com> wrote:
>>>>>
>>>>> > A UDF might be a good idea, no?
>>>>> >
>>>>> > On Mon, Apr 20, 2015 at 11:17, Olivier Girardot <
>>>>> > o.girar...@lateral-thoughts.com> wrote:
>>>>> >
>>>>> >> Hi everyone,
>>>>> >> Let's assume I'm stuck on 1.3.0: how can I benefit from the
>>>>> >> *fillna* API in PySpark? Is there any efficient alternative to
>>>>> >> mapping the records myself?
>>>>> >>
>>>>> >> Regards,
>>>>> >>
>>>>> >> Olivier.
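
[As a footnote to the fillna question at the bottom of the thread, a minimal
sketch of the UDF workaround on Spark 1.3.0. It assumes an existing DataFrame
df with hypothetical columns "x" and "y":]

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType

# UDF-based stand-in for DataFrame.fillna: replace nulls in the numeric
# column "x" (hypothetical name) with 0.0, keeping the other columns as-is.
fill_zero = udf(lambda v: 0.0 if v is None else v, DoubleType())
df_filled = df.select(fill_zero(df["x"]).alias("x"), df["y"])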