CDH 5.5 only provides Spark 1.5. Are you managing your pySpark install separately?
For something like your example, you will get significantly better performance using coalesce with a lit, like so:

    from pyspark.sql.functions import col, lit, coalesce

    def replace_empty(icol):
        return coalesce(col(icol), lit("")).alias(icol)

and use it similarly to what you are doing (I would build a function around your if logic; it's easier to understand):

    def _if_not_in_processing(icol):
        return icol if (icol not in colprocessing) else replace_empty(icol)

    dfTotaleNormalize53 = dfTotaleNormalize52.select(
        [_if_not_in_processing(i) for i in dfTotaleNormalize52.columns])

Otherwise there isn't anything obvious to me as to why it isn't working. If you actually do have pySpark 1.5 and not 1.6, I know it handles UDF registration differently.

Hope this helps.

Nicholas Szandor Hakobian, Ph.D.
Senior Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com


On Wed, Apr 19, 2017 at 5:13 AM, issues solution <issues.solut...@gmail.com> wrote:
> pySpark 1.6 on Cloudera 5.5 (YARN)
>
> 2017-04-19 13:42 GMT+02:00 issues solution <issues.solut...@gmail.com>:
>
>> Hi,
>> Can someone tell me why I get the following error when applying a UDF
>> like this?
>>
>> def replaceCempty(x):
>>     if x is None:
>>         return ""
>>     else:
>>         return x.encode('utf-8')
>>
>> udf_replaceCempty = F.udf(replaceCempty, StringType())
>>
>> dfTotaleNormalize53 = dfTotaleNormalize52.select(
>>     [i if i not in colprocessing else udf_replaceCempty(F.col(i)).alias(i)
>>      for i in dfTotaleNormalize52.columns])
>>
>> java.lang.UnsupportedOperationException
>>
>> Cannot evaluate expression: PythonUDF#replaceCempty(input[77, string])
>>
>> ??
>> Regards
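P.S. In case a runnable, self-contained sketch of the coalesce/lit approach helps: the DataFrame, column names, and colprocessing list below are made up purely for illustration, and it assumes a pySpark shell (1.6 or later) where sqlContext already exists.

    from pyspark.sql.functions import col, lit, coalesce
    from pyspark.sql.types import StructType, StructField, StringType

    # Hypothetical two-column DataFrame containing some nulls (illustration only).
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("city", StringType(), True),
    ])
    df = sqlContext.createDataFrame([("alice", None), (None, "paris")], schema)

    # Hypothetical list of columns whose nulls should become empty strings.
    colprocessing = ["name", "city"]

    def replace_empty(icol):
        # coalesce returns the first non-null argument, so nulls become "".
        return coalesce(col(icol), lit("")).alias(icol)

    def _if_not_in_processing(icol):
        # Only rewrite columns that are in colprocessing; pass the rest through.
        return icol if (icol not in colprocessing) else replace_empty(icol)

    cleaned = df.select([_if_not_in_processing(i) for i in df.columns])
    cleaned.show()  # nulls in name/city now show as empty strings

Because this stays entirely in Spark SQL expressions, there is no Python UDF round-trip per row, which is where the performance difference comes from.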