CDH 5.5 only provides Spark 1.5. Are you managing your pySpark install separately?
For something like your example, you will get significantly better performance using coalesce with a lit, like so:

    from pyspark.sql.functions import col, lit, coalesce

    def replace_empty(icol):
        return coalesce(col(icol), lit("")).alias(icol)

and use it similarly to what you are doing (I would build a function around your if logic; it's easier to understand):

    def _if_not_in_processing(icol):
        return icol if (icol not in colprocessing) else replace_empty(icol)

    dfTotaleNormalize53 = dfTotaleNormalize52.select(
        [_if_not_in_processing(i) for i in dfTotaleNormalize52.columns])

Otherwise there isn't anything obvious to me as to why it isn't working. If you actually do have pySpark 1.5 and not 1.6, I know it handles UDF registration differently.

Hope this helps.

Nicholas Szandor Hakobian, Ph.D.
Senior Data Scientist
Rally Health
nicholas.hakob...@rallyhealth.com


On Wed, Apr 19, 2017 at 5:13 AM, issues solution <issues.solut...@gmail.com> wrote:
> pySpark 1.6 on Cloudera 5.5 (YARN)
>
> 2017-04-19 13:42 GMT+02:00 issues solution <issues.solut...@gmail.com>:
>
>> Hi,
>> Can someone tell me why I get the following error when applying a UDF
>> like this?
>>
>> def replaceCempty(x):
>>     if x is None:
>>         return ""
>>     else:
>>         return x.encode('utf-8')
>>
>> udf_replaceCempty = F.udf(replaceCempty, StringType())
>>
>> dfTotaleNormalize53 = dfTotaleNormalize52.select(
>>     [i if i not in colprocessing else udf_replaceCempty(F.col(i)).alias(i)
>>      for i in dfTotaleNormalize52.columns])
>>
>> java.lang.UnsupportedOperationException
>>
>> Cannot evaluate expression: PythonUDF#replaceCempty(input[77, string])
>>
>> ??
>> Regards
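P.S. In case a runnable, self-contained sketch of the coalesce/lit approach helps: the DataFrame, column names, and colprocessing list below are made up purely for illustration, and it assumes a pySpark shell (1.6 or later) where sqlContext already exists.

    from pyspark.sql.functions import col, lit, coalesce
    from pyspark.sql.types import StructType, StructField, StringType

    # Hypothetical two-column DataFrame containing some nulls (illustration only).
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("city", StringType(), True),
    ])
    df = sqlContext.createDataFrame([("alice", None), (None, "paris")], schema)

    # Hypothetical list of columns whose nulls should become empty strings.
    colprocessing = ["name", "city"]

    def replace_empty(icol):
        # coalesce returns the first non-null argument, so nulls become "".
        return coalesce(col(icol), lit("")).alias(icol)

    def _if_not_in_processing(icol):
        # Only rewrite columns that are in colprocessing; pass the rest through.
        return icol if (icol not in colprocessing) else replace_empty(icol)

    cleaned = df.select([_if_not_in_processing(i) for i in df.columns])
    cleaned.show()  # nulls in name/city now show as empty strings

Because this stays entirely in Spark SQL expressions, there is no Python UDF round-trip per row, which is where the performance difference comes from.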