Thank you both! Here's the code that's working now. It's a bit hard to read because of all the nested function calls. Any idea how I can improve its readability?
from pyspark.sql.functions import when, unix_timestamp, hour

duration_test = flight2.select("stop_duration1")
duration_test.show()

# Parse strings such as "0h50m" as a time of day, keep only the hour,
# and use 999 as a placeholder when stop_duration1 is null.
duration_test.withColumn(
    'duration_h',
    when(duration_test.stop_duration1.isNull(), 999)
    .otherwise(hour(unix_timestamp(duration_test.stop_duration1, "HH'h'mm'm'").cast("timestamp")))
).show(20)

+--------------+
|stop_duration1|
+--------------+
|         0h50m|
|         3h15m|
|         8h35m|
|         1h30m|
|        12h15m|
|        11h50m|
|          2h5m|
|        10h25m|
|         8h20m|
|          null|
|         2h50m|
|         2h30m|
|         7h45m|
|         1h10m|
|         2h15m|
|          2h0m|
|        10h25m|
|         1h40m|
|         1h55m|
|         1h40m|
+--------------+
only showing top 20 rows

+--------------+----------+
|stop_duration1|duration_h|
+--------------+----------+
|         0h50m|         0|
|         3h15m|         3|
|         8h35m|         8|
|         1h30m|         1|
|        12h15m|        12|
|        11h50m|        11|
|          2h5m|         2|
|        10h25m|        10|
|         8h20m|         8|
|          null|       999|
|         2h50m|         2|
|         2h30m|         2|
|         7h45m|         7|
|         1h10m|         1|
|         2h15m|         2|
|          2h0m|         2|
|        10h25m|        10|
|         1h40m|         1|
|         1h55m|         1|
|         1h40m|         1|
+--------------+----------+
only showing top 20 rows

On Tue, Apr 25, 2017 at 11:29 AM, Pushkar.Gujar <pushkarvgu...@gmail.com> wrote:

> Someone had a similar issue today on Stack Overflow.
>
> http://stackoverflow.com/questions/43595201/python-how-to-convert-pyspark-column-to-date-type-if-there-are-null-values/43595728#43595728
>
> Thank you,
> *Pushkar Gujar*
>
> On Mon, Apr 24, 2017 at 8:22 PM, Zeming Yu <zemin...@gmail.com> wrote:
>
>> Hi all,
>>
>> I tried to write a UDF that handles null values:
>>
>> def getMinutes(hString, minString):
>>     if (hString != None) & (minString != None): return int(hString) * 60 + int(minString[:-1])
>>     else: return None
>>
>> flight2 = (flight2.withColumn("duration_minutes", udfGetMinutes("duration_h", "duration_m")))
>>
>> but I got this error:
>>
>> File "<ipython-input-67-5eb2daa1c1f2>", line 6, in getMinutes
>> TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
>>
>> Does anyone know how to do this?
>>
>> Thanks,
>>
>> Zeming
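For the original UDF question, here is a minimal null-safe sketch. It assumes every non-null value follows the "XhYm" pattern seen in the stop_duration1 output (e.g. "0h50m", "2h5m", "12h15m"), and the name parse_duration_minutes is hypothetical. The function checks for None before calling int(), so a null input comes back as None (stored as NULL by Spark) instead of raising the TypeError. It also reads hours and minutes with one regular expression rather than round-tripping through unix_timestamp, which may be easier to follow.

```python
import re
from typing import Optional

# Assumed format: "<hours>h<minutes>m", e.g. "0h50m", "2h5m", "12h15m".
_DURATION_RE = re.compile(r"^\s*(\d+)h(\d+)m\s*$")


def parse_duration_minutes(duration: Optional[str]) -> Optional[int]:
    """Convert an 'XhYm' string to total minutes.

    Returns None for null input (Spark passes SQL NULL to a Python UDF
    as None) and for strings that do not match the assumed format.
    """
    if duration is None:
        # Propagate the null instead of calling int(None).
        return None
    match = _DURATION_RE.match(duration)
    if match is None:
        return None
    hours, minutes = (int(group) for group in match.groups())
    return hours * 60 + minutes


# Hypothetical PySpark wiring (not executed here):
# from pyspark.sql.functions import udf
# from pyspark.sql.types import IntegerType
# parse_duration_udf = udf(parse_duration_minutes, IntegerType())
# flight2 = flight2.withColumn("duration_minutes", parse_duration_udf("stop_duration1"))

if __name__ == "__main__":
    for value in ["0h50m", "12h15m", "2h5m", None]:
        print(value, parse_duration_minutes(value))
```

Returning None rather than 999 keeps the column a proper SQL NULL; if a sentinel is still wanted, DataFrame.fillna(999, subset=["duration_minutes"]) could be applied afterwards. Spark's built-in regexp_extract could likely extract the same two groups without a Python UDF, which avoids the Python serialization overhead.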