Re: Check for null in PySpark DataFrame

2015-07-02 Thread Pedro Rodriguez
Thanks for the tip. Any idea why the intuitive answer (!= None) doesn't work? I inspected the Row columns and they do indeed have a None value. I suspect that Python's None is somehow translated to something in the JVM which doesn't compare equal to null. I might check out the source code for a better
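The behaviour Pedro describes matches SQL's three-valued logic: a column comparison like `!= None` becomes a `<> NULL` predicate, and comparing anything to NULL yields NULL (not true), so the filter never matches. A minimal sketch of that semantics using Python's stdlib sqlite3 as a stand-in for Spark SQL (the table and column names are made up for illustration):

```python
import sqlite3

# Build an in-memory table with one NULL value in it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# Comparing against NULL matches nothing -- the analogue of `df.x != None`.
# The predicate evaluates to NULL for every row, which a WHERE treats as false.
rows = conn.execute("SELECT x FROM t WHERE x != NULL").fetchall()
print(rows)  # [] -- no rows, even though one value really is NULL

# The correct predicate: IS NOT NULL, which is what isNotNull() expresses.
rows = conn.execute("SELECT x FROM t WHERE x IS NOT NULL").fetchall()
print(rows)  # [(1,), (3,)]
```

The same rule applies in Spark SQL, which is why the dedicated `isNull`/`isNotNull` predicates exist rather than plain equality against None.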

Re: Check for null in PySpark DataFrame

2015-07-01 Thread Michael Armbrust
There is an isNotNull function on any column.

df._1.isNotNull

or

from pyspark.sql.functions import *
col("myColumn").isNotNull

On Wed, Jul 1, 2015 at 3:07 AM, Olivier Girardot wrote:
> I must admit I've been using the same "back to SQL" strategy for now :p
> So I'd be glad to have insights in

Re: Check for null in PySpark DataFrame

2015-07-01 Thread Olivier Girardot
I must admit I've been using the same "back to SQL" strategy for now :p So I'd be glad to have insights into that too.

On Tue, Jun 30, 2015 at 23:28, pedro wrote:
> I am trying to find what is the correct way to programmatically check for
> null values for rows in a dataframe. For example, bel