val df = Seq(("test", 1)).toDF("col1", "col2")
You can pass a SQL-style expression as a string:
df.filter("col1 IS NOT NULL").collect()
res1: Array[org.apache.spark.sql.Row] = Array([test,1])
Or you can reference columns using df("colName"), $"colName", or
col("colName"):
df.filter(df("col1") === "test").collect()
res2: Array[org.apache.spark.sql.Row] = Array([test,1])
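
For what it's worth, the error in your second question arises because "customer_id" != "" is an ordinary Scala comparison of two strings, so filter receives a Boolean rather than a String or a Column. Below is a minimal sketch that puts the variants side by side. It assumes Spark 2.0+ with a SparkSession (in a 1.3 shell you would use the predefined sqlContext instead); the object name, app name, and sample rows are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ColumnRefSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.0+ with a local master; in a Spark 1.3 shell, use the
    // predefined sqlContext and `import sqlContext.implicits._` instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("column-ref-sketch") // hypothetical app name
      .getOrCreate()
    import spark.implicits._ // provides toDF and the $"colName" syntax

    // Illustrative data: one normal row and one row with a null col1.
    val df = Seq[(String, Int)](("test", 1), (null, 2)).toDF("col1", "col2")

    // Three equivalent ways to build the same Column-based predicate.
    val byApply  = df.filter(df("col1") === "test")
    val byDollar = df.filter($"col1" === "test")
    val byCol    = df.filter(col("col1") === "test")

    // Dropping rows with a null: Column form and SQL-string form.
    val nonNullCol = df.filter(col("col1").isNotNull)
    val nonNullSql = df.filter("col1 IS NOT NULL")

    // "Not equal to empty string" as a SQL string, so no Scala-level != is involved.
    val nonEmpty = df.filter("col1 != ''")

    Seq(byApply, byDollar, byCol, nonNullCol, nonNullSql, nonEmpty).foreach(_.show())
    spark.stop()
  }
}
```

The string form is parsed by Spark at runtime, while the Column form is built from Scala expressions, so the compiler catches syntax mistakes in it earlier.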
On Thu, Apr 2, 2015 at 7:45 PM, Yana Kadiyska wrote:
> Hi folks, having some seemingly noob issues with the dataframe API.
>
> I have a DF which came from the csv package.
>
> 1. What would be an easy way to cast a column to a given type -- my DF
> columns are all typed as strings coming from a csv. I see a schema getter
> but not setter on DF
>
> 2. I am trying to use the syntax used in various blog posts but can't
> figure out how to reference a column by name:
>
> scala> df.filter("customer_id"!="")
> <console>:23: error: overloaded method value filter with alternatives:
> (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
> (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
> cannot be applied to (Boolean)
> df.filter("customer_id"!="")
>
>
> 3. what would be the recommended way to drop a row containing a null value
> -- is it possible to do this:
> scala> df.filter("customer_id" IS NOT NULL)