val df = Seq(("test", 1)).toDF("col1", "col2")

You can use SQL style expressions as a string:

df.filter("col1 IS NOT NULL").collect()
res1: Array[org.apache.spark.sql.Row] = Array([test,1])

Or you can reference columns using df("colName"), $"colName", or
col("colName"):

df.filter(df("col1") === "test").collect()
res2: Array[org.apache.spark.sql.Row] = Array([test,1])
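
As a rough sketch of how the three column forms line up, and how the same Column style could cover the not-equal, null-check, and cast cases from the questions below: this assumes a spark-shell session, where the implicits behind toDF and $"..." are already in scope. The val names are only illustrative, and the explicit import is needed for col.

```scala
// Assumes spark-shell, which pre-imports the implicits for toDF and $"...".
import org.apache.spark.sql.functions.col

val df = Seq(("test", 1)).toDF("col1", "col2")

// Three equivalent ways to refer to the same column in a filter.
val byApply  = df.filter(df("col1") === "test").collect()
val byDollar = df.filter($"col1" === "test").collect()
val byCol    = df.filter(col("col1") === "test").collect()

// Column-based counterparts of the string expressions:
// negate an equality test, and keep only rows where the column is not null.
val nonEmpty = df.filter(!(df("col1") === "")).collect()
val nonNull  = df.filter(df("col1").isNotNull).collect()

// Casting a column to another type while selecting it.
val casted = df.select(df("col1"), df("col2").cast("string").as("col2"))
```

Comparing a plain string with != is evaluated by Scala itself and yields a Boolean, which is why filter rejects it; building the condition from a Column (or passing the whole condition as one SQL string) avoids that.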

On Thu, Apr 2, 2015 at 7:45 PM, Yana Kadiyska <[email protected]>
wrote:

> Hi folks, having some seemingly noob issues with the dataframe API.
>
> I have a DF which came from the csv package.
>
> 1. What would be an easy way to cast a column to a given type -- my DF
> columns are all typed as strings coming from a csv. I see a schema getter
> but not setter on DF
>
> 2. I am trying to use the syntax used in various blog posts but can't
> figure out how to reference a column by name:
>
> scala> df.filter("customer_id"!="")
> <console>:23: error: overloaded method value filter with alternatives:
>   (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
>   (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
>  cannot be applied to (Boolean)
>               df.filter("customer_id"!="")
>
>
> 3. what would be the recommended way to drop a row containing a null value
> -- is it possible to do this:
> scala> df.filter("customer_id" IS NOT NULL)
>
>
>
>
