Hi folks, I'm having some seemingly noob issues with the DataFrame API.
I have a DF that came from the csv package.
1. What would be an easy way to cast a column to a given type? My DF
columns are all typed as strings because they come from a CSV. I see a
schema getter but no setter on DF.
2. I am trying to use the syntax used in various blog posts but can't
figure out how to reference a column by name:
scala> df.filter("customer_id"!="")
<console>:23: error: overloaded method value filter with alternatives:
(conditionExpr: String)org.apache.spark.sql.DataFrame <and>
(condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
cannot be applied to (Boolean)
df.filter("customer_id"!="")
3. What would be the recommended way to drop a row containing a null value?
Is it possible to do this:
scala> df.filter("customer_id" IS NOT NULL)