For casting, you can use the selectExpr method, e.g. df.selectExpr("cast(col1 as int) as col1", "cast(col2 as bigint) as col2"). Or use the Column API: df.select(df("colA").cast("int"), ...)
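A rough, self-contained sketch of both approaches (the val names are just for illustration; in the 1.3 shell you may need the implicits import for toDF):

import sqlContext.implicits._   // for toDF in the shell

// Columns start out as strings, e.g. from the csv package
val df = Seq(("1", "100")).toDF("col1", "col2")

// SQL-style cast expressions via selectExpr
val casted = df.selectExpr("cast(col1 as int) as col1", "cast(col2 as bigint) as col2")

// Equivalent Column API form
val casted2 = df.select(df("col1").cast("int").as("col1"), df("col2").cast("bigint").as("col2"))

casted.printSchema()   // col1: integer, col2: long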
On Thu, Apr 2, 2015 at 8:33 PM, Michael Armbrust <mich...@databricks.com> wrote:
> val df = Seq(("test", 1)).toDF("col1", "col2")
>
> You can use SQL style expressions as a string:
>
> df.filter("col1 IS NOT NULL").collect()
> res1: Array[org.apache.spark.sql.Row] = Array([test,1])
>
> Or you can also reference columns using df("colName") or $"colName" or
> col("colName")
>
> df.filter(df("col1") === "test").collect()
> res2: Array[org.apache.spark.sql.Row] = Array([test,1])
>
> On Thu, Apr 2, 2015 at 7:45 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>
>> Hi folks, having some seemingly noob issues with the dataframe API.
>>
>> I have a DF which came from the csv package.
>>
>> 1. What would be an easy way to cast a column to a given type -- my DF
>> columns are all typed as strings coming from a csv. I see a schema getter
>> but not setter on DF
>>
>> 2. I am trying to use the syntax used in various blog posts but can't
>> figure out how to reference a column by name:
>>
>> scala> df.filter("customer_id"!="")
>> <console>:23: error: overloaded method value filter with alternatives:
>>   (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
>>   (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
>>  cannot be applied to (Boolean)
>>               df.filter("customer_id"!="")
>>
>> 3. what would be the recommended way to drop a row containing a null
>> value -- is it possible to do this:
>> scala> df.filter("customer_id" IS NOT NULL)
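Putting the filter and null-handling pieces together, a rough sketch assuming a string column customer_id as in the original question (the === / !== operators and $"..." syntax need the implicits import in the shell; df.na.drop comes from DataFrameNaFunctions and may require Spark 1.3.1+):

import sqlContext.implicits._   // for $"..." and === / !== on columns

// SQL-style string predicates
df.filter("customer_id IS NOT NULL").collect()
df.filter("customer_id != ''").collect()

// Column expressions -- note === and !== rather than == and !=
df.filter(df("customer_id") !== "").collect()
df.filter($"customer_id".isNotNull).collect()

// Drop rows containing nulls (may require Spark 1.3.1+)
df.na.drop().collect()                    // null in any column
df.na.drop(Seq("customer_id")).collect()  // null only in the listed columns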