For casting, you can use the selectExpr method, e.g. df.selectExpr("cast(col1 as int) as col1", "cast(col2 as bigint) as col2"). Or use the Column API: df.select(df("colA").cast("int"), ...)
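A rough, self-contained sketch of both approaches (the val names are just for illustration; in the 1.3 shell you may need the implicits import for toDF):

import sqlContext.implicits._   // for toDF in the shell

// Columns start out as strings, e.g. from the csv package
val df = Seq(("1", "100")).toDF("col1", "col2")

// SQL-style cast expressions via selectExpr
val casted = df.selectExpr("cast(col1 as int) as col1", "cast(col2 as bigint) as col2")

// Equivalent Column API form
val casted2 = df.select(df("col1").cast("int").as("col1"), df("col2").cast("bigint").as("col2"))

casted.printSchema()   // col1: integer, col2: long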
On Thu, Apr 2, 2015 at 8:33 PM, Michael Armbrust <mich...@databricks.com> wrote:
> val df = Seq(("test", 1)).toDF("col1", "col2")
>
> You can use SQL style expressions as a string:
>
> df.filter("col1 IS NOT NULL").collect()
> res1: Array[org.apache.spark.sql.Row] = Array([test,1])
>
> Or you can also reference columns using df("colName") or $"colName" or
> col("colName")
>
> df.filter(df("col1") === "test").collect()
> res2: Array[org.apache.spark.sql.Row] = Array([test,1])
>
> On Thu, Apr 2, 2015 at 7:45 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>
>> Hi folks, having some seemingly noob issues with the dataframe API.
>>
>> I have a DF which came from the csv package.
>>
>> 1. What would be an easy way to cast a column to a given type -- my DF
>> columns are all typed as strings coming from a csv. I see a schema getter
>> but not setter on DF
>>
>> 2. I am trying to use the syntax used in various blog posts but can't
>> figure out how to reference a column by name:
>>
>> scala> df.filter("customer_id"!="")
>> <console>:23: error: overloaded method value filter with alternatives:
>>   (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
>>   (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
>>  cannot be applied to (Boolean)
>>               df.filter("customer_id"!="")
>>
>> 3. what would be the recommended way to drop a row containing a null
>> value -- is it possible to do this:
>> scala> df.filter("customer_id" IS NOT NULL)
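Putting the filter and null-handling pieces together, a rough sketch assuming a string column customer_id as in the original question (the === / !== operators and $"..." syntax need the implicits import in the shell; df.na.drop comes from DataFrameNaFunctions and may require Spark 1.3.1+):

import sqlContext.implicits._   // for $"..." and === / !== on columns

// SQL-style string predicates
df.filter("customer_id IS NOT NULL").collect()
df.filter("customer_id != ''").collect()

// Column expressions -- note === and !== rather than == and !=
df.filter(df("customer_id") !== "").collect()
df.filter($"customer_id".isNotNull).collect()

// Drop rows containing nulls (may require Spark 1.3.1+)
df.na.drop().collect()                    // null in any column
df.na.drop(Seq("customer_id")).collect()  // null only in the listed columns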