Thanks great Dhaval. scala> import java.text.SimpleDateFormat import java.text.SimpleDateFormat scala> scala> import java.sql.Date import java.sql.Date scala> scala> import scala.util.{Try, Success, Failure} import scala.util.{Try, Success, Failure} scala> val toDate = udf{(out:String, form: String) => { | val format = new SimpleDateFormat(s"$form"); | Try(new Date(format.parse(out.toString()).getTime))match { | case Success(t) => Some(t) | case Failure(_) => None | }}}; toDate: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function2>,DateType,List(StringType, StringType))
Register the UDF but comes back with an error. I guess I am not passing parameters properly? scala> sqlContext.udf.register("toDate", toDate(_:String, _:String)) <console>:31: error: type mismatch; found : String required: org.apache.spark.sql.Column sqlContext.udf.register("toDate", toDate(_:String, _:String)) ^ Now I want to use it here val a = df.filter(col("Total") > "").map(p => Invoices(p(0).toString, *p(1).toString,* CleanupCurrency(p(2).toString), CleanupCurrency(p(3).toString), CleanupCurrency(p(4).toString))) for p(1) Does p(1) which is type any need to be converted to String first. like toDate(p(1).toString) etc? Thanks again Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com On 26 March 2016 at 07:00, Dhaval Modi <dhavalmod...@gmail.com> wrote: > Import below libs: > > import java.text.SimpleDateFormat > import java.sql.Date > import scala.util.{Try, Success, Failure} > > Regards, > Dhaval Modi > dhavalmod...@gmail.com > > On 26 March 2016 at 12:28, Mich Talebzadeh <mich.talebza...@gmail.com> > wrote: > >> I am getting this >> >> scala> import org.apache.spark.sql.functions._ >> import org.apache.spark.sql.functions._ >> scala> import java.text.SimpleDateFormat; >> import java.text.SimpleDateFormat >> scala> import java.text.DateFormat; >> import java.text.DateFormat >> scala> import java.text.SimpleDateFormat; >> import java.text.SimpleDateFormat >> scala> import java.text.ParseException; >> import java.text.ParseException >> scala> import java.util.Date; >> import java.util.Date >> scala> val toDate = udf{(out:String, form: String) => { >> | val format = new SimpleDateFormat(s"$form"); >> | Try(new Date(format.parse(out.toString()).getTime))match { >> | case Success(t) => Some(t) >> | case Failure(_) => None >> | }}}; >> <console>:34: error: not found: value Try >> Try(new Date(format.parse(out.toString()).getTime))match { >> ^ >> >> >> Dr Mich Talebzadeh >> >> >> >> LinkedIn * >> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >> >> >> >> http://talebzadehmich.wordpress.com >> >> >> >> On 26 March 2016 at 06:44, Dhaval Modi <dhavalmod...@gmail.com> wrote: >> >>> Yes, it is for Scala UDF. >>> >>> Scala does use JAVA functions/classes, so this works fine. >>> >>> Regards, >>> Dhaval Modi >>> dhavalmod...@gmail.com >>> >>> On 26 March 2016 at 12:12, Mich Talebzadeh <mich.talebza...@gmail.com> >>> wrote: >>> >>>> Thanks very much Dhaval. >>>> >>>> Is the code meant to be for Scala UDF? >>>> >>>> What I had before was >>>> >>>> def ChangeToDate (word : String) : Date = { >>>> return >>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(word,"dd/MM/yyyy"),"yyyy-MM-dd")) >>>> } >>>> >>>> Which did not work >>>> >>>> Regards, >>>> >>>> >>>> Dr Mich Talebzadeh >>>> >>>> >>>> >>>> LinkedIn * >>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>>> >>>> >>>> >>>> http://talebzadehmich.wordpress.com >>>> >>>> >>>> >>>> On 26 March 2016 at 06:07, Dhaval Modi <dhavalmod...@gmail.com> wrote: >>>> >>>>> Hi Mich, >>>>> >>>>> You can try this: >>>>> >>>>> val toDate = udf{(out:String, form: String) => { >>>>> val format = new SimpleDateFormat(s"$form"); >>>>> Try(new Date(format.parse(out.toString()).getTime))match { >>>>> case Success(t) => Some(t) >>>>> case Failure(_) => None >>>>> }}}; >>>>> >>>>> Usage: src = src.withColumn(s"$columnName", >>>>> toDate(src(s"$columnName"), lit(s"dd/MM/yyyy"))) >>>>> >>>>> >>>>> Regards, >>>>> Dhaval Modi >>>>> dhavalmod...@gmail.com >>>>> >>>>> On 26 March 2016 at 04:34, Mich Talebzadeh <mich.talebza...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi Ted, >>>>>> >>>>>> I decided to take a short cut here. I created the map leaving date as >>>>>> it is (p(1)) below >>>>>> >>>>>> def CleanupCurrency (word : String) : Double = { >>>>>> return word.toString.substring(1).replace(",", "").toDouble >>>>>> } >>>>>> sqlContext.udf.register("CleanupCurrency", CleanupCurrency(_:String)) >>>>>> val a = df.filter(col("Total") > "").map(p => Invoices(p(0).toString, >>>>>> p(1).toString, CleanupCurrency(p(2).toString), >>>>>> CleanupCurrency(p(3).toString), CleanupCurrency(p(4).toString))) >>>>>> >>>>>> // >>>>>> // convert this RDD to DF and create a Spark temporary table >>>>>> // >>>>>> a.toDF.registerTempTable("tmp") >>>>>> >>>>>> INSERT INTO TABLE <HIVE_TABLE> >>>>>> SELECT >>>>>> INVOICENUMBER >>>>>> , >>>>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd')) >>>>>> AS paymentdate >>>>>> , NET >>>>>> , VAT >>>>>> , TOTAL >>>>>> FROM tmp >>>>>> """ >>>>>> sql(sqltext) >>>>>> >>>>>> That works OK. >>>>>> >>>>>> If I want to find invoices with paymentdate > 6 months old I do >>>>>> >>>>>> sql("SELECT invoicenumber, paymentdate FROM test.t14 *WHERE >>>>>> months_between(FROM_unixtime(unix_timestamp(), 'yyyy-MM-dd'), >>>>>> paymentdate)* >>>>>> > 6 ORDER BY invoicenumber, paymentdate").collect.foreach(println) >>>>>> [360,2014-02-10] >>>>>> [361,2014-02-17] >>>>>> >>>>>> I still interested if I could do it using a UDF :) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Dr Mich Talebzadeh >>>>>> >>>>>> >>>>>> >>>>>> LinkedIn * >>>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>>>>> >>>>>> >>>>>> >>>>>> http://talebzadehmich.wordpress.com >>>>>> >>>>>> >>>>>> >>>>>> On 25 March 2016 at 17:44, Ted Yu <yuzhih...@gmail.com> wrote: >>>>>> >>>>>>> Mich: >>>>>>> Please take a look at: >>>>>>> sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala >>>>>>> >>>>>>> test("function to_date") { >>>>>>> >>>>>>> Remember to: >>>>>>> import org.apache.spark.sql.functions._ >>>>>>> >>>>>>> On Fri, Mar 25, 2016 at 7:59 AM, Mich Talebzadeh < >>>>>>> mich.talebza...@gmail.com> wrote: >>>>>>> >>>>>>>> This works with sql >>>>>>>> >>>>>>>> sqltext = """ >>>>>>>> INSERT INTO TABLE t14 >>>>>>>> SELECT >>>>>>>> INVOICENUMBER >>>>>>>> , >>>>>>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd')) >>>>>>>> AS paymentdate >>>>>>>> , NET >>>>>>>> , VAT >>>>>>>> , TOTAL >>>>>>>> FROM tmp >>>>>>>> """ >>>>>>>> sql(sqltext) >>>>>>>> >>>>>>>> >>>>>>>> but not in UDF. I want to convert it to correct date format >>>>>>>> before writing it to table >>>>>>>> >>>>>>>> Thanks >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Dr Mich Talebzadeh >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> LinkedIn * >>>>>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> http://talebzadehmich.wordpress.com >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 25 March 2016 at 14:54, Ted Yu <yuzhih...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Do you mind showing body of TO_DATE() ? >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> >>>>>>>>> On Fri, Mar 25, 2016 at 7:38 AM, Ted Yu <yuzhih...@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Looks like you forgot an import for Date. >>>>>>>>>> >>>>>>>>>> FYI >>>>>>>>>> >>>>>>>>>> On Fri, Mar 25, 2016 at 7:36 AM, Mich Talebzadeh < >>>>>>>>>> mich.talebza...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> writing a UDF to convert a string into Date >>>>>>>>>>> >>>>>>>>>>> def ChangeDate(word : String) : Date = { >>>>>>>>>>> | return >>>>>>>>>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(word),"dd/MM/yyyy"),"yyyy-MM-dd") >>>>>>>>>>> | } >>>>>>>>>>> <console>:19: error: not found: type Date >>>>>>>>>>> >>>>>>>>>>> That code to_date.. works OK in sql but not here. It is >>>>>>>>>>> complaining about to_date? >>>>>>>>>>> >>>>>>>>>>> Any ideas will be appreciated. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Dr Mich Talebzadeh >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> LinkedIn * >>>>>>>>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> http://talebzadehmich.wordpress.com >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >