Congrats! You made it. A serious Spark dev badge unlocked :)

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
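For readers skimming the thread below: the working answer hinges on the off-by-one between SQL's 1-based substring/INSTR and Scala's 0-based, end-exclusive String.substring, which the udf wraps. A minimal plain-Scala sketch of that indexing logic (no Spark session needed; `tx` is a made-up sample value and `mySubstr` here is just the plain function the thread's udf wraps):

```scala
// The udf agreed on in the thread wraps plain String.substring, which is
// 0-based and end-exclusive -- unlike SQL's 1-based substring/INSTR.
val mySubstr = (s: String, start: Int, end: Int) => s.substring(start, end)

// A made-up transactiondescription value in the shape the thread discusses:
val tx = "OVERSEAS TRANSACTION CD 1234"

// Spark's instr(col, "CD") returns a 1-based position (0 if absent),
// the same convention as SQL INSTR; String.indexOf is its 0-based cousin.
val sqlInstr = tx.indexOf("CD") + 1 // 22 for this tx

// What rs.select(mySubstr($"transactiondescription", lit(0), instr(...)))
// computes: the prefix up to (but excluding) index sqlInstr, which keeps
// the leading 'C' of "CD" -- matching the "... TRANSACTI C" output below.
val withC = mySubstr(tx, 0, sqlInstr)

// To reproduce the original SQL substring(..., 1, INSTR(...) - 2) -- i.e.
// drop everything from the space before "CD" onwards -- the end index
// needs the -2:
val clipped = mySubstr(tx, 0, sqlInstr - 2)
```

So `lit(0)` with a bare `instr` end still leaves a stray "C"; matching the SQL query exactly means subtracting 2 from the instr position as well.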
On Tue, Aug 2, 2016 at 9:58 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> It should be lit(0) :)
>
> rs.select(mySubstr($"transactiondescription", lit(0),
>   instr($"transactiondescription", "CD"))).show(1)
> +--------------------------------------------------------------+
> |UDF(transactiondescription,0,instr(transactiondescription,CD))|
> +--------------------------------------------------------------+
> |                                          OVERSEAS TRANSACTI C|
> +--------------------------------------------------------------+
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising
> from such loss, damage or destruction.
>
> On 2 August 2016 at 08:52, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> No thinking on my part!!!
>>
>> rs.select(mySubstr($"transactiondescription", lit(1),
>>   instr($"transactiondescription", "CD"))).show(2)
>> +--------------------------------------------------------------+
>> |UDF(transactiondescription,1,instr(transactiondescription,CD))|
>> +--------------------------------------------------------------+
>> |                                           VERSEAS TRANSACTI C|
>> |                                                 XYZ.COM 80...|
>> +--------------------------------------------------------------+
>> only showing top 2 rows
>>
>> Let me test it.
>>
>> Cheers
>>
>> On 1 August 2016 at 23:43, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>> Thanks Jacek.
>>>
>>> It sounds like the issue is the position of the second argument in
>>> substring().
>>>
>>> This works:
>>>
>>> scala> val wSpec2 = Window.partitionBy(substring($"transactiondescription",1,20))
>>> wSpec2: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@1a4eae2
>>>
>>> Using a udf as suggested:
>>>
>>> scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
>>>      |   s.substring(start, end) }
>>> mySubstr: org.apache.spark.sql.UserDefinedFunction = UserDefinedFunction(<function3>,StringType,List(StringType, IntegerType, IntegerType))
>>>
>>> This was throwing an error:
>>>
>>> val wSpec2 = Window.partitionBy(substring("transactiondescription",1,indexOf("transactiondescription",'CD')-2))
>>>
>>> So I tried using the udf:
>>>
>>> scala> val wSpec2 = Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1), instr('s, "CD"))))
>>> <console>:28: error: value select is not a member of org.apache.spark.sql.ColumnName
>>>
>>> Obviously I am not doing this correctly :(
>>>
>>> cheers
>>>
>>> On 1 August 2016 at 23:02, Jacek Laskowski <ja...@japila.pl> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Interesting...
>>>>
>>>> I'm tempted to think that the substring function should accept columns
>>>> that hold the numbers for start and end. I'd love to hear people's
>>>> thoughts on this.
>>>>
>>>> For now, I'd say you need to define a udf to do the substring, as follows:
>>>>
>>>> scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
>>>>   s.substring(start, end) }
>>>> mySubstr: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function3>,StringType,Some(List(StringType, IntegerType, IntegerType)))
>>>>
>>>> scala> df.show
>>>> +-----------+
>>>> |          s|
>>>> +-----------+
>>>> |hello world|
>>>> +-----------+
>>>>
>>>> scala> df.select(mySubstr('s, lit(1), instr('s, "ll"))).show
>>>> +-----------------------+
>>>> |UDF(s, 1, instr(s, ll))|
>>>> +-----------------------+
>>>> |                     el|
>>>> +-----------------------+
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>>
>>>> On Mon, Aug 1, 2016 at 11:18 PM, Mich Talebzadeh
>>>> <mich.talebza...@gmail.com> wrote:
>>>> > Thanks Jacek,
>>>> >
>>>> > Do I have any other way of writing this with functional programming?
>>>> >
>>>> > select
>>>> >   substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>>>> >
>>>> > Cheers,
>>>> >
>>>> > On 1 August 2016 at 22:13, Jacek Laskowski <ja...@japila.pl> wrote:
>>>> >>
>>>> >> Hi Mich,
>>>> >>
>>>> >> There's no indexOf UDF -
>>>> >> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
>>>> >>
>>>> >> Pozdrawiam,
>>>> >> Jacek Laskowski
>>>> >>
>>>> >> On Mon, Aug 1, 2016 at 7:24 PM, Mich Talebzadeh
>>>> >> <mich.talebza...@gmail.com> wrote:
>>>> >> > Hi,
>>>> >> >
>>>> >> > What is the functional-programming equivalent of the following
>>>> >> > window/analytic query, which works OK in Spark SQL?
>>>> >> >
>>>> >> > This one uses INSTR:
>>>> >> >
>>>> >> > select distinct *
>>>> >> > from (
>>>> >> >   select
>>>> >> >     substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>>>> >> >     SUM(debitamount) OVER (PARTITION BY
>>>> >> >       substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2)) AS spent
>>>> >> >   from accounts.ll_18740868 where transactiontype = 'DEB'
>>>> >> > ) tmp
>>>> >> >
>>>> >> > I tried indexOf but it does not work!
>>>> >> >
>>>> >> > val wSpec2 =
>>>> >> >   Window.partitionBy(substring(col("transactiondescription"),1,indexOf(col("transactiondescription"),"CD")))
>>>> >> > <console>:26: error: not found: value indexOf
>>>> >> >
>>>> >> > Thanks
>>>> >> >
>>>> >> > Dr Mich Talebzadeh
>>>> >> >
>>>> >> > LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> >> >
>>>> >> > http://talebzadehmich.wordpress.com

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
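To close the loop on the original question: once the udf-based partition key works, the rest of the SQL query is a windowed sum plus DISTINCT, which collapses to a group-and-sum. The real DataFrame version needs a live Spark session, so what follows is only a plain-Scala sketch of the semantics the query computes; the sample rows are invented and only the column names (transactiondescription, transactiontype, debitamount) come from the thread:

```scala
// Toy rows standing in for accounts.ll_18740868 (made-up data):
// (transactiondescription, transactiontype, debitamount)
val rows = Seq(
  ("XYZ.COM CD 1111", "DEB", 10.0),
  ("XYZ.COM CD 2222", "DEB", 5.0),
  ("OVERSEAS TRANSACTION CD 33", "DEB", 7.5),
  ("SALARY", "CRE", 100.0) // filtered out: transactiontype is not 'DEB'
)

// The partition key: substring(desc, 1, INSTR(desc,'CD') - 2) in SQL terms,
// i.e. everything before the space preceding "CD". With 0-based indexOf,
// that is substring(0, indexOf("CD") - 1).
def key(desc: String): String = desc.substring(0, desc.indexOf("CD") - 1)

// SUM(debitamount) OVER (PARTITION BY key), then DISTINCT: once the
// duplicate rows are collapsed, this is just a groupBy + sum per key.
val spent: Map[String, Double] =
  rows.filter(_._2 == "DEB")
      .groupBy(r => key(r._1))
      .map { case (k, rs) => k -> rs.map(_._3).sum }
```

In Spark itself the same shape would be `sum($"debitamount").over(wSpec2)` with `wSpec2` partitioned by the udf column, followed by `distinct` - but verify that against a running session, since the snippet above deliberately avoids any Spark dependency.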