Ok
I have created a one-line CSV file as follows:

cat testme.csv
360,10/02/2014,"?2,500.00",?0.00,"?2,500.00"

I use the following in Spark to split it:

val csv = sc.textFile("/data/incoming/testme.csv")
csv.map(_.split(",")).first
res159: Array[String] = Array(360, 10/02/2014, "?2, 500.00", ?0.00, "?2, 500.00")

That comes back with an array. Now all I want is to get rid of "?" and "," in the above. The problem is that the currency field "?2,500.00" contains an additional "," which messes things up, and replaceAll() does not work.

Are there any other alternatives?

Thanks,

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Technology Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Technology Ltd, its subsidiaries nor their employees accept any responsibility.

From: Andrew Ehrlich [mailto:and...@aehrlich.com]
Sent: 19 February 2016 01:22
To: Mich Talebzadeh <m...@peridale.co.uk>
Cc: User <user@spark.apache.org>
Subject: Re: Hive REGEXP_REPLACE use or equivalent in Spark

Use the Scala method .split(",") to split the string into a collection of strings, and try using .replaceAll() on the field with the "?" to remove it.

On Thu, Feb 18, 2016 at 2:09 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:

Hi,

What is the equivalent of this Hive statement in Spark?

select "?2,500.00", REGEXP_REPLACE("?2,500.00",'[^\\d\\.]','');
+------------+----------+--+
|    _c0     |   _c1    |
+------------+----------+--+
| ?2,500.00  | 2500.00  |
+------------+----------+--+

Basically I want to get rid of "?" and "," in the CSV file.

The full CSV line is:

scala> csv2.first
res94: String = 360,10/02/2014,"?2,500.00",?0.00,"?2,500.00"

I want to transform that string into 5 columns, using "," as the delimiter.

Thanks,

Dr Mich Talebzadeh
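
One possible way to get five columns from the line quoted above is to split only on commas that fall outside double quotes, and then apply the same character class as the Hive REGEXP_REPLACE call to the currency fields. The following is a minimal sketch in plain Scala, not a definitive implementation. The object name CsvCurrencyClean, the helper names, and the choice of treating columns 2 to 4 (zero-based) as currency amounts are assumptions made for illustration. It also assumes quoted fields never contain escaped quotes.

```scala
object CsvCurrencyClean {
  // Matches a comma only when an even number of double quotes follows it,
  // i.e. when the comma sits outside a quoted field.
  private val commaOutsideQuotes = """,(?=(?:[^"]*"[^"]*")*[^"]*$)"""

  // Same character class as the Hive REGEXP_REPLACE call:
  // drop every character that is not a digit or a dot.
  def cleanAmount(field: String): String = field.replaceAll("[^\\d.]", "")

  // Assumption: columns 2, 3 and 4 (zero-based) hold currency amounts;
  // the first two columns are left untouched so the date keeps its slashes.
  def parseLine(line: String): Array[String] = {
    val fields = line.split(commaOutsideQuotes, -1)
    fields.zipWithIndex.map {
      case (f, i) if i >= 2 => cleanAmount(f)
      case (f, _)           => f
    }
  }

  def main(args: Array[String]): Unit = {
    val line = "360,10/02/2014,\"?2,500.00\",?0.00,\"?2,500.00\""
    // prints 360|10/02/2014|2500.00|0.00|2500.00
    println(parseLine(line).mkString("|"))
  }
}
```

In the Spark shell, and assuming csv is the RDD[String] returned by sc.textFile, the same function could be applied with csv.map(CsvCurrencyClean.parseLine). For files with escaped quotes or embedded newlines, a dedicated CSV parser is likely a safer choice than a regular expression.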