If I understand your problem correctly, you can create a new DataFrame that is a transformation of sample_data by first registering sample_data as a temporary view and then selecting the trimmed columns with Spark SQL:

// Register sample_data as a temporary view
sample_data.createOrReplaceTempView("sql_sample_data")

// Create a new Dataset with the transformed values
val transformed = spark.sql("select trim(field1) as field1, trim(field2) as field2...... from sql_sample_data")

// Test
transformed.show(10)

I hope that helps!
Subhash

On Wed, Mar 1, 2017 at 12:04 PM, Marco Mistroni <mmistr...@gmail.com> wrote:

> Hi, I think you need a UDF if you want to transform a column....
> Hth
>
> On 1 Mar 2017 4:22 pm, "Bill Schwanitz" <bil...@bilsch.org> wrote:
>
>> Hi all,
>>
>> I'm fairly new to Spark and Scala, so bear with me.
>>
>> I'm working with a dataset containing a set of columns / fields. The data
>> is stored in HDFS as Parquet and is sourced from a Postgres box, so fields
>> and values are reasonably well formed. We are in the process of trying out
>> a switch from Pentaho and various SQL databases to pulling data into HDFS
>> and applying transforms / new datasets with processing being done in Spark
>> ( and other tools - evaluation )
>>
>> A rough version of the code I'm running so far:
>>
>> val sample_data = spark.read.parquet("my_data_input")
>>
>> val example_row = spark.sql("select * from parquet.my_data_input where id = 123").head
>>
>> I want to apply a trim operation on a set of fields - let's call them
>> field1, field2, field3 and field4.
>>
>> What is the best way to go about applying those trims and creating a new
>> dataset? Can I apply the trim to all fields in a single map? Or do I need
>> to apply multiple map functions?
>>
>> When I try the map ( even with a single field )
>>
>> scala> val transformed_data = sample_data.map(
>>      |   _.trim(col("field1"))
>>      |   .trim(col("field2"))
>>      |   .trim(col("field3"))
>>      |   .trim(col("field4"))
>>      | )
>>
>> I end up with the following error:
>>
>> <console>:26: error: value trim is not a member of org.apache.spark.sql.Row
>>          _.trim(col("field1"))
>>            ^
>>
>> Any ideas / guidance would be appreciated!
>>
>
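
A note on the "value trim is not a member of org.apache.spark.sql.Row" error quoted above: map on a DataFrame hands each record to the function as a Row, and Row has no trim method. trim is a built-in column function in org.apache.spark.sql.functions, so a UDF should not be needed for this case. As an alternative to the SQL string, here is a minimal sketch using the DataFrame API. It assumes field1 through field4 are string columns; the object name TrimFields and the helper trimColumns are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, trim}

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object TrimFields {

  // Hypothetical helper: replaces each named column with its trimmed value.
  // withColumn with an existing column name overwrites that column, so all
  // other columns pass through unchanged.
  def trimColumns(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df)((acc, name) => acc.withColumn(name, trim(col(name))))

  def main(args: Array[String]): Unit = {
    // Local session for trying the sketch; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("trim-fields")
      .master("local[*]")
      .getOrCreate()

    // Same input path as in the original question.
    val sample_data = spark.read.parquet("my_data_input")

    // Assumption: these four columns exist and hold strings.
    val transformed = trimColumns(sample_data, Seq("field1", "field2", "field3", "field4"))

    transformed.show(10)
    spark.stop()
  }
}
```

Folding over the list of field names keeps the set of trimmed columns in one place, so adding a fifth field is a one-word change rather than another chained call.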