Hi,

I think you need a UDF if you want to transform a column.

HTH
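A minimal sketch of the UDF route, assuming field1 through field4 are string columns (the names and sample_data come from the question below):

import org.apache.spark.sql.functions.{col, udf}

// Null-safe trim: leave null cells as null instead of throwing an NPE.
val trimUdf = udf((s: String) => if (s == null) null else s.trim)

// withColumn returns a new DataFrame each time, so fold over the field
// names rather than repeating the call four times by hand.
val transformed_data = Seq("field1", "field2", "field3", "field4")
  .foldLeft(sample_data)((df, c) => df.withColumn(c, trimUdf(col(c))))

The map in your quoted code fails because Dataset.map hands you org.apache.spark.sql.Row objects, and Row has no trim method; per-column transforms go through the DataFrame/Column API instead. There is also a non-UDF alternative sketched after the quoted message.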
On 1 Mar 2017 4:22 pm, "Bill Schwanitz" <bil...@bilsch.org> wrote:

> Hi all,
>
> I'm fairly new to Spark and Scala, so bear with me.
>
> I'm working with a dataset containing a set of columns/fields. The data
> is stored in HDFS as Parquet and is sourced from a Postgres box, so fields
> and values are reasonably well formed. We are in the process of trying out
> a switch from Pentaho and various SQL databases to pulling data into HDFS
> and applying transforms / building new datasets, with processing done in
> Spark (and other tools, still under evaluation).
>
> A rough version of the code I'm running so far:
>
> val sample_data = spark.read.parquet("my_data_input")
>
> val example_row = spark.sql("select * from parquet.my_data_input where id = 123").head
>
> I want to apply a trim operation on a set of fields - let's call them
> field1, field2, field3 and field4.
>
> What is the best way to go about applying those trims and creating a new
> dataset? Can I apply the trim to all fields in a single map, or do I need
> to apply multiple map functions?
>
> When I try the map (even with a single field):
>
> scala> val transformed_data = sample_data.map(
>      | _.trim(col("field1"))
>      | .trim(col("field2"))
>      | .trim(col("field3"))
>      | .trim(col("field4"))
>      | )
>
> I end up with the following error:
>
> <console>:26: error: value trim is not a member of org.apache.spark.sql.Row
>        _.trim(col("field1"))
>          ^
>
> Any ideas / guidance would be appreciated!
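For completeness, a sketch that skips the UDF entirely: Spark already ships a trim function in org.apache.spark.sql.functions, so the four columns can be rewritten directly (again assuming they are string-typed; the trimmed name is just illustrative):

import org.apache.spark.sql.functions.{col, trim}

// Each withColumn replaces the named column with its trimmed value and
// returns a new DataFrame; sample_data itself is left unchanged.
val trimmed = sample_data
  .withColumn("field1", trim(col("field1")))
  .withColumn("field2", trim(col("field2")))
  .withColumn("field3", trim(col("field3")))
  .withColumn("field4", trim(col("field4")))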