While the SparkSession is running, go to localhost:4040.
Select Stages from the menu.
Select the job you are interested in.
You can select additional metrics, including the DAG visualisation.

On Tue, 7 Apr 2020, 17:14 yeikel valdes, <em...@yeikel.com> wrote:

> Thanks for your input Soma, but I am actually looking to understand the
> differences, and not only in terms of performance.
>
> ---- On Sun, 05 Apr 2020 02:21:07 -0400 somplastic...@gmail.com
> <somplastic...@gmail.com> wrote ----
>
> If you want to measure optimisation in terms of time taken, then here is
> an idea :)
>
> public class MyClass {
>     public static void main(String[] args) throws InterruptedException {
>         long start = System.currentTimeMillis();
>
>         // replace with your add column code
>         // enough data to measure
>         Thread.sleep(5000);
>
>         long end = System.currentTimeMillis();
>
>         // elapsed wall-clock time in milliseconds
>         long timeTaken = end - start;
>
>         System.out.println("Time taken " + timeTaken);
>     }
> }
>
> On Sat, 4 Apr 2020, 19:07 , <em...@yeikel.com> wrote:
>
> > Dear Community,
> >
> > Recently, I had to solve the following problem: “for every entry of a
> > Dataset[String], concat a constant value”. To solve it, I used
> > built-in functions:
> >
> > val data = Seq("A","b","c").toDS
> >
> > scala> data.withColumn("valueconcat",concat(col(data.columns.head),lit(" "),lit("concat"))).select("valueconcat").explain()
> > == Physical Plan ==
> > LocalTableScan [valueconcat#161]
> >
> > As an alternative, a much simpler version of the program is to use map,
> > but it adds a serialization step that does not seem to be present in the
> > version above:
> >
> > scala> data.map(e=> s"$e concat").explain
> > == Physical Plan ==
> > *(1) SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#92]
> > +- *(1) MapElements <function1>, obj#91: java.lang.String
> >    +- *(1) DeserializeToObject value#12.toString, obj#90: java.lang.String
> >       +- LocalTableScan [value#12]
> >
> > Is this over-optimization or is this the right way to go?
> >
> > As a follow up, is there any better API to get the one and only column
> > available in a Dataset[String] when using built-in functions?
> > “col(data.columns.head)” works but it is not ideal.
> >
> > Thanks!
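
A footnote on the timing idea quoted above: a single System.currentTimeMillis() measurement is sensitive to JVM warm-up, so a slightly more robust variant might look like the sketch below. It is only a sketch, not a benchmark harness. TimingSketch, timeRuns and median are hypothetical names, not part of Spark, and the sleep is a placeholder workload. Since Spark transformations are lazy, the task you time should end in an action such as count(); explain() alone does not run the job.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Spark): times a task several times and reports the median. */
final class TimingSketch {

    /**
     * Runs the task `warmups` times untimed, then `runs` times timed.
     * Returns the elapsed milliseconds of the timed runs, sorted ascending.
     */
    static long[] timeRuns(Runnable task, int warmups, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < warmups; i++) {
            task.run(); // untimed, gives the JIT and any caches a chance to settle
        }
        long[] elapsedMillis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // monotonic clock, intended for elapsed time
            task.run();
            elapsedMillis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(elapsedMillis);
        return elapsedMillis;
    }

    /** Middle element of a sorted array (the upper middle one for an even count). */
    static long median(long[] sorted) {
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload: replace with the code to measure, ending in an action.
        Runnable placeholder = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long[] sorted = timeRuns(placeholder, 2, 5);
        System.out.println("Median time taken (ms): " + median(sorted));
    }
}
```

Timing both variants this way only tells you which one is faster on your data. To see why they differ, the physical plans shown by explain() remain the better tool.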