Hi Makatun,

For 2, I guess `cache` will break up the logical plan and force it to be analyzed.

For 3, I have a similar observation here: https://medium.com/@manuzhang/the-hidden-cost-of-spark-withcolumn-8ffea517c015. Each `withColumn` call forces the logical plan to be re-analyzed, which is not free. There is `RuleExecutor.dumpTimeSpent`, which prints the time spent in analysis rules, and turning on DEBUG logging will also give you much more info.
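To make the point concrete, here is a minimal sketch (the column names and the count of 50 are made up for illustration): chaining `withColumn` re-analyzes the growing logical plan once per call, while a single `select` adds the same columns in one pass.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

object WithColumnCost {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("withColumn-cost")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

    // Each withColumn returns a new Dataset whose logical plan is
    // analyzed again, so N calls mean N analysis passes over an
    // ever-larger plan.
    val slow = (1 to 50).foldLeft(df) { (acc, i) =>
      acc.withColumn(s"c$i", lit(i))
    }

    // A single select adds the same 50 columns in one analysis pass.
    val fast = df.select(
      df.columns.map(col) ++ (1 to 50).map(i => lit(i).as(s"c$i")): _*)

    // Both plans yield the same schema; only the analysis cost differs.
    assert(slow.schema == fast.schema)
    spark.stop()
  }
}
```

You can compare the two by calling `org.apache.spark.sql.catalyst.rules.RuleExecutor.dumpTimeSpent()` before and after each variant to see how much analysis time each one accumulates.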
Thanks,
Manu Zhang

On Mon, Aug 20, 2018 at 10:25 PM antonkulaga <[email protected]> wrote:
> makatun, did you try to test something more complex, like
> dataframe.describe or PCA?
>
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
