Hi Vibhatha,

I helped you post this question to another community, and there is one answer 
from someone else for your reference.

To access the logical plan or optimized plan, you can register a custom 
QueryExecutionListener and retrieve the plans during the query execution 
process. Here's an example of how to do it in Scala:

> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.execution.QueryExecution
> import org.apache.spark.sql.util.QueryExecutionListener
>
> // Create a custom QueryExecutionListener
> class CustomQueryExecutionListener extends QueryExecutionListener {
>   override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
>     // Retrieve the logical plan
>     val logicalPlan = qe.logical
>
>     // Retrieve the optimized plan
>     val optimizedPlan = qe.optimizedPlan
>
>     // Process the plans with your custom function (processPlans is your own code)
>     processPlans(logicalPlan, optimizedPlan)
>   }
>
>   override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
> }
>
> // Create a SparkSession
> val spark = SparkSession.builder()
>   .appName("Example")
>   .getOrCreate()
>
> // Register the custom QueryExecutionListener
> spark.listenerManager.register(new CustomQueryExecutionListener)
>
> // Perform your DataFrame operations
> val df = spark.read.csv("path/to/file.csv")
> val filteredDF = df.filter(df("column") > 10)
> val resultDF = filteredDF.select("column1", "column2")
>
> // Trigger execution of the DataFrame to invoke the listener
> resultDF.show()

Thank You & Best Regards
Winston Lai
________________________________
From: Vibhatha Abeykoon <vibha...@gmail.com>
Sent: Wednesday, August 2, 2023 5:03:15 PM
To: Ruifeng Zheng <zrfli...@gmail.com>
Cc: Winston Lai <weiruanl...@gmail.com>; user@spark.apache.org 
<user@spark.apache.org>
Subject: Re: Extracting Logical Plan

I understand. I had drawn much the same conclusion, but I wasn't sure. Thanks, 
everyone, for taking the time on this.

On Wed, Aug 2, 2023 at 2:29 PM Ruifeng Zheng 
<zrfli...@gmail.com<mailto:zrfli...@gmail.com>> wrote:
In Spark Connect, I think the only API to show optimized plan is 
`df.explain("extended")` as Winston mentioned, but it is not a LogicalPlan 
object.
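
A minimal sketch of that, assuming a Spark Connect server is already running and reachable (the `sc://` endpoint address and the toy DataFrame below are assumptions, not part of the thread):

```scala
import org.apache.spark.sql.SparkSession

// Connect-client session; adjust the sc:// endpoint to your own server.
val spark = SparkSession.builder()
  .remote("sc://localhost:15002")
  .getOrCreate()

val df = spark.range(0, 10).filter("id > 5")

// Prints the parsed, analyzed, optimized, and physical plans as text.
// In the Connect client there is no LogicalPlan object to hold on to.
df.explain("extended")
```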

On Wed, Aug 2, 2023 at 4:36 PM Vibhatha Abeykoon 
<vibha...@gmail.com<mailto:vibha...@gmail.com>> wrote:
Hello Ruifeng,

Thank you for these pointers. Would it be different if I use the Spark connect? 
I am not using the regular SparkSession. I am pretty new to these APIs. 
Appreciate your thoughts.

On Wed, Aug 2, 2023 at 2:00 PM Ruifeng Zheng 
<zrfli...@gmail.com<mailto:zrfli...@gmail.com>> wrote:
Hi Vibhatha,
   I think those APIs are still available?



```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/

Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 11.0.19)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val df = spark.range(0, 10)
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> df.queryExecution
res0: org.apache.spark.sql.execution.QueryExecution =
== Parsed Logical Plan ==
Range (0, 10, step=1, splits=Some(12))

== Analyzed Logical Plan ==
id: bigint
Range (0, 10, step=1, splits=Some(12))

== Optimized Logical Plan ==
Range (0, 10, step=1, splits=Some(12))

== Physical Plan ==
*(1) Range (0, 10, step=1, splits=12)

scala> df.queryExecution.optimizedPlan
res1: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
Range (0, 10, step=1, splits=Some(12))
```
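
Once you have that LogicalPlan object, you can hand it to your own function. A small sketch, run in the same spark-shell session (processPlan is a hypothetical stand-in for your processing code):

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Walk the plan tree and collect each node's name.
def processPlan(plan: LogicalPlan): Seq[String] =
  plan.collect { case node => node.nodeName }

val df = spark.range(0, 10).filter("id > 5")
println(processPlan(df.queryExecution.optimizedPlan))
```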



On Wed, Aug 2, 2023 at 3:58 PM Vibhatha Abeykoon 
<vibha...@gmail.com<mailto:vibha...@gmail.com>> wrote:
Hi Winston,

I need to use the LogicalPlan object and process it with another function I 
have written. In earlier Spark versions we can access that via the dataframe 
object. So if it can be accessed via the UI, is there an API to access the 
object?

On Wed, Aug 2, 2023 at 1:24 PM Winston Lai 
<weiruanl...@gmail.com<mailto:weiruanl...@gmail.com>> wrote:
Hi Vibhatha,

How about reading the logical plan from the Spark UI? Do you have access to it? 
I am not sure what infra you run your Spark jobs on, but you should usually be 
able to view the logical and physical plans in the Spark UI, at least in text 
form. It is independent of the language (e.g., Scala/Python/R) you use to run 
Spark.


On Wednesday, August 2, 2023, Vibhatha Abeykoon 
<vibha...@gmail.com<mailto:vibha...@gmail.com>> wrote:
Hi Winston,

I am looking for a way to access the LogicalPlan object in Scala. Not sure if 
explain function would serve the purpose.

On Wed, Aug 2, 2023 at 9:14 AM Winston Lai 
<weiruanl...@gmail.com<mailto:weiruanl...@gmail.com>> wrote:
Hi Vibhatha,

Have you tried pyspark.sql.DataFrame.explain — PySpark 3.4.1 documentation 
(apache.org)<https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.explain.html#pyspark.sql.DataFrame.explain>
 before? I am not sure what infra you have; you can try this first. If it 
doesn't work, you may share more info such as what platform you are running 
your Spark jobs on, what cloud services you are using ...
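
For reference, explain has a few call forms. A quick sketch in spark-shell (where the `spark` session is predefined; the DataFrame is a made-up example):

```scala
val df = spark.range(0, 100).filter("id % 2 = 0")

df.explain()            // physical plan only
df.explain(true)        // parsed, analyzed, optimized, and physical plans
df.explain("formatted") // Spark 3.0+: physical plan plus per-operator details
```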

On Wednesday, August 2, 2023, Vibhatha Abeykoon 
<vibha...@gmail.com<mailto:vibha...@gmail.com>> wrote:
Hello,

I recently upgraded the Spark version to 3.4.1 and I have encountered a few 
issues. In my previous code, I was able to extract the logical plan using 
`df.queryExecution` (df: DataFrame and in Scala), but it seems like in the 
latest API it is not supported. Is there a way to extract the logical plan or 
optimized plan from a dataframe or dataset in Spark 3.4.1?

Best,
Vibhatha
--
Vibhatha Abeykoon
--
Ruifeng Zheng
E-mail: zrfli...@gmail.com<mailto:zrfli...@gmail.com>
