Hi,
I am seeking advice on measuring the performance of each QueryStage (QS) when
AQE is enabled in Spark SQL. Specifically, I need help automatically mapping a
QS to its corresponding jobs (or stages) so I can collect the QS runtime metrics.
I recorded the QS structure via a customized injected Query Sta
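One way to approach this mapping (my own assumption, not something the thread confirms) is to correlate jobs with SQL executions via the `spark.sql.execution.id` property that Spark attaches to `SparkListenerJobStart` events in the event log. A minimal stdlib sketch:

```python
import json
from collections import defaultdict

def jobs_by_execution_id(event_lines):
    """Group job IDs by the spark.sql.execution.id property
    attached to each SparkListenerJobStart event in an event log."""
    mapping = defaultdict(list)
    for line in event_lines:
        event = json.loads(line)
        if event.get("Event") != "SparkListenerJobStart":
            continue
        exec_id = event.get("Properties", {}).get("spark.sql.execution.id")
        if exec_id is not None:
            mapping[exec_id].append(event["Job ID"])
    return dict(mapping)

# toy event-log lines following the real event-log schema
sample = [
    '{"Event": "SparkListenerJobStart", "Job ID": 0, '
    '"Properties": {"spark.sql.execution.id": "3"}}',
    '{"Event": "SparkListenerJobStart", "Job ID": 1, '
    '"Properties": {"spark.sql.execution.id": "3"}}',
    '{"Event": "SparkListenerStageCompleted"}',
]
print(jobs_by_execution_id(sample))  # {'3': [0, 1]}
```

With AQE, one execution can spawn several jobs over time, so the per-execution job list still has to be intersected with the recorded QS structure to attribute stages to individual QueryStages.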
Hi,
When running a Spark SQL query, the detailed stage page in the Spark UI shows
the involved WholeStageCodegen IDs in its DAG visualization (e.g., under the link
node:18088/history/application_1663600377480_62091/stages/stage/?id=1&attempt=0).
However, I have trouble extracting the WholeStageCodeg
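Rather than scraping the UI, the same information can usually be recovered from the event log: the `SparkListenerSQLExecutionStart` event carries a nested `sparkPlanInfo` tree whose `nodeName` fields include entries like `WholeStageCodegen (1)`. A hedged stdlib sketch of the recursive walk (the toy tree below is illustrative, not from the thread):

```python
import re

def wscg_ids(plan_info):
    """Recursively collect WholeStageCodegen ids from a sparkPlanInfo
    tree, as found in SparkListenerSQLExecutionStart events."""
    ids = []
    m = re.match(r"WholeStageCodegen \((\d+)\)", plan_info.get("nodeName", ""))
    if m:
        ids.append(int(m.group(1)))
    for child in plan_info.get("children", []):
        ids.extend(wscg_ids(child))
    return ids

# toy plan tree mirroring the event-log structure
plan = {
    "nodeName": "WholeStageCodegen (2)",
    "children": [
        {"nodeName": "Exchange", "children": [
            {"nodeName": "WholeStageCodegen (1)", "children": []},
        ]},
    ],
}
print(wscg_ids(plan))  # [2, 1]
```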
some other way to apply stage level scheduling
> to SQL/dataframe, or like mentioned in original issue if AQE gets smart
> enough it would just do it for the user, but lots of factors that come into
> play that make that difficult as well.
>
> Tom
> On Friday, September 30, 202
O, should be based
> on analysis and costing the plan. For this RDD only stage level scheduling
> should be sufficient.
>
> > On Thu, Sep 29, 2022 at 8:56 AM Chenghao Lyu wrote:
> > > Hi,
> > >
> > > I plan to deploy the stage-level scheduling for Spark S
Hi,
I plan to deploy the stage-level scheduling for Spark SQL to apply some
fine-grained optimizations over the DAG of stages. However, I am blocked by the
following issues:
1. The current stage-level scheduling supports RDD APIs only. So is there a way
to reuse the stage-level scheduling for