[
https://issues.apache.org/jira/browse/SPARK-57851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
L. C. Hsieh reassigned SPARK-57851:
-----------------------------------
Assignee: L. C. Hsieh
> Shuffle-free single-task execution for small queries
> ----------------------------------------------------
>
> Key: SPARK-57851
> URL: https://issues.apache.org/jira/browse/SPARK-57851
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
> Labels: pull-request-available
>
> Part of the SPIP umbrella SPARK-56978 (Faster queries in local laptop mode),
> covering the third category: shuffle-free local execution for small queries.
> This adds a conservative optimizer rule (MarkSingleTaskExecution) that marks
> small single-partition scans, optionally with a shuffle-inducing operator on
> top (sort, aggregate, distinct, window, limit/offset, expand), as candidates
> for single-task execution. Such a scan reports a SinglePartition output
> partitioning, allowing EnsureRequirements to elide the shuffle that would
> otherwise be inserted before the operator on top. The shuffle is not required
> for correctness, so removing it reduces scheduling overhead for small,
> low-latency queries.
> The feature is controlled by a new set of internal configs under
> spark.sql.optimizer.singleTaskExecution.* and is disabled by default.
> This initial change covers the core operator set; join is intentionally left
> out and can be added as a follow-up. Union is already covered by the existing
> spark.sql.unionOutputPartitioning.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]