Re: [PR] [SPARK-51544][SQL] Add only unique and necessary metadata columns [spark]

via GitHub Sat, 05 Apr 2025 10:52:11 -0700


cloud-fan commented on code in PR #50304:
URL: https://github.com/apache/spark/pull/50304#discussion_r2002677839



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1031,18 +1036,39 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
 
     private def addMetadataCol(
         plan: LogicalPlan,
-        requiredAttrIds: Set[ExprId]): LogicalPlan = plan match {
+        requiredAttrIds: Set[ExprId],
+        onlyUniqueAndNecessaryMetadataColumns: Boolean = true): LogicalPlan = plan match {
       case s: ExposesMetadataColumns if s.metadataOutput.exists( a =>
         requiredAttrIds.contains(a.exprId)) =>
         s.withMetadataColumns()
       case p: Project if p.metadataOutput.exists(a => requiredAttrIds.contains(a.exprId)) =>
+        val existingExprIds = new util.HashSet[ExprId]
+        p.projectList.foreach(attr => existingExprIds.add(attr.exprId))

Review Comment:
   I don't quite follow this part. If the metadata column has already been added to the
project list, then `requiredAttrIds` will not contain it, so we would not add this
metadata column again anyway.
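
   To make the point concrete, here is a minimal sketch. It assumes the caller derives `requiredAttrIds` by dropping every attribute the child already outputs, which is what the comment above implies. `ExprId` and `Attr` below are simplified stand-ins, not Catalyst's real classes, and the `requiredAttrIds` helper is hypothetical.

```scala
object RequiredMetadataSketch {
  // Simplified stand-ins for Catalyst types (assumption: NOT Spark's real classes).
  final case class ExprId(id: Long)
  final case class Attr(name: String, exprId: ExprId)

  // Hypothetical helper: the metadata attributes a parent node references,
  // minus anything the child (e.g. a Project) already exposes in its output.
  def requiredAttrIds(referencedMetadata: Seq[Attr], childOutput: Seq[Attr]): Set[ExprId] = {
    val alreadyOutput = childOutput.map(_.exprId).toSet
    referencedMetadata.map(_.exprId).filterNot(alreadyOutput.contains).toSet
  }

  def main(args: Array[String]): Unit = {
    val id = Attr("id", ExprId(1L))
    val meta = Attr("_metadata", ExprId(2L))

    // Project list already contains the metadata column: nothing is required.
    println(requiredAttrIds(Seq(meta), Seq(id, meta)))

    // Project list lacks the metadata column: its ExprId is required.
    println(requiredAttrIds(Seq(meta), Seq(id)))
  }
}
```

   Under that assumption, the first call yields an empty set, so the `Project` branch would have nothing to add and the extra `existingExprIds` check would be redundant for that case.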



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


