panbingkun commented on PR #49411: URL: https://github.com/apache/spark/pull/49411#issuecomment-2579439319
In the `withFilter` scenario of `SubExprEliminationBenchmark`, the root cause is as follows:

```scala
// `spark`, `path` and `predicate` are defined by the benchmark setup (not shown)
val df = spark.read
  .text(path.getAbsolutePath)
  .where(predicate)
// The "noop" sink runs the whole query without writing any output
df.write.mode("overwrite").format("noop").save()
```

- When `from_json` does not implement codegen, the call path is `FilterExec.doExecute` -> `Predicate.create` -> `CodeGeneratorWithInterpretedFallback.createObject` -> `Predicate.createCodeGeneratedObject` -> `CodegenContext.subexpressionElimination`:
  - https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L281
  - https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/core/src/main/scala/org/apache/spark/sql/execution/FilterEvaluatorFactory.scala#L39
  - https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallback.scala#L45
  - https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratePredicate.scala#L41
  - https://github.com/apache/spark/blame/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1270

  ## Ultimately, the 500 calls to `from_json` are optimized down to a single call

- When `from_json` implements codegen, the call path is `FilterExec.doConsume` -> `GeneratePredicateHelper.generatePredicateCode`:
  - https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L252

  ## There is no `subexpressionElimination` optimization on this path, so `JsonToStructs` is ultimately evaluated 500 times
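To make the difference between the two paths concrete, here is a toy model in plain Scala with no Spark dependency. It is a sketch under stated assumptions, not Spark's implementation: `Expr`, `Input`, `Parse`, `GetField`, `IsNotNull`, `And` and `SubexprSketch` are hypothetical names, and `Parse` merely stands in for `from_json` / `JsonToStructs`. The predicate references the same parsed column 500 times, loosely mirroring the shape described above. Evaluating it with a common-subexpression cache (roughly what `CodegenContext.subexpressionElimination` provides on the fallback path) parses the input once; evaluating it without one parses it once per reference.

```scala
import scala.collection.mutable

// Toy expression tree (hypothetical; not Catalyst's Expression API).
sealed trait Expr
final case class Input(name: String) extends Expr             // reads a column of the row
final case class Parse(child: Expr) extends Expr              // stands in for from_json / JsonToStructs
final case class GetField(child: Expr, index: Int) extends Expr
final case class IsNotNull(child: Expr) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object SubexprSketch {
  private var parseCalls = 0

  // The "expensive" operation whose invocations we count.
  private def parse(s: String): Vector[String] = { parseCalls += 1; s.split(",").toVector }

  private def children(e: Expr): Seq[Expr] = e match {
    case Input(_)       => Nil
    case Parse(c)       => Seq(c)
    case GetField(c, _) => Seq(c)
    case IsNotNull(c)   => Seq(c)
    case And(l, r)      => Seq(l, r)
  }

  // Collect non-leaf subtrees that occur more than once (structural equality).
  def commonSubexprs(root: Expr): Set[Expr] = {
    val counts = mutable.Map.empty[Expr, Int].withDefaultValue(0)
    def visit(e: Expr): Unit = { counts(e) += 1; children(e).foreach(visit) }
    visit(root)
    counts.iterator.collect { case (e, n) if n > 1 && !e.isInstanceOf[Input] => e }.toSet
  }

  // Evaluate `e`; subtrees listed in `common` are computed once and then reused from `memo`.
  private def eval(e: Expr, row: Map[String, String], common: Set[Expr], memo: mutable.Map[Expr, Any]): Any = {
    def compute(): Any = e match {
      case Input(name)    => row(name)
      case Parse(c)       => parse(eval(c, row, common, memo).asInstanceOf[String])
      case GetField(c, i) => eval(c, row, common, memo).asInstanceOf[Vector[String]].lift(i).orNull
      case IsNotNull(c)   => eval(c, row, common, memo) != null
      case And(l, r) =>
        eval(l, row, common, memo).asInstanceOf[Boolean] && eval(r, row, common, memo).asInstanceOf[Boolean]
    }
    if (common.contains(e)) memo.getOrElseUpdate(e, compute()) else compute()
  }

  // Build `isnotnull(parse(value)[0]) AND ... AND isnotnull(parse(value)[n-1])`
  // and return (predicate result, number of parse calls) for a single row.
  def run(n: Int, useCse: Boolean): (Boolean, Int) = {
    val json: Expr = Parse(Input("value"))
    val predicate = (0 until n).map(i => IsNotNull(GetField(json, i))).reduce[Expr](And(_, _))
    val row = Map("value" -> (0 until n).mkString(","))
    val common = if (useCse) commonSubexprs(predicate) else Set.empty[Expr]
    parseCalls = 0
    val result = eval(predicate, row, common, mutable.Map.empty[Expr, Any]).asInstanceOf[Boolean]
    (result, parseCalls)
  }

  def main(args: Array[String]): Unit = {
    println(s"without CSE: ${run(500, useCse = false)}") // (true,500)
    println(s"with CSE:    ${run(500, useCse = true)}")  // (true,1)
  }
}
```

The cache is keyed by structural equality of case classes, which is roughly what lets an optimizer treat the repeated `from_json(...)` subtrees as one. The toy deliberately ignores everything else a real implementation must handle, such as non-deterministic or conditionally evaluated expressions.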