panbingkun commented on PR #49411:
URL: https://github.com/apache/spark/pull/49411#issuecomment-2579439319

   In the `withFilter` scenario of `SubExprEliminationBenchmark`, the root 
cause is as follows:
   ```scala
     val df = spark.read
       .text(path.getAbsolutePath)
       .where(predicate)
     df.write.mode("overwrite").format("noop").save()
   ```
   
   - When `from_json` does not implement codegen, the call chain is:
   FilterExec.doExecute -> Predicate.create -> 
CodeGeneratorWithInterpretedFallback.createObject -> 
Predicate.createCodeGeneratedObject -> CodegenContext.subexpressionElimination
   
https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L281
   
https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/core/src/main/scala/org/apache/spark/sql/execution/FilterEvaluatorFactory.scala#L39
   
https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CodeGeneratorWithInterpretedFallback.scala#L45
   
https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GeneratePredicate.scala#L41
   
https://github.com/apache/spark/blame/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1270
   ## Ultimately, this path reduces the 500 calls to `from_json` to only 1 call ##
   
   - When `from_json` implements codegen, the call chain is:
   FilterExec.doConsume -> GeneratePredicateHelper.generatePredicateCode
   
https://github.com/apache/spark/blob/0123a5ecbe6d4075b0738e9d2faac354f2cbd008/sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala#L252
   
   ## There is no `subexpressionElimination` optimization here, so all 500 calls 
are ultimately made to `JsonToStructs`. ##
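
To make the difference between the two paths concrete, here is a minimal toy sketch in plain Scala. It does not use Spark at all: `parse`, `parseCalls`, `withoutElimination` and `withElimination` are hypothetical names invented for illustration, with `parse` standing in for an expensive shared expression such as `from_json`. It assumes a predicate made of 500 disjuncts that all read from the same parsed value, and a row for which no disjunct matches, so nothing short-circuits.

```scala
// Toy model only: none of these names are Spark APIs.
object SubExprEliminationSketch {
  // Counts how often the "expensive" shared subexpression is evaluated.
  var parseCalls = 0

  // Hypothetical stand-in for an expensive expression such as from_json.
  def parse(row: String): Array[Int] = {
    parseCalls += 1
    row.split(",").map(_.trim.toInt)
  }

  val numCols = 500
  val threshold = 100000

  // No elimination: every disjunct re-evaluates parse(row).
  // `exists` short-circuits on the first match, like a chain of ORs.
  def withoutElimination(row: String): Boolean =
    (0 until numCols).exists(i => parse(row)(i) >= threshold)

  // With elimination: the shared subexpression is evaluated once and reused.
  def withElimination(row: String): Boolean = {
    val parsed = parse(row)
    (0 until numCols).exists(i => parsed(i) >= threshold)
  }

  def main(args: Array[String]): Unit = {
    // A row where no column passes the threshold, so all disjuncts run.
    val row = Seq.fill(numCols)(0).mkString(",")

    parseCalls = 0
    withoutElimination(row)
    println(s"without elimination: $parseCalls parse calls")

    parseCalls = 0
    withElimination(row)
    println(s"with elimination: $parseCalls parse calls")
  }
}
```

On this input the first variant calls `parse` 500 times per row and the second calls it once, which mirrors the gap between the two `FilterExec` paths described above.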


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

